Conference Paper

Performance Issues in Evaluating Models and Designing Simulation Algorithms

Authors:
  • Limbus Medical Technologies GmbH
  • Limbus Medical Technologies GmbH

Abstract

The increasing number and diversity of simulation methods bear witness to the need for more efficient discrete event simulations in computational biology. But how efficient are those methods, and how can an efficient simulation be ensured for a concrete model? As the performance of simulation methods depends on the model, the simulator, and the infrastructure, general answers to those questions are likely to remain elusive; they have to be sought individually and experimentally instead. This requires configurable implementations of many algorithms, means to define and conduct meaningful experiments on them, and mechanisms for storing and analyzing observed performance data. In this paper, we first overview basic approaches for improving simulation performance and illustrate the challenges when comparing different methods. We then argue that providing all the aforementioned components in a single tool, in our case the open source modeling and simulation framework JAMES II, reveals many synergies in effectively pursuing both questions. This is exemplified by presenting results of recent studies and introducing a new component to swiftly evaluate simulator code changes against previous experimental data.
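The component mentioned at the end of the abstract, for evaluating simulator code changes against previously recorded performance data, can be pictured as a small regression check. The following Python sketch is purely illustrative; the function names, the replication count, and the tolerance threshold are assumptions, not the actual JAMES II component.

# Hypothetical sketch of a performance-regression check in the spirit of the
# component mentioned in the abstract: re-run a stored benchmark configuration
# after a simulator code change and compare against previously recorded runtimes.
# All names here are illustrative, not the actual JAMES II API.
import statistics
import time

def run_benchmark(simulate, model, replications=20):
    """Execute the (changed) simulator on the model several times and record wall-clock runtimes."""
    times = []
    for _ in range(replications):
        start = time.perf_counter()
        simulate(model)                       # simulator code under test
        times.append(time.perf_counter() - start)
    return times

def check_regression(new_times, baseline_times, tolerance=0.10):
    """Flag a regression if the new mean runtime exceeds the stored baseline mean by more than tolerance."""
    new_mean, old_mean = statistics.mean(new_times), statistics.mean(baseline_times)
    slowdown = (new_mean - old_mean) / old_mean
    return {"baseline_mean": old_mean, "new_mean": new_mean,
            "slowdown": slowdown, "regression": slowdown > tolerance}

In practice such a check would load the baseline times from a performance database and would account for run-to-run variance (e.g., via a statistical test) rather than comparing means alone.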

... Usually algorithms under test are added as a plug-in to the system. Thus, while testing an algorithm, all alternative algorithms can be plugged in and evaluated in the same environment, e.g., (Himmelspach & Uhrmacher 2007a; Ewald et al. 2009). Thereby, trust in the correct functioning of newly developed plug-ins and insights into their performance can be gained. ...
Article
Full-text available
Often new modeling and simulation software is developed from scratch with no or only little reuse. The benefits that can be gained from developing a modeling and simulation environment by using (and thus reusing components of) a general modeling and simulation framework relate to the reliability and efficiency of the developed software, which eventually contributes to the quality of simulation experiments. Developing the tool Mic-Core, which supports continuous-time micro modeling and simulation in demography and is based on the plug-in-based modeling and simulation framework JAMES II, will illuminate some of these benefits of reuse. Thereby, we will focus on the development process itself and on the quality of simulation studies, e.g., by analyzing the impact of random number generators on the reliability of results and of event queues on efficiency. The "lessons learned" summary presents a couple of insights gained by using a general-purpose framework for M&S as a base to create specialized M&S software.
Conference Paper
The development of software for modeling and simulation is still a common step in the course of projects. However, any software development is error-prone and expensive, and it is very likely that the software produced contains flaws. This tutorial shows which techniques are needed in modeling and simulation software, independent of application domains and model description means, and how reuse and the use of state-of-the-art tools can improve the software production process. The tutorial is based on our experiences made in developing and using JAMES II, a flexible framework created for building specialized M&S software products, for research on modeling and simulation, and for applying modeling and simulation.
Book
To select the most suitable simulation algorithm for a given task is often difficult. This is due to intricate interactions between model features, implementation details, and runtime environment, which may strongly affect the overall performance. An automated selection of simulation algorithms supports users in setting up simulation experiments, without demanding expert knowledge on simulation. The first part of the thesis surveys existing approaches to solve the algorithm selection problem and discusses techniques to analyze simulation algorithm performance. A unified categorization of existing algorithm selection techniques is worked out, as these stem from various research domains (e.g., finance, artificial intelligence). The second part introduces a software framework for automatic simulation algorithm selection and describes its constituents, as well as their integration into the modeling and simulation framework JAMES II. The implemented selection mechanisms are able to cope with three situations: a) no prior knowledge is available, b) the impact of problem features on performance is unknown, and c) a relationship between problem features and algorithm performance can be established empirically. An experimental evaluation of the developed methods concludes the thesis. It is shown that an automated algorithm selection may significantly increase the overall performance of a simulation system. Some of the presented mechanisms also support the research on simulation methods, as they facilitate their development and evaluation.
Article
Full-text available
Dry-lab experimentation is being increasingly used to complement wet-lab experimentation. However, conducting dry-lab experiments is a challenging endeavor that requires the combination of diverse techniques. JAMES II, a plug-in-based open source modeling and simulation framework, facilitates the exploitation and configuration of these techniques. The different aspects that form an experiment are made explicit to facilitate repeatability and reuse. Each of those influences the performance and the quality of the simulation experiment. Common experimentation pitfalls and current challenges are discussed along the way.
Article
Full-text available
The algorithm selection problem aims at selecting the best algorithm for a given computational problem instance according to some characteristics of the instance. In this dissertation, we first introduce some results from theoretical investigation of the algorithm selection problem. We show, by Rice's theorem, the nonexistence of an automatic algorithm selection program based only on the description of the input instance and the competing algorithms. We also describe an abstract theoretical...
Article
Full-text available
In this paper, an abstract machine is presented for a variant of the stochastic pi-calculus, in order to correctly model the stochastic simulation of biological processes. The machine is first proved sound and complete with respect to the calculus, and then used as the basis for implementing a stochastic simulator. The correctness of the stochastic machine helps ensure that the simulator is correctly implemented, giving greater confidence in the simulation results. A graphical representation for the pi-calculus is also introduced.
Article
Full-text available
Computational modeling and simulation have become invaluable tools for the biological sciences. Both aid in the formulation of new hypotheses and supplement traditional experimental research. Many different types of models using various mathematical formalisms can be created to represent any given biological system. Here we review a class of modeling techniques based on particle-based stochastic approaches. In these models, every reacting molecule is represented individually. Reactions between molecules occur in a probabilistic manner. Modeling problems caused by spatial heterogeneity and combinatorial complexity, features common to biochemical and cellular systems, are best addressed using Monte-Carlo single-particle methods. Several software tools implementing single-particle based modeling techniques are introduced and their various advantages and pitfalls discussed.
Article
Full-text available
The application of parallel and distributed simulation techniques is often limited by the amount of parallelism available in the model. This holds true for large-scale cell-biological simulations, a field that has emerged as data and knowledge concerning these systems increase and biologists call for tools to guide wet-lab experimentation. A promising approach to exploit parallelism in this domain is the integration of spatial aspects, which are often crucial to a model's validity. We describe an optimistic, parallel and distributed variant of the Next-Subvolume Method (NSM), a method that augments the well-known Gillespie Stochastic Simulation Algorithm (SSA) with spatial features. We discuss requirements imposed by this application on a parallel discrete event simulation engine to achieve efficient execution. First results of combining NSM and the grid-inspired simulation system Aurora are shown.
Article
Full-text available
The paper surveys the literature on the bandit problem, focusing on its recent development in the presence of switching costs. Switching costs between arms make not only the Gittins index policy suboptimal, but also render the search for the optimal policy computationally infeasible. This survey first discusses the decomposability properties of the arms that make the Gittins index policy optimal, and shows how these properties break down upon the introduction of costs on switching arms. Having established the failure of the simple index policy, the survey focuses on the recent efforts to overcome the difficulty of finding the optimal policy in the bandit problem with switching costs: characterization of the optimal policy, exact derivation of the optimal policy in restricted environments, and lastly approximation of the optimal policy. The advantages and disadvantages of the above approaches are discussed.
Article
Full-text available
In numerically simulating the time evolution of a well-stirred chemically reacting system, the recently introduced "tau-leaping" procedure attempts to accelerate the exact stochastic simulation algorithm by using a special Poisson approximation to leap over sequences of noncritical reaction events. Presented here is an improved procedure for determining the maximum leap size for a specified degree of accuracy. (C) 2003 American Institute of Physics.
Conference Paper
Full-text available
Compartments play an important role in molecular and cell biology modeling, which motivated the development of BETA-BINDERS, a formalism which is an extension of the pi-CALCULUS. To execute BETA-BINDERS models, sophisticated simulators are required to ensure a sound and efficient execution. Parallel and distributed simulation represents one means to achieve the latter. However, stochastically scheduled events hamper the definition of lookaheads for a conservative parallel synchronization scheme, while an optimistic parallel simulation implies expensive rollback operations due to the dynamic structures of BETA-BINDERS models. Therefore, a time-bounded window approach is suggested, which allows the different logical processes to proceed optimistically up to a barrier. Rollbacks are thus temporally constrained. In addition, the dynamic structure of BETA-BINDERS models requires a special state handling. BETA-BINDERS models and states are represented as tree structures to facilitate state updates and rollbacks by the simulation engine.
Conference Paper
Full-text available
Modeling and simulation frameworks for use in different application domains, throughout the complete development process, and in different hardware environments need to be highly scalable. For achieving an efficient execution, different simulation algorithms and data structures must be provided to compute a concrete model on a concrete platform efficiently. The support of parallel simulation techniques becomes increasingly important in this context, which is due to the growing availability of multi-core processors and network-based computers. This leads to more complex simulation systems that are harder to configure correctly. We present an experimentation layer for the modeling and simulation framework JAMES II. It greatly facilitates the configuration and usage of the system for a user and supports distributed optimization, on-demand observation, and various distributed and non-distributed scenarios.
Conference Paper
Full-text available
Spatial phenomena attract increasing interest in computational biology. Molecular crowding, i.e. a dense population of macromolecules, is known to have a significant impact on the kinetics of molecules. However, an in-detail inspection of cell behavior in time and space is extremely costly. To balance between cost and accuracy, multi-resolution approaches offer one solution. Particularly, a combination of individual and lattice-population based algorithms promises an adequate treatment of phenomena like macromolecular crowding. In realizing such an approach, central questions are how to specify and synchronize the interaction between population and individual spatial level, and to decide what is best treated at a specific level, respectively. Based on an algorithm which combines the Next Subvolume Method and a simple, individual-based spatial approach, we will present possible answers to these questions, and will discuss first experimental results.
Conference Paper
Full-text available
Stochastic simulation algorithms (SSA) are popular methods for the simulation of chemical reaction networks, so that various enhancements have been introduced and evaluated over the years. However, neither theoretical analysis nor empirical comparisons of single implementations suffice to capture the general performance of a method. This makes choosing an appropriate algorithm very hard for anyone who is not an expert in the field, especially if the system provides many alternative implementations. We argue that this problem can only be solved by thoroughly exploring the design spaces of such algorithms. This paper presents the results of an empirical study, which subsumes several thousand simulation runs. It aims at exploring the performance of different SSA implementations and comparing them to an approximation via τ-leaping, while using different event queues and random number generators.
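A study of this kind boils down to a factorial experiment over the configuration options. The following Python sketch only illustrates the shape of such an experiment loop; the simulator, event queue, and RNG names are invented, and the run function is a dummy stand-in rather than an actual simulation.

# Illustrative full-factorial performance experiment over simulator variants,
# event queues, and random number generators, in the spirit of the study above.
# All configuration names are assumptions; run() is a dummy workload.
import itertools
import random
import time

simulators   = ["direct_method", "next_reaction_method", "tau_leaping"]
event_queues = ["heap", "calendar_queue", "sorted_list"]
rngs         = ["mersenne_twister", "lcg"]

def run(simulator, event_queue, rng, model, end_time):
    """Stand-in for one configured simulation run."""
    time.sleep(random.uniform(0.001, 0.005))

results = []
for sim, queue, rng in itertools.product(simulators, event_queues, rngs):
    for replication in range(10):                      # replications per design point
        start = time.perf_counter()
        run(sim, queue, rng, model="toy_reaction_network", end_time=10.0)
        results.append((sim, queue, rng, replication, time.perf_counter() - start))

The recorded tuples can then be aggregated per design point (mean and variance of runtime) to compare the configurations.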
Conference Paper
Full-text available
This paper presents a simulation algorithm for the stochastic pi-calculus, designed for the efficient simulation of biological systems with large numbers of molecules. The cost of a simulation depends on the number of species, rather than the number of molecules, resulting in a significant gain in efficiency. The algorithm is proved correct with respect to the calculus, and then used as a basis for implementing the latest version of the SPiM stochastic simulator. The algorithm is also suitable for generating graphical animations of simulations, in order to visualise system dynamics.
Article
Full-text available
The algorithm selection problem [Rice 1976] seeks to answer the question: Which algorithm is likely to perform best for my problem? Recognizing the problem as a learning task in the early 1990s, the machine learning community has developed the field of meta-learning, focused on learning about learning algorithm performance on classification problems. But there has been only limited generalization of these ideas beyond classification, and many related attempts have been made in other disciplines (such as AI and operations research) to tackle the algorithm selection problem in different ways, introducing different terminology, and overlooking the similarities of approaches. In this sense, there is much to be gained from a greater awareness of developments in meta-learning, and how these ideas can be generalized to learn about the behaviors of other (nonlearning) algorithms. In this article we present a unified framework for considering the algorithm selection problem as a learning problem, and use this framework to tie together the cross-disciplinary developments in tackling the algorithm selection problem. We discuss the generalization of meta-learning concepts to algorithms focused on tasks including sorting, forecasting, constraint satisfaction, and optimization, and the extension of these ideas to bioinformatics, cryptography, and other fields.
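One way to picture the learning formulation of algorithm selection described here is a nearest-neighbor selection mapping trained on past performance records. The problem features, algorithm names, runtimes, and the k-NN rule below are invented for illustration only; they do not come from the cited work.

# Minimal sketch of algorithm selection as a learning problem: given records of
# (problem features, algorithm, runtime), predict the best algorithm for a new
# problem via nearest neighbors over the feature space. All data is made up.
import math
from collections import defaultdict

records = [
    # (features = (num_reactions, num_molecules), algorithm, runtime in seconds)
    ((120, 5000), "direct_method", 2.3),
    ((120, 5000), "next_reaction_method", 1.7),
    ((10_000, 2e6), "direct_method", 95.0),
    ((10_000, 2e6), "tau_leaping", 12.0),
]

def select_algorithm(features, k=3):
    """Pick the algorithm with the lowest average runtime among the k nearest recorded problems."""
    nearest = sorted(records, key=lambda r: math.dist(r[0], features))[:k]
    by_algo = defaultdict(list)
    for _, algo, runtime in nearest:
        by_algo[algo].append(runtime)
    return min(by_algo, key=lambda a: sum(by_algo[a]) / len(by_algo[a]))

print(select_algorithm((150, 6000)))   # likely "next_reaction_method" for this toy data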
Article
Full-text available
Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions while taking the empirically best action as often as possible. A popular measure of a policy's success in addressing this dilemma is the regret, that is the loss due to the fact that the globally optimal policy is not followed all the times. One of the simplest examples of the exploration/exploitation dilemma is the multi-armed bandit problem. Lai and Robbins were the first ones to show that the regret for this problem has to grow at least logarithmically in the number of plays. Since then, policies which asymptotically achieve this regret have been devised by Lai and Robbins and many others. In this work we show that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
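The "simple and efficient policies" referred to here are index rules that play the arm maximizing the empirical mean reward plus a confidence bonus. Below is a minimal sketch of such a rule (UCB1-style, with bonus sqrt(2 ln n / n_j)); the Bernoulli arms and their means are made up for illustration.

# Sketch of a UCB-style index policy: after initializing each arm once, always
# play the arm with the highest empirical mean plus confidence term.
import math
import random

def ucb1(arm_means, horizon=10_000):
    k = len(arm_means)
    counts, sums = [0] * k, [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:                      # play each arm once to initialize
            arm = t - 1
        else:
            arm = max(range(k), key=lambda j: sums[j] / counts[j]
                      + math.sqrt(2 * math.log(t) / counts[j]))
        reward = 1.0 if random.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts                       # pull counts concentrate on the best arm

print(ucb1([0.2, 0.5, 0.55]))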
Article
Full-text available
Modeling gene regulatory networks has, in some cases, enabled biologists to predict cellular behavior long before such behavior can be experimentally validated. Unfortunately, the extent to which biologists can take advantage of these modeling techniques is limited by the computational complexity of gene regulatory network simulation algorithms. This study presents a new platform-independent, grid-based distributed computing environment that accelerates biological model simulation and, ultimately, development. Applying this environment to gene regulatory network simulation shows a significant reduction in execution time versus running simulation jobs locally. To analyze this improvement, a performance model of the distributed computing environment is built. Although this grid-based system was specifically developed for biological simulation, the techniques discussed are applicable to a variety of simulation performance problems.
Article
Full-text available
A general method for combining existing algorithms into new programs that are unequivocally preferable to any of the component algorithms is presented. This method, based on notions of risk in economics, offers a computational portfolio design procedure that can be used for a wide range of problems. Tested by solving a canonical NP-complete problem, the method can be used for problems ranging from the combinatorics of DNA sequencing to the completion of tasks in environments with resource contention, such as the World Wide Web.
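The portfolio idea can be illustrated with a small Monte Carlo estimate: two algorithms with heavy-tailed runtime distributions are run concurrently, each with half the resources, and the portfolio finishes when the first of them does. The lognormal distributions and their parameters below are arbitrary assumptions; this is a sketch of the general notion, not the construction used in the paper.

# Monte Carlo comparison of two single algorithms against a 50/50 portfolio
# that runs both concurrently at half speed and stops when the first finishes.
import random
import statistics

def sample_runtime(mu, sigma):
    return random.lognormvariate(mu, sigma)   # made-up heavy-tailed runtime

def compare(n=100_000):
    alone_a, alone_b, portfolio = [], [], []
    for _ in range(n):
        a = sample_runtime(1.0, 1.5)
        b = sample_runtime(1.2, 1.0)
        alone_a.append(a)
        alone_b.append(b)
        portfolio.append(min(2 * a, 2 * b))   # half the resources => twice the runtime
    for name, xs in [("A alone", alone_a), ("B alone", alone_b), ("50/50 portfolio", portfolio)]:
        print(name, "mean", round(statistics.mean(xs), 2), "stdev", round(statistics.pstdev(xs), 2))

compare()

Comparing the printed means and standard deviations shows how combining algorithms can trade expected runtime against risk, which is the economics-inspired viewpoint taken in the paper.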
Article
Full-text available
In this paper we examine the different formulations of Gillespie's stochastic simulation algorithm (SSA) [D. Gillespie, J. Phys. Chem. 81, 2340 (1977)] with respect to computational efficiency, and propose an optimization to improve the efficiency of the direct method. Based on careful timing studies and an analysis of the time-consuming operations, we conclude that for most practical problems the optimized direct method is the most efficient formulation of SSA. This is in contrast to the widely held belief that Gibson and Bruck's next reaction method [M. Gibson and J. Bruck, J. Phys. Chem. A 104, 1876 (2000)] is the best way to implement the SSA for large systems. Our analysis explains the source of the discrepancy.
Article
Full-text available
Reactions in real chemical systems often take place on vastly different time scales, with "fast" reaction channels firing very much more frequently than "slow" ones. These firings will be interdependent if, as is usually the case, the fast and slow reactions involve some of the same species. An exact stochastic simulation of such a system will necessarily spend most of its time simulating the more numerous fast reaction events. This is a frustratingly inefficient allocation of computational effort when dynamical stiffness is present, since in that case a fast reaction event will be of much less importance to the system's evolution than will a slow reaction event. For such situations, this paper develops a systematic approximate theory that allows one to stochastically advance the system in time by simulating the firings of only the slow reaction events. Developing an effective strategy to implement this theory poses some challenges, but as is illustrated here for two simple systems, when those challenges can be overcome, very substantial increases in simulation speed can be realized.
Article
Full-text available
Recently, Gillespie introduced the tau-leap approximate, accelerated stochastic Monte Carlo method for well-mixed reacting systems [J. Chem. Phys. 115, 1716 (2001)]. In each time increment of that method, one executes a number of reaction events, selected randomly from a Poisson distribution, to enable simulation of long times. Here we introduce a binomial distribution tau-leap algorithm (abbreviated as BD-tau method). This method combines the bounded nature of the binomial distribution variable with the limiting reactant and constrained firing concepts to avoid negative populations encountered in the original tau-leap method of Gillespie for large time increments, and thus conserve mass. Simulations using prototype reaction networks show that the BD-tau method is more accurate than the original method for comparable coarse-graining in time.
Article
Full-text available
How cells utilize intracellular spatial features to optimize their signaling characteristics is still not clearly understood. The physical distance between the cell-surface receptor and the gene expression machinery, fast reactions, and slow protein diffusion coefficients are some of the properties that contribute to their intricacy. This article reviews computational frameworks that can help biologists to elucidate the implications of space in signaling pathways. We argue that intracellular macromolecular crowding is an important modeling issue, and describe how recent simulation methods can reproduce this phenomenon in either implicit, semi-explicit or fully explicit representation.
Article
It has been proposed, by E. W. Dijkstra and others, that the goto statement in programming languages is a principal culprit in programs which are difficult to understand, modify, and debug. More correctly, the argument is that it is possible to use the goto to synthesize program structures with these undesirable properties. Not all uses of the goto are to be considered harmful; however, it is further argued that the "good" uses of the goto fall into one of a small number of specific cases which may be handled by specific language constructs. This paper summarizes the arguments in favor of eliminating the goto statement and some of the theoretical and practical implications of the proposal.
Book
The Handbook of Stochastic Methods covers systematically and in simple language the foundations of Markov systems, stochastic differential equations, Fokker-Planck equations, approximation methods, chemical master equations, and quantum-mechanical Markov processes. Strong emphasis is placed on systematic approximation methods for solving problems. Stochastic adiabatic elimination is newly formulated. The book contains the "folklore" of stochastic methods in systematic form and is suitable for use as a reference work.
Article
This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
Article
This is one of two companion papers presented at the 25th ACM National Conference in October 1972, a time when the debate about goto statements was reaching its peak. Indeed, so intense was the argument that the issue was considered to be almost separate and distinct from the concept of structured programming. Wulf's viewpoint, reflected in the title of the paper, is that goto statements are dangerous, and should be avoided. Wulf admits that not all gotos are bad; as he says, "... this argument addresses the use of the goto rather than the goto itself." There are legitimate uses of the goto, but they are rare and can be eliminated altogether with proper high-level language constructs. One of Wulf's main themes is borrowed from Dijkstra, namely, that program correctness is becoming more and more important, and that it cannot be achieved by conventional testing. If proofs of correctness (either formal or informal) are the way of the future, then, as Wulf illustrates with a small programming example, it is essential that the code be written in a well-structured fashion. Regrettably, this still is an issue that most real-world programmers ignore: They argue that their programs are so complex that they can't develop correctness proofs regardless of whether their code is structured or unstructured; so they usually opt for the easiest coding approach, which (in languages like COBOL) may not be well-structured at all. Wulf also demonstrates in this paper a mechanism for converting unstructured code into equivalent structured code. The method is taken directly from Böhm and Jacopini, but is considerably easier to understand when Wulf explains it. Finally, Wulf addresses the practical possibility of eliminating the goto statement; he considers the two most common practical objections to be convenience and efficiency. Whether or not structured programming is convenient, he argues, is largely a function of the programming language. With suitable constructs to express the various forms of loops and decisions, together with some escape constructs to exit prematurely from the middle of a block structure, the goto is hardly ever missed. Here Wulf speaks from experience that few could claim in 1972: He and his colleagues already had been programming for three years in a systems implementation language called BLISS, a language that has no goto statement! Wulf's comment regarding efficiency has become a classic: "More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason -- including blind stupidity." He recognizes that there are applications or, more commonly, portions of applications in which efficiency is a valid issue, but maintains that the problem of efficiency is best left to optimizing compilers, a point with which most people agree today. Wulf's final, and perhaps most effective, argument against the goto and in favor of well-structured code follows: In the long run, optimizing compilers will be able to generate considerably better object code for structured programs than for rat's-nest programs. The 1972 ACM Conference at which Wulf presented his paper was considerably more accessible than, say, the 1971 IFIP Conference in Yugoslavia. Certainly, a reasonable number of practicing industry-oriented programmers attended, and Wulf's paper must have had some impact on them, but his message reached only a very small percentage of the potential audience. Indeed, there are many programming shops today in which Wulf's paper is just as relevant as it was in 1972, shops in which debates about the goto statement still are being waged.
Article
There is a growing awareness for the need to understand the basic design principles of living systems. In May of 2005, a diverse group of researchers from the fields of biomedicine, physics, mathematics, engineering, and computer science were brought together in Mizpe Hayamim, Israel to contemplate the current and future trends in computational modeling of biology. In the following work we provide an overview of the discussions that took place and describe some of the research projects that were presented. We also discuss how these seemingly disparate efforts may be integrated and directed at the development of meaningful computational models of biological systems. The wide range of techniques presented in Mizpe Hayamim served to demonstrate not only the breadth of scale found in biology but also the diversity in criteria for the development and application of numerical models in the field. One of the key issues remains the reconciliation of different model types and their effective use as a single composite representation. By attempting to formulate a unifying theme that transcends traditional boundaries between disciplines, it is hoped that this workshop provided a first rallying point that will promote a new level of interaction and synergy in the field.
Chapter
Contents: Introduction; Simulation Methods for Stochastic Chemical Kinetics; Aspects of Biology - Genetic Regulation; Parallel Computing for Biological Systems; Parallel Simulations; Spatial Modeling of Cellular Systems; Modeling Colonies of Cells; References
Article
There are two formalisms for mathematically describing the time behavior of a spatially homogeneous chemical system: The deterministic approach regards the time evolution as a continuous, wholly predictable process which is governed by a set of coupled, ordinary differential equations (the "reaction-rate equations"); the stochastic approach regards the time evolution as a kind of random-walk process which is governed by a single differential-difference equation (the "master equation"). Fairly simple kinetic theory arguments show that the stochastic formulation of chemical kinetics has a firmer physical basis than the deterministic formulation, but unfortunately the stochastic master equation is often mathematically intractable. There is, however, a way to make exact numerical calculations within the framework of the stochastic formulation without having to deal with the master equation directly. It is a relatively simple digital computer algorithm which uses a rigorously derived Monte Carlo procedure to numerically simulate the time evolution of the given chemical system. Like the master equation, this "stochastic simulation algorithm" correctly accounts for the inherent fluctuations and correlations that are necessarily ignored in the deterministic formulation. In addition, unlike most procedures for numerically solving the deterministic reaction-rate equations, this algorithm never approximates infinitesimal time increments dt by finite time steps Δt. The feasibility and utility of the simulation algorithm are demonstrated by applying it to several well-known model chemical systems, including the Lotka model, the Brusselator, and the Oregonator.
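The Monte Carlo procedure described here is commonly implemented as the "direct method": draw an exponential waiting time from the total propensity and select the next reaction with probability proportional to its propensity. The toy reversible reaction system and rate constants in the sketch below are invented for illustration; only the sampling scheme follows the standard direct method.

# Minimal direct-method sketch of the stochastic simulation algorithm for a
# made-up reversible system A + B <-> C with illustrative rate constants.
import random

def propensities(x, k):
    a, b, c = x
    return [k[0] * a * b,      # A + B -> C
            k[1] * c]          # C -> A + B

stoichiometry = [(-1, -1, +1), (+1, +1, -1)]

def ssa_direct(x, k, t_end):
    t = 0.0
    while True:
        a = propensities(x, k)
        a0 = sum(a)
        if a0 == 0.0:
            break                              # no reaction can fire any more
        tau = random.expovariate(a0)           # exponential waiting time, mean 1/a0
        if t + tau > t_end:
            break
        t += tau
        r, j, acc = random.random() * a0, 0, a[0]
        while acc < r:                         # pick reaction j with probability a_j / a0
            j += 1
            acc += a[j]
        x = tuple(xi + d for xi, d in zip(x, stoichiometry[j]))
    return x

print(ssa_direct(x=(100, 100, 0), k=(0.01, 0.1), t_end=10.0))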
Article
The stochastic simulation algorithm (SSA) is an essentially exact procedure for numerically simulating the time evolution of a well-stirred chemically reacting system. Despite recent major improvements in the efficiency of the SSA, its drawback remains the great amount of computer time that is often required to simulate a desired amount of system time. Presented here is the “τ-leap” method, an approximate procedure that in some circumstances can produce significant gains in simulation speed with acceptable losses in accuracy. Some primitive strategies for control parameter selection and error mitigation for the τ-leap method are described, and simulation results for two simple model systems are exhibited. With further refinement, the τ-leap method should provide a viable way of segueing from the exact SSA to the approximate chemical Langevin equation, and thence to the conventional deterministic reaction rate equation, as the system size becomes larger.
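In the usual notation (ν_j the stoichiometric change vector and a_j the propensity of reaction channel R_j, j = 1, ..., M), the leap update described above can be written as follows. This is the standard textbook form, reproduced here for orientation rather than quoted from the paper.

% Standard tau-leap update: over a leap of length tau, each channel R_j fires a
% Poisson-distributed number of times with mean a_j(X(t)) * tau, and the state
% advances by the summed stoichiometric changes.
X(t + \tau) \;=\; X(t) \;+\; \sum_{j=1}^{M} \nu_j \, P_j\!\bigl(a_j(X(t))\,\tau\bigr),
\qquad P_j(\lambda) \sim \mathrm{Poisson}(\lambda)

The approximation is acceptable only if τ is chosen small enough that all propensities a_j change little over the leap, which is exactly the control-parameter selection problem the abstract refers to.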
Article
The performance of logical process based distributed simulation (DS) protocols like Time Warp and Chandy/Misra/Bryant is influenced by a variety of factors such as the event structure underlying the simulation model, the partitioning into submodels, the performance characteristics of the execution platform, the implementation of the simulation engine and optimizations related to the protocols. The mutual performance effects of parameters exhibit a prohibitively complex degree of interweaving, giving analytical performance investigations only relative relevance. Nevertheless, performance analysis is of utmost practical interest for the simulationist who wants to decide on the suitability of a certain DS protocol for a specific simulation model before substantial efforts are invested in developing sophisticated DS codes. Since DS performance prediction based on analytical models appears doubtful with respect to adequacy and accuracy, this work presents a prediction method based on the simulated execution of skeletal implementations of DS protocols. Performance data mining methods based on statistical analysis and a simulation tool for DS protocols have been developed for DS performance prediction, supporting the simulationist in three types of decision problems: (i) given a simulation problem and parallel execution platform, which DS protocol promises best performance, (ii) given a simulation model and a DS strategy, which execution platform is appropriate from the performance viewpoint, and (iii) what class of simulation models is best executed on a given multiprocessor using a certain DS protocol. Methodologically, skeletons of the most important variations of DS protocols are developed and executed in the N-MAP performance prediction environment. As a mining technique, performance data is collected and analyzed based on a full factorial design. The design predictor variables are used to explain DS performance.
Article
We compare two recently developed mesoscale models of binary immiscible and ternary amphiphilic fluids. We describe and compare the algorithms in detail and discuss their stability properties. The simulation results for the cases of self-assembly of ternary droplet phases and binary water-amphiphile sponge phases are compared and discussed. Both models require parallel implementation and deployment on large scale parallel computing resources in order to achieve reasonable simulation times for three-dimensional models. The parallelization strategies and performance on two distinct parallel architectures are compared and discussed. Large scale three-dimensional simulation of multiphase fluids requires the extensive use of high performance visualization techniques in order to enable the large quantities of complex data to be interpreted. We report on our experiences with two commercial visualization products: AVS and VTK. We also discuss the application and use of novel computational steering techniques for the more efficient utilization of high performance computing resources. We close the paper with some suggestions for the future development of both models.
Article
Calculations can naturally be described as graphs in which vertices represent computation and edges reflect data dependencies. By partitioning the vertices of a graph, the calculation can be divided among processors of a parallel computer. However, the standard methodology for graph partitioning minimizes the wrong metric and lacks expressibility. We survey several recently proposed alternatives and discuss their relative merits.
Conference Paper
Stochastic simulations may require many replications until their results are statistically significant. Each replication corresponds to a standalone simulation job, so that these can be computed in parallel. This paper presents a grid-inspired approach to distribute such independent jobs over a set of computing resources that host simulation services, all of which are managed by a central master service. Our method is fully integrated with alternative ways of distributed simulation in JAMES II, hides all execution details from the user, and supports the coarse-grained parallel execution of any sequential simulator available in JAMES II. A thorough performance analysis of the new execution mode illustrates its efficiency.
Conference Paper
The parallel simulation of biochemical reactions is a very interesting problem: biochemical systems are inherently parallel, yet the majority of the algorithms to simulate them, including the well-known and widespread Gillespie SSA, are strictly sequential. Here we investigate, in a general way, how to characterize the simulation of biochemical systems in terms of Discrete Event Simulation. We dissect their inherent parallelism in order both to exploit the work done in this area and to speed-up their simulation. We study the peculiar characteristics of discrete biological simulations in order to select the parallelization technique which provides the greater benefits, as well as to touch its limits. We then focus on reaction-diffusion systems: we design and implement an efficient parallel algorithm for simulating such systems that include both reactions between entities and movements throughout the space.
Conference Paper
Simulation replication is a necessity for all stochastic simulations. Its efficient execution is particularly important when additional techniques are used on top, such as optimization or sensitivity analysis. One way to improve replication efficiency is to ensure that the best configuration of the simulation system is used for execution. A selection of the best configuration is possible when the number of required replications is sufficiently high, even without any prior knowledge on simulator performance or problem instance. We present an adaptive replication mechanism that combines portfolio theory with reinforcement learning: it adapts itself to the given problem instance at runtime and can be restricted to an efficient algorithm portfolio.
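A rough way to picture such a mechanism: treat each configuration of a pre-selected portfolio as a bandit arm and each replication as a pull rewarded by reciprocal runtime. The sketch below uses a plain epsilon-greedy policy and dummy workloads; both are assumptions for illustration, not the specific combination of portfolio theory and reinforcement learning developed in the paper.

# Adaptive replication sketch: spend replications mostly on the configuration
# that has been fastest so far, while occasionally exploring the others.
import random
import time

def replicate(config):
    """Stand-in for running one stochastic replication with the given configuration."""
    time.sleep(random.uniform(0.001, 0.003))   # dummy workload

def adaptive_replications(portfolio, total_replications, epsilon=0.1):
    stats = {c: [0, 0.0] for c in portfolio}             # config -> [count, summed reward]
    for _ in range(total_replications):
        untried = [c for c in portfolio if stats[c][0] == 0]
        if untried:
            config = untried[0]                           # try every configuration once
        elif random.random() < epsilon:
            config = random.choice(portfolio)             # explore
        else:
            config = max(portfolio, key=lambda c: stats[c][1] / stats[c][0])  # exploit
        start = time.perf_counter()
        replicate(config)
        stats[config][0] += 1
        stats[config][1] += 1.0 / (time.perf_counter() - start)
    return stats

print(adaptive_replications(["direct+heap", "nrm+calendar_queue", "tau_leaping"], 60))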
Conference Paper
While simulationists devise ever more efficient simulation algorithms for specific applications and infrastructures, the problem of automatically selecting the most appropriate one for a given problem has received little attention so far. One reason for this is the overwhelming amount of performance data that has to be analyzed for deriving suitable selection mechanisms. We address this problem with a framework for data mining on simulation performance data, which enables the evaluation of various data mining methods in this context. Such an evaluation is essential, as there is no best data mining algorithm for all kinds of simulation performance data. Once an effective data mining approach has been identified for a specific class of problems, its results can be used to select efficient algorithms for future simulation problems. This paper covers the components of the framework, the integration of external tools, and the re-formulation of the algorithm selection problem from a data mining perspective. Basic data mining strategies for algorithm selection are outlined, and a sample algorithm selection problem from Computational Biology is presented.
Article
The problem of selecting an effective algorithm arises in a wide variety of situations. This chapter starts with a discussion on abstract models: the basic model and associated problems, the model with selection based on features, and the model with variable performance criteria. One objective of this chapter is to explore the applicability of the approximation theory to the algorithm selection problem. There is an intimate relationship here, and the approximation theory forms an appropriate base upon which to develop a theory of algorithm selection methods. The approximation theory currently lacks much of the necessary machinery for the algorithm selection problem. There is a need to develop new results and apply known techniques to these new circumstances. The final pages of this chapter form a sort of appendix, which lists 15 specific open problems and questions in this area. There is a close relationship between the algorithm selection problem and the general optimization theory. This is not surprising since the approximation problem is a special form of the optimization problem. Most realistic algorithm selection problems are of moderate to high dimensionality and thus one should expect them to be quite complex. One consequence of this is that most straightforward approaches (even well-conceived ones) are likely to lead to enormous computations for the best selection. The single most important part of the solution of a selection problem is the appropriate choice of the form of the selection mapping. It is here that theories give the least guidance and that the art of problem solving is most crucial.
Conference Paper
The multi-armed bandit problem for a gambler is to decide which arm of a K-slot machine to pull to maximize his total reward in a series of trials. Many real-world learning and optimization problems can be modeled in this way. Several strategies or algorithms have been proposed as a solution to this problem in the last two decades, but, to our knowledge, there has been no common evaluation of these algorithms. This paper provides a preliminary empirical evaluation of several multi-armed bandit algorithms. It also describes and analyzes a new algorithm, Poker (Price Of Knowledge and Estimated Reward) whose performance compares favorably to that of other existing algorithms in several experiments. One remarkable outcome of our experiments is that the most naive approach, the ε-greedy strategy, proves to be often hard to beat.
Article
Although experimental studies have been widely applied to the investigation of algorithm performance, very little attention has been given to experimental method in this area. This is unfortunate, since much can be done to improve the quality of the data obtained; often, much improvement may be needed for the data to be useful. This paper gives a tutorial discussion of two aspects of good experimental technique: the use of variance reduction techniques and simulation speedups in algorithm studies. In an illustrative study, application of variance reduction techniques produces a decrease in variance by a factor of 1000 in one case, giving a dramatic improvement in the precision of experimental results. Furthermore, the complexity of the simulation program is improved from Θ(mn/H_n) to Θ(m + n log n) (where m is typically much larger than n), giving a much faster simulation program and therefore more data per unit of computation time. The general application of variance reduction techniques is also discussed for a variety of algorithm problem domains.
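One of the classic variance reduction techniques for algorithm comparisons is the use of common random numbers: when two configurations are compared, feeding both the same seed per replication removes much of the shared noise from the estimated difference. The sketch below fabricates a simple stochastic "performance measure" to show the effect; the numbers and the run_config stand-in are illustrative only.

# Common random numbers: compare two configurations with shared vs. separate
# seeds and observe the reduced variance of the paired difference.
import random
import statistics

def run_config(name, seed):
    """Stand-in for one simulation run; returns a made-up performance measure."""
    rng = random.Random(seed)
    base = rng.gauss(10.0, 2.0)                  # shared stochastic workload
    return base * (1.00 if name == "A" else 0.95)

paired, independent = [], []
for i in range(1000):
    paired.append(run_config("A", seed=i) - run_config("B", seed=i))
    independent.append(run_config("A", seed=2 * i) - run_config("B", seed=2 * i + 1))

print("common seeds  :", round(statistics.mean(paired), 3), "+/-", round(statistics.stdev(paired), 3))
print("separate seeds:", round(statistics.mean(independent), 3), "+/-", round(statistics.stdev(independent), 3))

Both estimators target the same mean difference, but the common-seed version has a far smaller standard deviation, so fewer replications are needed for the same precision.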
Conference Paper
Developments in simulation systems, e.g. new simulators, partitioning algorithms, modeling formalisms, or specialized user interfaces, often imply the realization of entire simulation systems from scratch. This requires significant efforts and, in addition, hampers the evaluation of the results achieved. The proposed Plug'n simulate concept enables developers to integrate their ideas into an existing framework and thus eases the development and the evaluation of results.
Conference Paper
No simulation algorithm will deliver best performance under all circumstances, so simulation systems often offer execution alternatives to choose from. This leads to another problem: how is the user supposed to know which algorithm to select? The need for an automated selection mechanism is often neglected, as many simulation systems are focused on specific applications or modeling formalisms and therefore have a limited number of expert users. In general-purpose simulation systems like JAMES II, an 'intelligent' selection mechanism could help to increase the overall performance, especially when users have limited knowledge of the underlying algorithms and their implementations (which is almost always the case). We describe an approach to integrate algorithm selection methods with such systems. Its effectiveness is illustrated in conjunction with the 'plug'n simulate' approach of JAMES II [12].
Conference Paper
For over a decade prophets have voiced the contention that the organization of a single computer has reached its limits and that truly significant advances can be made only by interconnection of a multiplicity of computers in such a manner as to permit cooperative solution. Variously the proper direction has been pointed out as general purpose computers with a generalized interconnection of memories, or as specialized computers with geometrically related memory interconnections and controlled by one or more instruction streams.
Article
Until recently, statistical theory has been restricted to the design and analysis of sampling experiments in which the size and composition of the samples are completely determined before the experimentation begins. The reasons for this are partly historical, dating back to the time when the statistician was consulted, if at all, only after the experiment was over, and partly intrinsic in the mathematical difficulty of working with anything but a fixed number of independent random variables. A major advance now appears to be in the making with the creation of a theory of the sequential design of experiments, in which the size and composition of the samples are not fixed in advance but are functions of the observations themselves.
Article
There are two fundamental ways to view coupled systems of chemical equations: as continuous, represented by differential equations whose variables are concentrations, or as discrete, represented by stochastic processes whose variables are numbers of molecules. Although the former is by far more common, systems with very small numbers of molecules are important in some applications, e.g., in small biological cells or in surface processes. In both views, most complicated systems with multiple reaction channels and multiple chemical species cannot be solved analytically. There are exact numerical simulation methods to simulate trajectories of discrete, stochastic systems, methods that are rigorously equivalent to the Master Equation approach, but they do not scale well to systems with many reaction pathways. This paper presents the Next Reaction Method, an exact algorithm to simulate coupled chemical reactions that is also efficient: it (a) uses only a single random number per simulation event, and (b) takes time proportional to the logarithm of the number of reactions, not to the number of reactions itself. The Next Reaction Method is extended to include time-dependent rate constants and non-Markov processes and it is applied to a sample application in biology: the lysis/lysogeny decision circuit of lambda phage. When run on lambda the Next Reaction Method requires approximately 1/15th as many operations as a standard implementation of the existing methods.
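The single-random-number property mentioned here rests on reusing and rescaling putative firing times stored in an indexed priority queue. In the standard notation (a_j propensities, τ_j stored putative firing times, μ the channel that fired at time t, u a uniform random number), the bookkeeping step can be written as follows; this is the textbook form, not a quotation from the paper.

% Putative-time update after reaction mu fires at time t:
\tau_\mu^{\text{new}} \;=\; t + \frac{1}{a_\mu^{\text{new}}}\,\ln\frac{1}{u},
\qquad u \sim \mathcal{U}(0,1)
\quad\text{(fired channel: one fresh random number)}

\tau_j^{\text{new}} \;=\; t + \frac{a_j^{\text{old}}}{a_j^{\text{new}}}\,\bigl(\tau_j^{\text{old}} - t\bigr)
\quad\text{(each dependent channel } j \neq \mu\text{: no new random number)}

Keeping the putative times in an indexed priority queue yields the logarithmic cost per event stated in the abstract.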
Article
This paper discusses efficient simulation methods for stochastic chemical kinetics. Based on the tau-leap and midpoint tau-leap methods of Gillespie [D. T. Gillespie, J. Chem. Phys. 115, 1716 (2001)], binomial random variables are used in these leap methods rather than Poisson random variables. The motivation for this approach is to improve the efficiency of the Poisson leap methods by using larger stepsizes. Unlike Poisson random variables whose range of sample values is from zero to infinity, binomial random variables have a finite range of sample values. This probabilistic property has been used to restrict possible reaction numbers and to avoid negative molecular numbers in stochastic simulations when larger stepsize is used. In this approach a binomial random variable is defined for a single reaction channel in order to keep the reaction number of this channel below the numbers of molecules that undergo this reaction channel. A sampling technique is also designed for the total reaction number of a reactant species that undergoes two or more reaction channels. Samples for the total reaction number are not greater than the molecular number of this species. In addition, probability properties of the binomial random variables provide stepsize conditions for restricting reaction numbers in a chosen time interval. These stepsize conditions are important properties of robust leap control strategies. Numerical results indicate that the proposed binomial leap methods can be applied to a wide range of chemical reaction systems with very good accuracy and significant improvement on efficiency over existing approaches.
Article
A framework is presented that captures the discrete and probabilistic nature of molecular transport and reaction kinetics found in a living cell as well as formally representing the spatial distribution of these phenomena. This particle or agent-based approach is computationally robust and complements established methods. Namely it provides a higher level of spatial resolution than formulations based on ordinary differential equations (ODE) while offering significant advantages in computational efficiency over molecular dynamics (MD). Using this framework, a model cell membrane has been constructed with discrete particle agents that respond to local component interactions that resemble flocking or herding behavioural cues in animals. Results from simulation experiments are presented where this model cell exhibits many of the characteristic behaviours associated with its biological counterpart such as lateral diffusion, response to osmotic pressure gradients, membrane growth and cell division. Lateral diffusion rates and estimates for the membrane modulus of elasticity derived from these simple experiments fall well within a biologically relevant range of values. More importantly, these estimates were obtained by applying a simple qualitative tuning of the model membrane. Membrane growth was simulated by injecting precursor molecules into the proto-cell at different rates and produced a variety of morphologies ranging from a single large cell to a cluster of cells. The computational scalability of this methodology has been tested and results from benchmarking experiments indicate that real-time simulation of a complete bacterial cell will be possible within 10 years.