In Search of Self-Organization
Dustin Lockhart Arendt
Dissertation submitted to the Faculty of the
Virginia Polytechnic Institute and State University
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
in
Computer Science and Applications
Yang Cao, Chair
James D. Arthur
Mark R. Paul
Narendran Ramakrishnan
Calvin J. Ribbens
March 15, 2012
Blacksburg, Virginia
Keywords: Complex Systems, GPGPU, Self-Organization, Dimer Automata
Copyright 2012, Dustin Lockhart Arendt
In Search of Self-Organization
Dustin Lockhart Arendt
(ABSTRACT)
Many who study complex systems believe that the complexity we observe in the world around
us is frequently the product of a large number of interactions between components following
a simple rule. However, the task of discerning the rule governing the evolution of any given
system is often quite difficult, requiring intuition, guesswork, and a great deal of expertise in
that domain. To circumvent this issue, researchers have considered the inverse problem where
one searches among many candidate rules to reveal those producing interesting behavior.
This approach has its own challenges because the search space grows exponentially and
interesting behavior is rare and difficult to rigorously define. Therefore, the contributions
of this work include tools and techniques for searching for dimer automaton rules that
exhibit self-organization (the transformation of disorder into structure in the absence of
centralized control). Dimer automata are simple, discrete, asynchronous rewriting systems
that operate over the edges of an arbitrary graph. Specifically, these contributions include a
number of novel, surprising, and useful applications of dimer automata, practical methods for
measuring self-organization, advanced techniques for searching for dimer automaton rules,
and two efficient GPU parallelizations of dimer automata to make searching and simulation
more tractable.
Dedication
For those who seek the truth of our universe with the hope they may comprehend it.
For my grandfather, Luther Lockhart, inspiration and role model.
Acknowledgments
Briefly, I would like to thank Christina for sticking around through this long journey, my
parents for their support and encouragement, and my advisor, Yang Cao, for taking a chance
a few years ago on a former student he barely knew.
Contents

1 Introduction
  1.1 Cellular Automata
  1.2 Random Boolean Networks
  1.3 Lindenmayer Systems
  1.4 Asynchronous Variants of Synchronous Models
  1.5 Asynchronous Models
  1.6 Dimer Automata
    1.6.1 Extensions
    1.6.2 Serial Implementations
  1.7 Representing Space with a Graph
    1.7.1 Dimension & Isotropy
    1.7.2 Generating Isotropic 2-D Lattices

2 Applications
  2.1 Flocculation or Diffusion Limited Aggregation
  2.2 Domain Coarsening & Grain Growth
  2.3 Schelling Segregation
  2.4 Excitable Media & Spiral Waves
  2.5 Epidemics
  2.6 Graph Algorithms

3 Implementations
  3.1 Background & Related Work
  3.2 Many Mid-Sized Dimer Automata
    3.2.1 Methods
    3.2.2 Results
    3.2.3 Discussion
    3.2.4 Conclusions
  3.3 One Large Dimer Automaton
    3.3.1 GPU Implementation
    3.3.2 Performance Benchmarks
    3.3.3 Discussion
    3.3.4 Conclusions

4 Investigations
  4.1 Detecting Self-Organization
  4.2 Exhaustive Search of |Σ| = 3
  4.3 Searching with Evolutionary Motifs
    4.3.1 Finding Motifs
    4.3.2 Evolving Rules
    4.3.3 Experimental Setup & Results
    4.3.4 Discussion
  4.4 Conclusions

5 Generalizations
  5.1 Measuring Structure in Large Rules
  5.2 Elastic Dimer Automata
  5.3 Searching Elastic Dimer Automata
  5.4 Discussion
    5.4.1 Rendering
    5.4.2 Distribution of Behaviors
  5.5 Conclusions

6 Conclusions

Bibliography
List of Figures

1.1 Elementary cellular automata exhibiting Class III and IV behavior. Space is shown vertically, and time advances from left to right.
1.2 The dragon fractal, an example of a 2-D spatial pattern formed by a Lindenmayer System.
1.3 Diagrammatic representation of the dimer automaton rule that swaps the state of two vertices.
1.4 (a) the twenty neighborhoods necessary to search when inserting a random point. The blue lines indicate the reach of points placed at the corners of the center blue grid cell. The 20 grid cells that overlap with these are shaded, and red dots indicate points already added to the grid. (b) an example output of the lattice generated using the RSA+Delaunay technique.
1.5 Isotropy of a von Neumann (i.e. square) lattice compared to a lattice created using RSA and Delaunay triangulation.
2.1 Typical spatial pattern formed by diffusion limited aggregation.
2.2 Grain growth occurs when the state space of the domain coarsening model is expanded. Coarsening produces smooth, curved boundaries, whereas grain growth results in short straight boundaries mimicking a Voronoi partition.
2.3 Segregation of blue and red phases is due to the white regions acting as a buffer. Due to mass conservation, the amount of this buffer determines the total perimeter, and thus the length scale of the features seen.
2.4 Spiral waves generated from the excitable media rule are surprisingly smooth and circular, and do not reveal the underlying structure of the lattice.
2.5 Bifurcation diagram and an example time series for the epidemic model.
3.1 Arranging kernel calls into blocks as shown helps the GPU to avoid becoming idle.
3.2 Comparison of throughput for algebraic and finite state machine representations of dimer automata rules for GPU and CPU implementations.
3.3 Increasing the number of kernel invocations per block causes the GPU algorithm to reach maximum throughput sooner (with fewer threads).
3.4 Increasing the number of states (the rule size) also has a detrimental impact on performance.
3.5 As the structure of the graph transitions from uniform to random, efficiency improves.
3.6 Row i of the matching matrix M on a small network corresponding to the matching on the network shown.
3.7 Rendering of the Hilbert curve on a 32×32 grid. Line segments are color coded according to their position in the curve.
3.8 Speedup attained by the GPU implementation. For the excitable media model the GPU algorithm performs 80x faster than the serial algorithm.
3.9 The Hilbert ordering contributes to a speedup of a factor greater than 2x.
3.10 Sampling from sets of maximal matchings compared to the true binomial probability mass function. Algorithm 7 (i.e. with sorting) is much closer to the true distribution than sampling from independently generated maximal matchings.
3.11 The error of sampling from a finite set of matchings is measured in relation to the number of matchings used. Error is determined as the sum squared difference between the binomial distribution and the distribution resulting from sampling from a finite set of matchings.
4.1 Structure exists between pure uniformity and randomness, making entropy a poor measurement of this quantity.
4.2 Scatter plot of L(1) versus µ for all 3,330 isomorphically unique rules in the search space defined by |Σ| = 3.
4.3 Rules and corresponding outputs for the 3 outliers in Figure 4.2. Note that the rules are shown here diagrammatically as finite state machines. An edge (x, y) in this diagram means that the rule matrix has the form R(x, ℓ) = y where ℓ is each of the labels of that particular edge.
4.4 K-means clustering of behavior space after 60 generations. Each data point measures the behavior of a dimer automaton rule.
4.5 K-means cluster centers.
4.6 The effects of point mutation on the diversity of the population over time.
4.7 The effects of point mutation on the diversity of the final generation.
5.1 Generalization of the domain coarsening and excitable media rule to increasing sizes. As states are added the resulting behavior is “sharpened.”
5.2 Joint probability distribution (top) before and after simulation of excitable media (bottom).
5.3 The 89 isomorphically unique, strongly connected, directed graphs with 4 or fewer vertices. These graphs are then stretched twice according to the rewriting rules.
5.4 Information-Energy time series of rules 13, 18, 55, and 85 with (top) and without (bottom) swapping.
5.5 Rule (left) and corresponding elastic dimer automaton 13, 18, 55, and 85 output allowing (right) and not allowing (middle) swapping at local energy minima.
5.6 Stretching increases the level of detail without qualitatively affecting its behavior (rule 85 without swapping is shown).
5.7 A comparison of two rendering techniques for dimer automata.
5.8 Clustering of behavior resulting from the 5137 elastic dimer automata with |V₀| ≤ 5.
List of Tables

3.1 Optimal effectiveness of matrix/equation GPU acceleration.
3.2 GPU-CPU comparisons and observed throughput and speedup.
4.1 Non-dominated motifs positively correlated with local structure in |Σ| = 3.
Chapter 1
Introduction
Complexity can arise from the interactions between a large number of simple components
over time. When modeling such systems, the underlying physics causing those interactions
can sometimes be abstracted and simplified while still qualitatively reproducing the observed
phenomenon. This makes abstract models for complex systems valuable, as simple expla-
nations for phenomena divorced from their context can be shared easily among scientists in
vastly different domains. This approach is central to the study of complex systems, which has
revolutionized the way we understand the world over the course of the past several decades.
Over this period of time, several different and rigorously defined modeling frameworks such
as cellular automata have matured and become widely used. Cellular automata are a popular
discrete and deterministic way to model spatiotemporal phenomena using very simple rules.
The study of cellular automata (and the science of complex systems in general) is interested
in the open question, “given some particular phenomenon, what is the best rule to reproduce
its behavior?” This is the traditional “engineering” approach, which often requires a high
degree of experience, intuition, and guesswork to yield a significant improvement over existing
models. Despite this, the potential benefit of developing new or better models of complex
systems causes a great deal of effort to be invested in this approach.
Another question also of great interest is, “given a simple rule, what behavior does that rule
produce?” This seems trivial, as one can simply run a computer program to find out the
result. However, it is challenging, perhaps even impossible, to determine this in any easier
way (for example, by using a formula to predict the configuration of the system at any given
time). Precise statements about the outcome of a system based solely on its rule may only
be possible for fairly trivial cases. Similar challenges face related fields such as dynamical
systems, where sensitive dependence on initial conditions (i.e., positive Lyapunov exponents)
creates a “horizon of predictability.”
For these reasons, several novel methodologies have been invented to help understand com-
plex systems. These approaches are new to science and were unimaginable before the advent
and widespread use of computers. Namely, it is possible to use a computer to conduct a
large number of experiments in a short amount of time. Furthermore, the abstract nature of
modern models for complex systems allows the model itself to be treated as a parameter.
With discrete models, especially, the entire space of models (or a subset thereof) can be
enumerated, simulated, and evaluated. In other words, computers imbue us with the capacity
to consider a large number of models (i.e., rules, equations, programs) with the goal of dis-
covering interesting models that we would not be likely to invent on our own. This is the
main thesis of Stephen Wolfram’s “A New Kind of Science” [110].
In the early 1980s Wolfram introduced this approach by exhaustively searching elementary
cellular automata, where he discovered four distinct classes of behavior [109]. These behaviors
can be considered discrete analogs of the various phases of matter (solid, liquid, and gas),
with very interesting phenomena appearing in the transitions between these regimes. Related
experiments have been repeated for other classes of complex systems as well, with similar
results (e.g., random boolean networks [46]). But, this approach has not yet produced the
widespread or drastic changes to scientific and research methodologies originally promised.
Not surprisingly, change is occurring more incrementally; researchers have had moderate
success when the search objective is narrowed significantly. For example, many researchers
have used optimization techniques to find the single rule most adept at accomplishing a
specific computational task including density classification, parity checking, synchronization,
and pseudo-random number generation [30, 78, 34, 59, 111, 104].
The next step is to broaden the search criteria enough to allow discovery of “interesting
behavior” without knowing the specific form or purpose of the behavior ahead of time. This
is challenging since the property we wish to search for must be defined broadly enough to
cover many types of phenomena while remaining narrow enough to be measurable by a
computer algorithm. Therefore, we focus the search such that the “interesting behavior” we
search for is “self-organization.” Unfortunately, there still are many different and sometimes
contradictory definitions of self-organization [10, 47, 38]. For this work we define it to mean
the transformation of a random configuration to a structured one without centralized control.
Thus, the study of self-organization and the study of complex systems go hand in hand.
The decentralized and multi-agent nature of models such as cellular automata makes them
an ideal testbed for researching self-organization.
However, beyond simply decentralizing control of the system, a multi-purpose modeling and
simulation framework for the study of complex systems and self-organization should have
the following qualities:
Robustness: patterns form despite noise in the system;
Generality: the topology of space may change without modifying the rule; and
Simplicity: the neighborhood and the number of states are minimal.
For many classical modeling frameworks (e.g., cellular automata, boolean networks, cou-
pled map lattices), one can also find variants where rules are applied asynchronously (one
at a time, as opposed to the more common case, all at once) [94, 42, 75, 80]. The updat-
ing order has a significant influence on the observed dynamics; switching from synchronous
to asynchronous updates can alter or destroy interesting behavior [94, 35]. Random asyn-
chronous updating often prevents the formation of spatial structure and self-organization.
Thus, patterns that form despite random asynchronous updates are exhibiting robustness.
Another drawback of many classical models for dynamical systems results from the tight
coupling between the rule and the topology of the system. Cellular automata, sequential
dynamical systems, and random boolean networks can all function on non-uniform lattices.
However, in order to apply the rule uniformly, all neighborhoods must have the same size.
An exception to this is if the rule is simplified to become totalistic. Totalistic rules compute
the next state of a given vertex as a function of itself and the average state of the neighboring
vertices [92]. However, some information about the neighborhood’s configuration is lost by
this averaging. Thus, there appears to be a tradeoff between the space and the rule; either
space is simplified to accommodate a complex rule, or the rule is simplified to accommodate
a complex space. In either case, these restrictions create arbitrary requirements that reduce
the generality or simplicity of that model.
Dimer automata [95] overcome these restrictions by applying updates asynchronously to a
single edge in an arbitrary graph. Each vertex is assigned a state; updating an edge may
change the state of both endpoints of that edge simultaneously. The random asynchronous
updates introduce sufficient noise into the system so that patterns forming in dimer automata
can be considered robust. Any topology that can be represented by a graph can be used
by any dimer automaton rule without modification, making dimer automata sufficiently
general. For the same reason, the neighborhood always consists of exactly two vertices,
making dimer automata even simpler than elementary cellular automata. Therefore, for this
work we consider the dimer automata modeling framework exclusively.
The remainder of this chapter reviews the literature on several classic discrete modeling
frameworks, dimer automata, and relevant topics. This chapter is followed by a discussion
of several real world applications to further motivate dimer automata in §2. Next, §3 presents
two GPU parallelizations for dimer automata demonstrating that the challenges resulting
from their asynchronous nature and non-uniform topology can be overcome with a simple
and efficient implementation. Results from this chapter will be published in [5] and are
under peer review in [8]. Chapters §4 and §5 consider effective ways of searching for dimer
automata rules that exhibit self-organization. These results are currently under peer review
in [7, 6].
1.1 Cellular Automata
Cellular automata are dynamical systems with discrete time, space, and state. They can
be considered purely discrete versions of partial differential equations, which are continuous
in time, space, and state [110]. In fact, explicit numerical methods for integrating partial
differential equations effectively result in the evolution of a cellular automaton [82]. Many different
notations exist to describe cellular automata and their evolution; these notations, while
mathematically rigorous, may obfuscate the underlying simplicity of the model for some
readers. Therefore, our description of cellular automata and other models in this section is
descriptive, but not rigorous.
A cellular automaton consists of the following two basic components: its current configura-
tion, and a rule specifying how to generate the next configuration. Generating successive
configurations is akin to observing the evolution of the system over time. The configuration
is partitioned into discrete units called cells, and each cell has a single state from a discrete
state space. Since it is impractical to define the transition for every possible configuration,
the rule is simplified to consist of the synchronous application of a local rule to every cell in
the configuration. In other words, for each cell, we compute its next state as a function of
that cell’s neighborhood. Generally, cellular automata assume a uniform lattice, so that the
neighborhood of any given cell is trivial to determine.
One of the simplest possible cellular automata assumes a 1-D lattice with the neighborhood
including the cells to the left and right of each cell as well as the center cell itself; each cell
has a state of 0 or 1. Cellular automata of this form are referred to as elementary cellular
automata. Since there are 2³ = 8 different possible configurations of a given neighborhood,
there are 2⁸ = 256 possible local rules. In one of the most famous experiments in the
field, Stephen Wolfram examined space-time plots for every possible rule [109]. He observed
many rules quickly resulting in a uniform or simple periodic state, which was not surprising.
However, some rules, even when initialized with trivial initial conditions, produced behavior
with long transients and/or periods, which he designated as Class III and IV automata.
A set of configurations belonging to a repeating sequence of configurations is often referred
to as an attractor. The transient length is the number of iterations before the automaton
falls into its attractor. All cellular automata eventually exhibit periodicity, since there is
a finite number of configurations resulting from their discrete, deterministic nature. Class
III automata exhibit chaotic or random appearing behavior. Class IV automata exhibit a
mixture of Class II and Class III behaviors, often manifesting as dislocations or particles
interacting non-trivially on a simpler background made of small local structures. Figure 1.1
shows the evolution of elementary cellular automaton rule 110 exhibiting this surprising
behavior. It was eventually proven that this rule exhibits universality, or in other words, can
execute any algorithm provided that the input/output are properly encoded/decoded into
the configuration of the automaton [26].
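To make the rule encoding concrete, consider the following minimal C sketch (an illustration, not code from this dissertation). Treating the left, center, and right states a, b, c of a neighborhood as the 3-bit number 4a + 2b + c, the corresponding bit of the Wolfram rule number gives the cell’s next state; rule 110 below reproduces the Class IV behavior of Figure 1.1.

#include <stdio.h>

#define N 64  /* number of cells (periodic boundary) */

/* One synchronous step of an elementary cellular automaton;
 * rule is the Wolfram rule number, 0..255. */
void step(unsigned char rule, const unsigned char *cur, unsigned char *next) {
    for (int i = 0; i < N; i++) {
        int a = cur[(i + N - 1) % N];  /* left neighbor  */
        int b = cur[i];                /* center cell    */
        int c = cur[(i + 1) % N];      /* right neighbor */
        next[i] = (rule >> (4 * a + 2 * b + c)) & 1;
    }
}

int main(void) {
    unsigned char x[N] = {0}, y[N];
    x[N / 2] = 1;  /* trivial initial condition: a single filled cell */
    for (int t = 0; t < 32; t++) {
        for (int i = 0; i < N; i++) putchar(x[i] ? '#' : '.');
        putchar('\n');
        step(110, x, y);
        for (int i = 0; i < N; i++) x[i] = y[i];
    }
    return 0;
}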
Many variants of cellular automata exist beyond the elementary 1-D cellular automata dis-
cussed previously. Rules that are insensitive to the spatial arrangement of neighboring states
are called totalistic rules [110, 92]. Often these rules simply consider the total number of
neighboring white or black cells. Many 2D and 3D cellular automata use totalistic rules and
still display complex behavior and patterns [110]. The “Game of Life” was one of the first
widely studied totalistic cellular automata because of the remarkably complex behavior of
local structures [25]. Crystal growth is simple to model with a totalistic rule that allows
growth at the edge of the crystal as long as some exact number of cells neighboring an
empty cell are filled [110]. Cellular automata that model excitable media are also totalistic
because it doesn’t matter what direction the wave of excitation comes from, only that the
conditions for excitation are met.

(a) rule 30 (class III)
(b) rule 110 (class IV)

Figure 1.1: Elementary cellular automata exhibiting Class III and IV behavior. Space is
shown vertically, and time advances from left to right.

Essentially, systems that have no orientational preference
(a form of isotropy) can be modeled using totalistic rules, and there are a large number of
such systems.
A variant of the traditional cellular automaton is the block cellular automaton, which is useful
for modeling systems of colliding particles [92]. In such automata, the lattice is partitioned
into a number of blocks having equal shape, and the rule defines how to update the states
of an entire block synchronously, instead of just a single state. For example, the Margolus
neighborhood considers blocks as 2×2 squares (or, in 3-D, as 2×2×2 cubes) on a square
lattice, but the blocks are shifted after each iteration [74]. This technique has been used
to model diffusion, thermalization, diffusion limited aggregation, reflection and refraction,
Ising spin systems, and a number of other physical phenomena.
Another type of block cellular automaton is the Lattice Gas Cellular Automaton (LGCA)
[90]. LGCAs are used to model the equations of hydrodynamics, and can even be used to
derive those equations from the rules and geometry of the LGCA. Additionally, LGCAs
are beginning to be used for biological pattern formation [33]. The basic premise behind
the LGCA is that the general properties of fluids are not really dependent on the size or
configuration of the underlying molecule that makes up that fluid; grains of sand or ball
bearings at a large enough scale have the same general properties as water [90]. Thus, the
LGCA, which is a purely discrete model based on boolean logic, can hopefully, at large enough
scales, be equivalent to PDEs such as the Navier-Stokes equations. LGCAs allow each cell
in the lattice to have multiple velocity channels. This allows more than one particle to be in
any given cell at a given time, but no two particles can have the same velocity. All particles
have the same constant speed; only the direction of the velocities differs, and these
directions are dependent on the underlying structure of the lattice. A hexagonal lattice is
most often used because such a lattice allows the LGCA to be isotropic. At a large scale
the underlying discreteness of the lattice is washed out, and the system behaves as if it were
continuous. Furthermore, there is no detectable preference in the overall system towards any
of the six directions possible in the lattice. Because of the properties of LGCAs, a great deal
of theoretical and analytical effort has gone into showing their equivalence to the traditional
continuous hydrodynamic models.
1.2 Random Boolean Networks
Random boolean networks can be considered as generalizations of cellular automata. Boolean
networks were originally conceived by Stuart Kauffman to be a simple model of gene inter-
actions [65]. Boolean networks require that every neighborhood is the same size, but the
topology may be non-uniform. Instead, boolean networks operate over a directed graph with
the assumption that every vertex has the same degree. Secondly, the local rule is not applied
uniformly; each neighborhood may update the center cell according to a different, randomly
chosen rule. Like with elementary cellular automata, it is assumed that each cell has a state
of either 0 or 1. Random boolean networks are therefore parameterized by N, K, and p, where N
is the number of nodes (i.e. genes) in the network, and K is the number of connections per
node (i.e. the number of genes that determine each other gene’s behavior). The probability
of a 0 in the rule table for each gene is parameterized by p.
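As a concrete sketch of one synchronous step (illustrative only; the array layout and names are assumptions, not the dissertation’s code), each node forms a K-bit index from the states of its K inputs and looks up its next state in its own rule table. The tables would be filled at setup time, with each entry set to 0 with probability p.

/* One synchronous step of an N-K random boolean network.
 * in[i][k]   : index of node i's k-th input node
 * table[i][] : node i's private rule table of 2^K entries in {0,1},
 *              each entry having been set to 0 with probability p */
void rbn_step(int N, int K, int **in, unsigned char **table,
              const unsigned char *cur, unsigned char *next) {
    for (int i = 0; i < N; i++) {
        unsigned idx = 0;
        for (int k = 0; k < K; k++)
            idx = (idx << 1) | cur[in[i][k]];  /* K-bit input pattern */
        next[i] = table[i][idx];
    }
}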
With this model, one question Kauffman hoped to answer was whether the biochemical
pathways in cells are finely tuned systems, or if the behavior could be easily accomplished
by accident. He found that at a certain threshold of connectivity (e.g. K = 2), many
random networks were capable of short cycles, which would be biologically useful, compared
to completely connected networks which have extremely long cycles. Furthermore, he sug-
gested that the cellular differentiation (i.e. different cellular behaviors given the same genes)
was a consequence of multiple distinct cycles within the same random boolean network.
Recent advancements in gene sequencing have allowed the mapping of actual gene and protein
interaction networks. Kauffman’s original random model has been vetted by considering
actual biochemical pathways as boolean networks [64, 70].
Much work has also been devoted to understanding the properties of random boolean net-
works outside of their biological context. For example it is well known that for K ≤ 2 most
networks are ordered, and for K ≥ 3 most networks are chaotic, so a phase transition must
occur in between [46]. Derrida and Pomeau proved that this transition occurs at exactly
K= 2 [32]. The ordered/frozen phase in random boolean networks is analogous to Wolfram’s
complexity Class I and II; the chaotic phase is analogous to Class III; and the critical phase
is analogous to Class IV. One limitation of the N-K model is that each gene is assumed to be
controlled by exactly K other genes, which is not a realistic biological assumption. Aldana
considered a more realistic case (i.e. scale free topology) and showed that in this case the
chaotic phase dominates less of the phase diagram [3]. From this it is hypothesized that
scale free networks may be more advantageous to evolutionary biological systems, or that
these systems may naturally evolve into a scale free topology.
Much effort has been focused on understanding the configuration spaces of random boolean
networks [46]. Since random boolean networks are deterministic, each configuration has
exactly one next state. The configurations can be considered vertices in a directed graph
with edges representing the transitions. Traversing this graph from any initial configuration
will eventually lead to a point attractor (a configuration whose next state is itself) or an
attractor basin (a cycle in the configuration space graph). Some interesting questions one
may ask are, how many attractors (or basins) are there? How long does it take to reach an
attractor on average? How large are the basins on average? For example, in the random
boolean network analysis of the yeast cell cycle, it was found that the vast majority of
configurations quickly led to just one point attractor corresponding to the cell’s quiescent
state [70]. Furthermore, the majority of configurations followed a single large pathway closely
mirroring the dynamics of the actual yeast cell cycle.
1.3 Lindenmayer Systems
Lindenmayer systems (L-systems) were originally conceived as a simple discrete technique
to model the growth of cellular filaments (e.g. algae) over time [73]. L-systems differ most
significantly from models previously discussed because modeling the growth of cells produces
a dynamic topology. The simplest L-systems are deterministic and context free and are called
DOL-systems [89]. In DOL-systems, the production rule defines how to replace a single
character in a string with a new pattern (which may consist of 0,1, or more characters).
The rule is repeatedly and synchronously applied to each element in the initial string. The
output of this can be interpreted as instructions to a turtle graphics program to produce
a visual representation of the L-system. A classic example of this is shown in Figure 1.2.
Combining L-systems with turtle graphics allows L-systems to model various edge and node
rewriting schemes used to define many geometric fractals. Many variations of L-systems
exist, including stochastic and context sensitive cases. Stochastic L-systems allow more than one
production rule for a given input character, with the rule chosen randomly according to a
specified probability distribution. Context sensitive L-systems allow the production rule to
examine nearby characters to determine what operation to perform, and it is straightforward
to implement an elementary cellular automaton as a context sensitive L-system. Context-
sensitive L-systems were used to design a simple self-replicating structure, which is a classical
cellular automaton problem [100].
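For instance, Lindenmayer’s original algae model uses the axiom A with the productions A → AB and B → A, yielding the strings A, AB, ABA, ABAAB, and so on (whose lengths follow the Fibonacci sequence). A minimal C sketch of one parallel rewriting step for this particular DOL-system (illustrative, not code from this dissertation):

#include <stdio.h>
#include <string.h>

/* One parallel rewriting step of the DOL-system A -> AB, B -> A.
 * Writes the successor of src into dst (dst must be large enough). */
void rewrite(const char *src, char *dst) {
    char *d = dst;
    for (const char *s = src; *s; s++) {
        if (*s == 'A') { *d++ = 'A'; *d++ = 'B'; }  /* A -> AB */
        else           { *d++ = 'A'; }              /* B -> A  */
    }
    *d = '\0';
}

int main(void) {
    char a[256] = "A", b[256];
    for (int t = 0; t < 5; t++) {
        printf("%s\n", a);
        rewrite(a, b);
        strcpy(a, b);
    }
    return 0;
}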
1.4 Asynchronous Variants of Synchronous Models
Classical cellular automata and boolean networks, as discussed thus far, assume synchronous
(i.e. parallel) updating, where the next state of each cell is computed for each cell simultane-
ously. In some contexts, however, it may be more realistic to assume that the system is less
organized, and updates to cells occur one at a time, in some particular order. For example,
if a cellular automaton is used in a biological context to model cell growth, an iteration of
the automaton may roughly correspond to the duration of the cell cycle. It is more realistic
to assume that cell divisions occur in some particular order, rather than synchronized to an
artificial clock, as the former approach is less granular. So, it is reasonable to ask what effect
switching from synchronous to asynchronous updates has on a particular model. This
question has been explored for elementary cellular automata [58, 36, 37], and the result is
somewhat disheartening. While asynchronous updates do not significantly affect the behav-
ior of Class I or II automata, they appear to have a significant and detrimental effect on Class
III and IV automata. Specifically, the asynchronous updates appear to significantly reduce
the transient and period lengths of the automata, which were among their defining
characteristics.

Figure 1.2: The dragon fractal, an example of a 2-D spatial pattern formed by a Lindenmayer
System.

From this, a legitimate hypothesis is that the complex behavior of elementary
cellular automata is an artifact of the updating scheme, making these models less relevant
to real world phenomena.
The effect of asynchronous updates has been studied in several specific cellular automata as
well. A famous example is the spatially extended version of the Prisoner’s Dilemma [86]. This
is a classic problem from game theory where two individuals can choose to either cooperate
or defect, and the payoff is the greatest if one defects and one cooperates, moderate if both
cooperate, and the least if both defect. The game is spatially extended by allowing each cell
to adopt the strategy of its best performing neighbor (i.e., to always cooperate or always
defect). By fine tuning the payoff values of the game, very interesting spatial patterns of
cooperation and defection are observed. However, when the same experiments were repeated
using asynchronous updating, the system quickly evolved into a homogeneous state with each
individual choosing to defect [57].
Another set of experiments considered the effect of asynchronous updates on the Game of
Life and the Immune Network Model [18]. In this case the interesting dynamics were also
destroyed by asynchronous updates. While the Game of Life normally results in complex
spatiotemporal behavior with the possibility for small perturbations to affect the entire
board, this was not the case with asynchronous updates. The automaton quickly transitions
into a frozen state, more closely resembling Class I behavior than Class IV. Similar experi-
ments have been conducted with nearly the same result; that asynchronous updates destroy
interesting patterns formed under synchronous updates [94].
Additionally, the order of updates in the asynchronous case can also have a significant im-
pact on the observed dynamics. One can imagine a number of ways to perform the updates
including line-by-line sweep, fixed random sweep, random new sweep, uniform choice, and
exponential waiting times [94]. The uniform choice and exponential waiting time were found
to be the most similar, and introduce the fewest artifacts into the experiments. In fact,
the methods are equivalent in the limit as the size of the lattice increases to infinity, but
exponential waiting time is often more amenable to analytical use. Additional update
schemes, motivated by several real world examples, have been used to explore several
different types of elementary cellular automata [28]. These experiments also showed that
random asynchronous updating tended to decrease transient and period length, simplifying
the observed dynamics.
In the above cases, switching from synchronous to asynchronous updating destroyed all
interesting behavior. However, it would be a grave error to assume that simple asynchronous
systems are capable of no interesting behavior at all. In many cases, physical phenomena
are actually more accurately modeled by asynchronous updates; an excellent example of this
is the pattern of pigments on mollusk shells [112]. Asynchronous cellular automata have
also been used to implement signal transduction networks for the purpose of performing
arbitrary computation [1]. Furthermore, it has even been proven that any synchronous
cellular automaton can be emulated by an asynchronous one [83]. In a more recent study,
the sensitivity (or robustness) to asynchronous updating was quantified and measured in a
reexamination of the elementary cellular automata [35]. This work demonstrates that robustness
to asynchronous updating is found in a few rules belonging to each complexity class. So, an
important point is not that asynchronous updating prevents useful work (e.g., computation,
pattern formation, synchronization, self-organization), but that certain rules can be quite
sensitive to changes in the order of and mechanism for updates.
1.5 Asynchronous Models
A close variant of random boolean networks are sequential dynamical systems [80, 52]. These
systems assume an undirected graph where each vertex is assigned a single state in {0,1}.
Each vertex has a unique rule to compute its next state as a function of itself and its
neighbors. However, unlike random boolean networks and cellular automata, there is no
restriction on the degree of each vertex; the graph may have any topology. This complicates
matters in terms of how the rule can be implemented, so only symmetric boolean functions
such as nor, nand, xor, parity, and majority are considered. These functions are totalistic,
so they can be defined generically in terms of any sized neighborhood [80]. Furthermore,
as their name suggests, sequential dynamical systems are updated sequentially instead of
synchronously; the next state of each vertex is computed one at a time. The order in
which the vertices are updated is defined by a permutation of the vertex set, which is
determined ahead of time. The same order is repeatedly used in each iteration, making
sequential dynamical systems deterministic. It was shown that under certain conditions,
some sequential dynamical systems’ behaviors are both non-trivial and independent of the
update schedule [52]. However, in the general case, the update schedule plays a significant
role in the observed dynamics, potentially creating or destroying different attractors for
the same rule. This analytical result conforms with experimental results for asynchronous
cellular automata [94].
Interacting particle systems are much like asynchronous cellular automata or sequential dy-
namical systems [72]. One assumes some topology (e.g. a uniform square lattice is common),
and each vertex is assigned a state in {0,1}. The dynamics of the system is governed by
simple rules that change the state of a cell based on its neighbors; these state changes are
referred to in this context as events. However, unlike models previously discussed, the likelihood of a
particular event occurring is dependent on the configuration of the system. There are sev-
eral classical models for interacting particle systems including voter, contact, and exclusion
models. In the original voter model, each state represents a region under the control of a specific
faction, and one region is taken over by a neighboring region with a probability based on
the number of neighbors belonging to the opposing faction [24]. The important questions
are, will the system reach consensus (i.e. a uniform state), and how long does this take? It
was shown that in 1 or 2 dimensions the system will reach consensus, but for 3 or more this
is not the case because random walks in higher dimensions are not guaranteed to converge.
Contact processes were invented to model the spread of disease between neighboring indi-
viduals on a lattice [54]. In this model, infections spread in a manner similar to the voter
model. However, infections can die out with a certain probability too. It was found that
a phase transition occurs and is dependent on the rate of infection. In other words, if the
disease is not infectious enough, it will die out, but there is some critical value where the
disease spreads throughout the entire population. Exclusion processes can be used to model
traffic flow [81, 19]. The distribution of particles (e.g. vehicles) remains fixed over time,
and the system evolves by allowing particles to hop from one location to another nearby
location. These models can be used to understand the phase transition occurring when the
traffic density increases. Higher traffic density can create waves of slow or stopped vehicles,
reducing the throughput of the road.
1.6 Dimer Automata
Formally, dimer automata consist of (G, Σ, X, R) where G = (V, E) is an undirected graph
(with vertex set V and edge set E) defining the spatial topology of the system; Σ is a set of
allowable states; X is the configuration of the system (x_i^t is the state of vertex i at a
discrete time t, and x_i^t ∈ Σ for all i ∈ V); R is a rule function (R: Σ² ↦ Σ). If we consider
discrete Σ, then R is a finite state machine (FSM), and can be represented as a |Σ| × |Σ|
matrix. An example of this state transition matrix is

\[
R(x, y) = \begin{array}{c|cc}
x \backslash y & 0 & 1 \\ \hline
0 & 0 & 1 \\
1 & 0 & 1
\end{array}, \tag{1.1}
\]

which defines the output of R for every possible pair of inputs. Note that x, y, R(x, y) ∈ Σ.
Dimer automaton rules can also be represented as algebraic equations or as diagrammatic
representations. The algebraic representation of the above rule would be
\[
R(x, y) = y, \tag{1.2}
\]

and its corresponding diagrammatic representation is shown in Figure 1.3. In the diagram-
matic case, the edge between two vertices x and z labeled with y denotes that R(x, y) = z.
In other words, x transitions to z when x is next to y. The diagram is simplified by omitting
self-loops (i.e., edges where R(x, y) = x).
Figure 1.3: Diagrammatic representation of the dimer automaton rule that swaps the state
of two vertices.
A dimer automaton is iterated by picking an edge (u, v) ∈ E uniformly at random and
applying the rule R to x_u^t and x_v^t. Updating is sequential (i.e. asynchronous); only one
edge may be picked and updated in a single time step. The next states of x_u^t and x_v^t are
computed simultaneously according to

\[
\begin{bmatrix} x_u^{t+1} \\ x_v^{t+1} \end{bmatrix} =
\begin{bmatrix} R(x_u^t, x_v^t) \\ R(x_v^t, x_u^t) \end{bmatrix}. \tag{1.3}
\]

This is a simplification of the original dimer automata model where R: Σ² ↦ Σ². Equa-
tion 1.3 forces edge updates to be symmetric, causing the result to be independent of the
order in which the endpoints of the edge are passed to the rule. This simplification results
from the assumption that G is undirected. Thus, the example rule above results in the state
of x and y being swapped. This is an elegant way to implement diffusion of particles.
Besides obvious similarities with cellular automata and boolean networks, dimer automata
have some commonalities with L-systems [73] and exclusion processes (a type of interacting
particle system) [72]. While L-systems and dimer automata are both forms of rewriting sys-
tems, L-systems commonly employ a dynamic topology. Exclusion processes model particles
moving throughout an arbitrary graph. Only one particle can exist in a vertex at a time,
and particles move to an empty adjacent vertex according to some probability distribution.
Thus, the rules governing exclusion processes change the state of two vertices at a time as
well. The essential difference is that with exclusion processes (and interacting particle
systems) the dynamics of the system is more directly controlled by the probability distribution
governing when a rule is applied, whereas with dimer automata it is controlled by the rule itself.
Given some discrete Σ, there are |Σ|^(|Σ|²) possible rules because the rule must map every
possible pair of inputs in Σ² to one output in Σ. However, within any rule space, we may
wish to organize the rules into isomorphism classes to reduce the size of the search space. A
rule R₁ belongs to the same isomorphism class as another rule R₂ if there exists a bijection
b: Σ ↦ Σ such that b(R₁(b(x), b(y))) = R₂(x, y) for all (x, y) ∈ Σ². In other words,
two rules belong to the same isomorphism class if there is some way to permute the vertices
and relabel the edges of R₁ that results in R₂. Since there are |Σ|! unique bijections, each
isomorphism class can have at most |Σ|! members. This is only an upper bound because two
different permutations of a rule may in fact be the same. If i(Σ) is the set of isomorphism
classes of rules in Σ, then the cardinality of this set is lower bounded by

\[
|i(\Sigma)| \geq \frac{|\Sigma|^{|\Sigma|^2}}{|\Sigma|!}. \tag{1.4}
\]
This bound is useful because knowing how much, at most, a rule space can be reduced by
considering isomorphism classes can assist in deciding whether an exhaustive search of the
smaller space is even tractable.
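To make the reduction concrete: for |Σ| = 3 there are 3⁹ = 19,683 possible rules, and Equation 1.4 guarantees at least 19,683/3! ≈ 3,281 isomorphism classes; the exhaustive search in Chapter 4 finds 3,330, so the bound is fairly tight. The following C sketch (an illustration under the relabeling convention above, not the dissertation’s code) canonicalizes a 3-state rule by taking the lexicographically smallest relabeled table; two rules are isomorphic exactly when their canonical forms agree.

#include <string.h>

/* All 3! = 6 bijections of {0, 1, 2}. */
static const int PERM[6][3] = {
    {0,1,2}, {0,2,1}, {1,0,2}, {1,2,0}, {2,0,1}, {2,1,0}
};

/* Lexicographic comparison of two flattened 3x3 rule tables. */
static int lex_less(const int *a, const int *b) {
    for (int i = 0; i < 9; i++)
        if (a[i] != b[i]) return a[i] < b[i];
    return 0;
}

/* Canonical representative of rule r under state relabeling: the
 * smallest table b(r(b(x), b(y))) over all bijections b. */
void canonical(const int r[3][3], int out[9]) {
    int cur[9], first = 1;
    for (int p = 0; p < 6; p++) {
        const int *b = PERM[p];
        for (int x = 0; x < 3; x++)
            for (int y = 0; y < 3; y++)
                cur[3 * x + y] = b[r[b[x]][b[y]]];
        if (first || lex_less(cur, out)) {
            memcpy(out, cur, sizeof cur);
            first = 0;
        }
    }
}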
1.6.1 Extensions
From interacting particle systems we have seen that a probability distribution controlling
when updates occur can affect the dynamics of the system in useful and interesting ways.
Dimer automata can be similarly augmented using the concept of propensity, which draws
from Gillespie’s Algorithm for stochastic chemical kinetics [49]. We assume that for each
edge (u, v) ∈ E we can compute P(u, v), where P: E ↦ {0} ∪ ℝ⁺. The propensity may also
rely on the current state of the system X and the topology G, in which case P would change
dynamically over time. The probability that an edge (u, v) is picked at time t is

\[
\Pr[(u, v) \text{ is picked}] = \frac{P(u, v)}{\sum_{(x, y) \in E} P(x, y)}. \tag{1.5}
\]
Sampling from and updating this probability distribution can be accomplished efficiently
using essentially the same techniques for improving Gillespie’s Algorithm [48, 23, 99].
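A minimal sketch of the propensity-weighted choice itself (illustrative; the cited improvements to Gillespie’s Algorithm replace this O(|E|) linear scan with more efficient structures such as trees over partial sums):

#include <stdlib.h>

/* Pick an edge index with probability proportional to its propensity,
 * by scanning the cumulative sum; P holds one propensity per edge.
 * drand48, like lrand48 below, is from the Standard C Library. */
int sample_edge(const double *P, int num_edges) {
    double total = 0.0;
    for (int e = 0; e < num_edges; e++) total += P[e];
    double r = drand48() * total;
    for (int e = 0; e < num_edges; e++) {
        r -= P[e];
        if (r < 0.0) return e;
    }
    return num_edges - 1;  /* guard against floating point round-off */
}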
In some special cases the rule R may have a form that allows for a more efficient paralleliza-
tion. If a particular state transition defined by R changes the state of at most one endpoint,
then that transition is “separable” and R is “partially separable.” If all possible transitions
defined by R are separable, then that rule is “fully separable,” and it is not necessary to ap-
ply both updates from Equation 1.3 atomically. For reasons discussed later, fully separable
rules allow a more efficient parallelization.
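Separability can be checked directly from the rule table, as in the following illustrative C sketch (not the dissertation’s code): a transition is separable precisely when at most one endpoint changes state.

/* Returns 1 if rule R (flattened |S| x |S| table, R[x*S + y] = R(x, y))
 * is fully separable, i.e. no transition changes both endpoints. */
int fully_separable(const int *R, int S) {
    for (int x = 0; x < S; x++)
        for (int y = 0; y < S; y++)
            if (R[x * S + y] != x && R[y * S + x] != y)
                return 0;  /* both endpoints change state */
    return 1;
}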
It was previously assumed that G is undirected to allow R to be simplified for reasons relating
to symmetry. However, it is reasonable in some cases to use a directed graph in combination
with rules that break symmetry. For example, the rule

\[
R(x, y) = \begin{array}{c|cc}
x \backslash y & 0 & 1 \\ \hline
0 & (0,1) & (1,1) \\
1 & (0,0) & (1,0)
\end{array}, \tag{1.6}
\]

clearly breaks symmetry because (0,0) ↦ (0,1) and (1,1) ↦ (1,0). Note that x, y ∈ Σ but
R(x, y) ∈ Σ² since it is necessary to define both outputs for every possible pair of inputs.
The updating rule is modified to reflect this, so if (u, v) was the directed edge chosen to be
updated, then

\[
\begin{bmatrix} x_u^{t+1} \\ x_v^{t+1} \end{bmatrix} = R(x_u^t, x_v^t). \tag{1.7}
\]
In other words, the rule function is only called once per update since the rule returns both
states. For consistency, symmetry breaking rules should always be used with the convention
that the state of the source vertex is the first argument of R, and the state of the destination
vertex is the second. In an undirected graph there would be no way to distinguish source
from destination, and thus no consistent way to implement this convention.
Usually we are interested in exploring the dynamics of a single rule at a time, so we assume
that rule is applied uniformly over space. However, there may be cases where we wish to
apply one rule to one location, and a different one to another. Recall that this was the
case with random boolean networks, where each cell was assigned a random rule according
to a probability distribution. Consider a protein network where protein A may activate
protein B but inhibit protein C, etc. An effective model would handle the A-B and A-C
interactions according to different rules. A flexible solution is to allow each edge to contain
a state variable from a separate state space Σ_e. In this example, we could let Σ_e = {+, −},
corresponding to activation and inhibition. However, the rule R becomes more complex as it
would have the form R: Σ² × Σ_e ↦ Σ² × Σ_e. We suppose this additional complexity would
need to be motivated by an actual real world problem, so we do not consider this as part of
the general model.
1.6.2 Serial Implementations
It is straightforward to iterate dimer automata, and the most basic algorithm is shown in
Algorithm 1. There is no need for any sophisticated data structures or algorithms. The
configuration of the system X and the rule can be stored in a 1-D array. The graph should
be represented by an edge list (i.e. two parallel 1-D arrays), with one array corresponding
to the source vertex and the other corresponding to the destination. This facilitates proper
sampling of edges, whereas a data structure such as an adjacency list might incorrectly
introduce biases towards sampling edges belonging to vertices with high degree. Edges can
be sampled using standard random number generators such as lrand48 from the Standard
C Library.
Algorithm 1: Serial implementation of a dimer automaton
Input: T, G(V, E), Σ, X, R
Output: X
for t = 1..T do
    (u, v) := E[lrand48() % |E|];
    xu := X[u];
    xv := X[v];
    X[u] := R[xu, xv];
    X[v] := R[xv, xu];
end
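For concreteness, a direct C translation of Algorithm 1 might look as follows (a sketch under the storage conventions described above; the array names and the flattened rule table are illustrative assumptions):

#include <stdlib.h>

/* Iterate a dimer automaton for T steps (cf. Algorithm 1).
 * X        : per-vertex states, values in 0..S-1
 * R        : rule matrix, flattened so that R[x*S + y] = R(x, y)
 * src, dst : parallel edge-list arrays of length num_edges */
void iterate(long T, int S, int *X, const int *R,
             const int *src, const int *dst, long num_edges) {
    for (long t = 0; t < T; t++) {
        long e = lrand48() % num_edges;  /* pick an edge u.a.r. */
        int u = src[e], v = dst[e];
        int xu = X[u], xv = X[v];        /* read both old states first */
        X[u] = R[xu * S + xv];           /* then update both endpoints */
        X[v] = R[xv * S + xu];
    }
}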
An efficient optimization may be to maintain a list of “active” edges (i.e. the edges that
when updated change the state of the system). Only edges in the active set are chosen to
be updated. When a particular edge is updated all edges sharing a vertex with any of that
edge’s endpoints are examined to determine if they became active or inactive. This is shown
in detail in Algorithm 2. Note that this algorithm keeps track of time because updating
edges from the active set implies that a certain number of inactive edges were also updated
(but these edges were skipped since they would have had no effect on the system).

Algorithm 2: Simulates a dimer automaton, but dynamically tracks the set of active
edges.
Input: T, G(V, E), X, R
Output: X
t := 0;
active := E;
while t < T do
    t := t + |E| / |active|;
    (u, v) := edge chosen u.a.r. from active;
    {X[u], X[v]} := {R(X[u], X[v]), R(X[v], X[u])};      // apply the rule
    foreach (i, j) ∈ E : {i, j} ∩ {u, v} ≠ ∅ do          // each edge touching (u, v)
        p := (X[i] ≠ R(X[i], X[j]) ∨ X[j] ≠ R(X[j], X[i]));
        if p ∧ (i, j) ∉ active then
            active := active ∪ {(i, j)};                 // edge became active
        end
        if ¬p ∧ (i, j) ∈ active then
            active := active \ {(i, j)};                 // edge became inactive
        end
    end
end

1.7 Representing Space with a Graph

A major advantage of dimer automata is the capability to use any graph to represent the
local interactions in the system. To model continuous space, it is common to use a square
or hexagonal uniform lattice. The regularity of these lattices allows for a fast and simple
implementation on a computer, but can introduce artifacts that become visible at the macro
level [93]. A randomized lattice is a suitable representation of space that avoids the pitfalls
resulting from uniformity. Here we discuss dimension and isotropy as two ways to measure
the effectiveness with which a graph models space, followed by a discussion of how to produce
such a graph.

1.7.1 Dimension & Isotropy

Intuitively, for a graph to be a suitable discretization of continuous space, the graph should
have the same dimensionality as that space. However, it is not immediately clear how to
define the dimension of a graph in this context. Suppose we have some d-dimensional real
space. A hyper-cube in this space with sides of length r will have volume r^d. Thus, the
volume follows a power law parameterized by d, which is exactly the dimension of the space,
i.e.

\[
V(r) \propto r^d. \tag{1.8}
\]
This concept can be extended to graphs by defining the volume V(r) as the average number
of vertices reachable in r or fewer steps [84, 98]. For example, a square lattice with Moore
neighborhoods has d = 2 since V(0) = 1, V(1) = 9, V(2) = 25, ..., V(r) = (2r + 1)². Thus
a square lattice approximates 2-D space, since the lattice has a volume scaling exponent of
2. Long range connections in graphs (e.g. small world and scale free properties) cause the
diameter of the network to scale according to O(log |V|) [101]. This implies that

\[
V(r) \propto e^r, \tag{1.9}
\]

which grows faster than any power law [84]. One interpretation is that small world networks
have very high or infinite dimension. Thus, while small world networks may not be good
approximations of low dimensional space, they may serve as good representations of fast
mixing systems.
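As a sketch of how the volume scaling could be measured in practice (illustrative code assuming a compressed sparse row adjacency list, not the dissertation’s implementation), V(r) for a single source vertex is a breadth-first search truncated at depth r; averaging over many sources and fitting log V(r) against log r estimates d.

#include <stdlib.h>
#include <string.h>

/* Count vertices within r or fewer steps of source s using BFS.
 * Neighbors of vertex v are adj[off[v]] .. adj[off[v+1]-1]. */
long volume(int n, const int *off, const int *adj, int s, int r) {
    int *dist = malloc(n * sizeof *dist);
    int *queue = malloc(n * sizeof *queue);
    memset(dist, -1, n * sizeof *dist);  /* -1 marks unvisited */
    long count = 0;
    int head = 0, tail = 0;
    dist[s] = 0;
    queue[tail++] = s;
    while (head < tail) {
        int v = queue[head++];
        if (dist[v] > r) break;          /* BFS visits in distance order */
        count++;
        for (int k = off[v]; k < off[v + 1]; k++)
            if (dist[adj[k]] < 0) {
                dist[adj[k]] = dist[v] + 1;
                queue[tail++] = adj[k];
            }
    }
    free(dist);
    free(queue);
    return count;
}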
Graphs representing low dimensional continuous spaces should also be isotropic. In other
words, the measurement of various properties of that graph should be independent of the
angle of the measurement. Therefore, we consider an isotropic lattice to be one in which the
ratio of euclidean to topological (i.e. shortest path) distances between vertices is independent
of the absolute angle between those vertices. Consider an example some of us may experience
every day: navigating in a city grid. If one’s destination lies on the same street as the starting
point, then the euclidean (i.e. “as the crow flies”) and topological (i.e. the route one must
navigate) distances are identical. This is not the case, however, if the starting point and
destination lie diagonally across the grid. In this case, the euclidean distance is significantly
less than the topological distance. The city grid in this example is a square lattice, which is
clearly anisotropic.
We define lattice isotropy as the ratio of the euclidean distance E to the topological dis-
tance T as they depend on the angle θ between pairs of vertices in the lattice, thus I(θ) =
E(θ)/T(θ). For the von Neumann lattice, if we pick some arbitrary point a distance (x, y)
away from a reference point, then

\[
I(x, y) = \frac{\sqrt{x^2 + y^2}}{|x| + |y|}. \tag{1.10}
\]

Converting to polar coordinates, we have

\[
I(\theta) = \frac{1}{|\cos\theta| + |\sin\theta|}. \tag{1.11}
\]

For example, I(0) = 1 along an axis, while I(π/4) = 1/√2 ≈ 0.71 along a diagonal, where the
route is about 41% longer than the straight-line distance. Thus, the disparity between
euclidean and topological distances in the von Neumann lattice is heavily dependent on the
orientation of the lattice.
1.7.2 Generating Isotropic 2-D Lattices
For our simulations, we use an isotropic two dimensional lattice created using the technique
discussed by Kansal et al. [63]. This creates a non-uniform lattice through the use of
random sphere packing (RSA) [27] and Delaunay triangulation [14]. RSA was chosen over uni-
formly random generated points because RSA results in a more evenly spaced lattice whose
properties better mimic an actual volume of packed atoms (i.e. a real world surface) [27].
Furthermore, the RSA lattice reduces the variance in the distribution of distances between
neighboring vertices as well as in the distribution of the area of the triangles produced by
the Delaunay triangulation. Intuitively, the dimension of this lattice is 2, because the points
generated by RSA are in ℝ² and because Delaunay triangulation produces a planar graph,
so there are no “shortcuts” (e.g. long range edges) that would increase the dimension of the
space.
An efficient algorithm to generate a set of 2-D points by RSA is shown in Algorithm 3.
This algorithm makes use of a space partitioning data structure, referred to as grid in the
algorithm. For simplicity, we assume that at most one point can be located in each grid cell.
Let r = √2 be the minimum allowable distance between points. Assuming each grid cell has
unit length, this is the smallest value that still prevents two points from sharing a single grid
cell. The algorithm loops over all grid cells in a random order, attempting to add a random
point that is not within r of any other point. Because of the grid spacing, it is only necessary
to check the points in the neighboring twenty grid cells, as shown in Figure 1.4(a). These
are the grid cells that a circle of radius r overlaps when placed at each corner of a single
grid cell. If no point can be added to a given grid cell after τ attempts, the grid cell is
left blank and the algorithm continues on to the next randomly chosen cell. The simplicity
of this algorithm results from the assumption that the minimum allowable radius between
points is uniform throughout space. Relaxing this constraint would require use of a more
sophisticated space partitioning data structure. Figure 1.4(b) shows an example lattice
generated using the RSA + Delaunay technique.
Figure 1.4: (a) The twenty neighboring grid cells that must be searched when inserting a
random point: the blue lines indicate the reach of points placed at the corners of the center
blue grid cell, the 20 grid cells that overlap with these are shaded, and red dots indicate
points already added to the grid. (b) An example output of the lattice generated using the
RSA + Delaunay technique.
In the case of the RSA lattice, we measure I(θ) by picking a vertex close to the center of
the lattice and measuring the average I(θ) for all other vertices within a given radius. The
radius was chosen so that points near the boundaries of the lattice are omitted, as these
regions contain some artifacts of the triangulation. Figure 1.5 compares the isotropy of the
von Neumann lattice to the RSA lattice, showing the advantage of the RSA lattice. Unlike
the von Neumann lattice, where the isotropy is best at θ = nπ/2, there is no obvious bias
towards any angle in the RSA lattice. It is also clear from the figure that the variation of I
is much smaller for the RSA lattice compared to the von Neumann lattice.
Figure 1.5: Isotropy of a von Neumann (i.e. square) lattice compared to a lattice created
using RSA and Delaunay triangulation.
Algorithm 3: Fills an n × n grid using random sphere packing. The function nearby(i, j)
refers to the set of 20 neighboring grid cells shown in Figure 1.4(a). The output is a
set of points P ⊂ [0, n)².
Input: dimension of grid n; number of retries τ
Output: P
    grid := n × n array of points, each initialized to (∞, ∞);
    indices := {0, 1, ..., n − 1}² shuffled;
    for (i, j) ∈ indices do
        for k = 1..τ do
            test := random point in ([i, i + 1), [j, j + 1));
            if ∀(x, y) ∈ nearby(i, j) : |test − grid[x, y]| > √2 then
                grid[i, j] := test;
                break;
            end
        end
    end
    P := the points in the grid not equal to (∞, ∞);
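A compact Python re-implementation sketch of the RSA-plus-Delaunay construction follows
(ours, under the stated assumptions; scipy.spatial.Delaunay supplies the triangulation, and
for simplicity the neighbor search scans the full 5 × 5 block of cells, a superset of the 20
cells in Figure 1.4(a)).

    # Sketch: RSA point generation (cf. Algorithm 3) plus Delaunay
    # triangulation; assumes unit grid cells, at most one point per cell,
    # and minimum distance r = sqrt(2).
    import math
    import random
    from scipy.spatial import Delaunay

    def rsa_points(n, tau=30):
        grid = [[None] * n for _ in range(n)]
        cells = [(i, j) for i in range(n) for j in range(n)]
        random.shuffle(cells)
        for i, j in cells:
            for _ in range(tau):
                p = (i + random.random(), j + random.random())
                near = [grid[x][y]
                        for x in range(max(0, i - 2), min(n, i + 3))
                        for y in range(max(0, j - 2), min(n, j + 3))
                        if grid[x][y] is not None]
                if all(math.dist(p, q) > math.sqrt(2) for q in near):
                    grid[i][j] = p
                    break
        return [p for row in grid for p in row if p is not None]

    points = rsa_points(32)
    tri = Delaunay(points)  # edges of the triangulation define the lattice
    print(len(points), "points,", len(tri.simplices), "triangles")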
Chapter 2
Applications
The previous chapter introduced dimer automata and provided an argument for their use in
searching for self-organization over traditional models such as cellular automata. However,
one might be concerned that dimer automata are too simple to be a useful general purpose
modeling framework, so this chapter serves, in part, to strengthen that argument by present-
ing many simple dimer automaton models directly related to real world phenomena. It also
serves as a central point to consolidate many of the interesting results found during the course
of this research. We present models for flocculation, domain coarsening and grain growth,
Schelling segregation, excitable media, epidemics, and several graph algorithms. The floccu-
lation model is adapted directly from a similar model in the literature. The basic models for
grain growth and excitable media were found by a simple search for self-organization
discussed in §4.2 and generalized by a process discussed in §5. The Schelling segregation
model was designed by adapting the domain coarsening and grain growth model to exhibit
conservation of mass. The epidemic model is a variant of the excitable media model, and
adds a decay mechanism and explicitly defines excited and resting states.
2.1 Flocculation or Diffusion Limited Aggregation
Sometimes diffusing particles collide and adhere to form a dendritic cluster with fractal
dimension. This process is known as flocculation or diffusion limited aggregation. The Eden
model is a simple lattice based Monte Carlo technique that simulates this process [108].
In this model a single particle is introduced far away from the cluster and allowed to move
randomly to neighboring lattice sites until it reaches the cluster. When the particle reaches
the cluster, it changes state, stops moving, and becomes part of that cluster. An example
of the output produced by this simulation is shown in Figure 2.1. The process is repeated
to grow the cluster to an arbitrary size. This process can be succinctly described with the
dimer automaton rule
    R(x, y) =
        y    if x + y = 1
        2    else if x + y = 3    (2.1)
        x    else

where x, y ∈ {0, 1, 2}. Empty space has a state of 0, free particles have a state of 1, and
the cluster has a state of 2. Diffusion occurs by swapping a free particle with empty space.
Aggregation occurs when a free particle adjacent to the cluster changes its state to 2.
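For concreteness, the entire simulation loop can be written in a few lines; the Python sketch
below (ours, for illustration) pairs the rule of eqn (2.1) with the generic dimer automaton
update: pick a random edge (u, v), then set x_u := R(x_u, x_v) and x_v := R(x_v, x_u)
simultaneously.

    # Sketch: serial dimer automaton with the flocculation rule, eqn (2.1),
    # on a periodic square lattice.
    import random

    def R(x, y):
        if x + y == 1:   # free particle next to empty space: swap (diffusion)
            return y
        if x + y == 3:   # free particle next to the cluster: aggregate
            return 2
        return x

    n = 64
    edges = [(i * n + j, i * n + (j + 1) % n) for i in range(n) for j in range(n)] + \
            [(i * n + j, ((i + 1) % n) * n + j) for i in range(n) for j in range(n)]
    X = [1 if random.random() < 0.1 else 0 for _ in range(n * n)]  # density rho = 0.1
    X[(n // 2) * n + n // 2] = 2                                   # seed the cluster
    for _ in range(100 * len(edges)):
        u, v = random.choice(edges)
        X[u], X[v] = R(X[u], X[v]), R(X[v], X[u])
    print(X.count(2), "vertices in the cluster")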
The original Eden model is inherently serial because it assumes just one free particle exists,
analogous to the limit as the density of free particles approaches zero. This model can be
made more amenable to parallelization by assuming there is some positive density of particles,
ρ. In this case, the simulation is initialized with n=ρ· |V|free particles placed in random
locations throughout the lattice and with the center vertex given a state of 2. Algorithm 4
demonstrates a serial implementation of this many-particle model. The gist of this approach
is to maintain a list of the free particles, and perform the Eden algorithm independently for
each particle. This was the basis for a past parallelization of this phenomenon [66].
Algorithm 4: Serial algorithm for multi-particle flocculation on an arbitrary graph.
Input: T, G(V, E), particles, X
Output: X
    t := 0;
    while t < T do
        foreach i ∈ particles do
            j := random neighbor of i;
            if X(j) = True then
                X(i) := True;
                remove i from particles;
            else
                move i to j;
            end
        end
        t := t + |V|/2;
    end
Figure 2.1: Typical spatial pattern formed by diffusion limited aggregation.
2.2 Domain Coarsening & Grain Growth
Domain coarsening is observed in materials science, social science, cellular biology, and
other contexts. This process causes a configuration to become more organized through the
growth of competing homogenous regions separated by distinct boundaries. Coarsening oc-
curs because larger domains tend to absorb smaller ones. In materials science a domain can
correspond to crystals of particular orientations growing in metals at appropriate tempera-
tures, or gas bubbles in a soap froth separated by thin membranes [50, 44]. There are many
different models that can be used for domain coarsening and grain growth, but the Potts
model, a generalization of the Ising model, is popular [44]. The model defines a probability
distribution over all possible configurations based on the energy of each configuration and
the temperature of the system. The energy is related to the total number of non-equal adjacent
spins (i.e., the perimeter of the system). To solve the Potts model, the Metropolis algorithm
is used to flip random spins in a manner that is consistent with the probability distribution
over configurations.
A cellular automaton that produces spatial patterns resembling grain growth phenomena is
the Voronoi cellular automaton [2]. However, several important differences exist to differen-
tiate this model from true grain growth models including the generalized dimer automaton
presented here. Grain growth models undergo scaling where, for example, the distribution of
grain diameters over time follows a power law [50]. The Voronoi cellular automaton does not
scale; once the diagram is created it remains permanently fixed. Furthermore, the Voronoi
diagram produced depends precisely on the centers specified in the input to the automaton.
The Voronoi cellular automaton has a direct and predictable relationship between the initial
and final configuration. Since the partition is directly determined by the user’s initial config-
uration, the Voronoi cellular automaton does not exhibit self-organization. Another cellular
automaton that results in domain coarsening is the two dimensional binary cellular automa-
ton with totalistic code 976 [110]. However, it is unclear how this rule can be extended to
an arbitrary number of states to model grain growth.
A 3-state dimer automaton rule for domain coarsening is given by the lookup table

    R(x, y):   x\y | 0  1  2
               ----+---------
                0  | 0  1  2
                1  | 1  1  0    (2.2)
                2  | 2  0  2
whose corresponding output is shown in Figure 2.2(a). A rigorous analytical treatment of
how this rule produces domains with smooth borders is beyond the scope of this paper, but
we offer a simple explanation of the basic mechanism. Domains are formed by contiguous
regions of vertices with state 1 or 2. When an edge linking two vertices having states 1 and
2 is updated, those states will both become 0. Subsequently, the states will return to 1 or
2 depending on the states of the vertices they neighbor, and the order in which those edges
are updated. A 0 surrounded by mostly 1’s is more likely to become 1, for example. The
lower the curvature of the boundary, the more stable it is, as the number of 1’s and 2’s are
more balanced. Thus, there is a statistical trend from high to low curvature, reducing the
total length of all boundaries akin to the energy minimization in the Ising/Potts models.
To generalize this rule towards grain growth, we let any two neighboring, non-equal, nonzero
vertices become 0, and let any 0 vertex assume the state of its neighbor, thus

    R(x, y) =
        y    if x = 0
        0    else if x ≠ y and x, y ≠ 0    (2.3)
        x    else

where x, y ∈ ℤ. As the number of unique nonzero states in the input increases, the phe-
nomenon transitions from coarsening to grain growth. This transition is discussed further
in §5. We also ran a simulation with x⁰ᵢ = i, so that each vertex initially belongs to a unique
domain; this is shown in Figure 2.2(b). The output produced by the dimer automaton is
visually similar to that of statistical mechanics models for grain growth and soap froths. In
the dimer automaton model there is no concept of energy or temperature, as opposed to the
Potts model. Furthermore, the dimer automaton rule is simpler as it considers interactions
a pair at a time, whereas the Metropolis algorithm must consider the entire neighborhood
of a vertex. A benefit of this is that the time complexity of the algorithm is not affected by
the density of the graph.
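In code, the generalized rule (2.3) is equally terse; a hedged Python sketch (ours):

    # Sketch: the generalized coarsening/grain-growth rule of eqn (2.3).
    # With the initial condition x_i = i, every vertex starts in its own domain.
    def R(x, y):
        if x == 0:                # an empty/intermediate vertex copies its neighbor
            return y
        if y != 0 and x != y:     # conflicting nonzero states annihilate to 0
            return 0
        return x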
We believe this simple rule predicts the existence of an intermediate state that facilitates
jumping from one adjacent domain to another. In materials science, this may correspond to
the existence of a short-lived, difficult-to-observe phase in the material. Or, in a sociological
context, the 0 state may correspond to a person being undecided or confused, and
susceptible to influence by others. We hope that these hypotheses can be confirmed by
others in future studies. We also hope that this rule can be the starting point for new
algorithms for distributed consensus of multi-agent systems, density classification, graph
partitioning, clustering, and other difficult problems.
Figure 2.2: Grain growth occurs when the state space of the domain coarsening model is
expanded: (a) domain coarsening with Σ = {0, 1, 2}; (b) grain growth with Σ = ℤ. Coarsening
produces smooth, curved boundaries, whereas grain growth results in short straight
boundaries mimicking a Voronoi partition.
2.3 Schelling Segregation
The geographical clustering of similar people in urban areas was first understood using
the Schelling segregation model [91]. In its original, simplest form, space is represented as a
regular lattice where each cell contains an individual agent of one of two types, or it is empty.
The system is iterated by allowing unhappy agents to move to the nearest suitable empty
location. An agent's mood and the suitability of a location are determined by measuring the
fraction of like-typed agents among the k-nearest neighbors of that agent. Surprisingly, from
such simple, microscopic rules, macroscopic patterns emerge. Individuals of the same type
will arrange themselves into clusters using empty space to buffer themselves from dissimilar
individuals. In addition to the original economic implications, Schelling segregation is also an
interesting model for describing physical phenomena in metals such as ripening, coarsening,
and surface diffusion [105]. These authors point out that there is a significant difference
between the Schelling model and related statistical mechanics models (e.g., Ising models).
In the Schelling model, individuals jump from one location to another. This causes the
number of individuals of each type to remain constant as the simulation progresses.
We can capture the qualitative behavior of the Schelling segregation model with the dimer
automaton rule
    R(x, y) =
        y     if x, y ≤ 0
        −x    else if (x > 0 ∧ x ≠ |y|) ∨ x = −|y|    (2.4)
        x     else

where x, y ∈ ℤ. Empty space is represented with a 0; agents of type 1 have state 1 or −1;
agents of type 2 have state 2 or -2; and so on. In this manner, the mood of an agent is
encoded in the agent’s sign: happy agents have a positive state, whereas unhappy agents
have a negative state. Unhappy agents are allowed to move into empty space or to swap
places with another agent that also has a negative outlook. An agent changes their mood
if they are unhappy but neighboring a like-typed agent, or happy but neighboring an unlike
agent. This model is simpler than other Schelling segregation models as there is no concept
of “utility.” An agent doesn’t know how many like or unlike agents it is neighboring, nor
does it know if moving will improve its situation. Surprisingly, this simplified model is still
able to reproduce some key features associated with Schelling segregation, an example of
which is shown in Figure 2.3.
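A Python sketch of one reading of rule (2.4) follows; because the printed equation is
ambiguous in the source, this should be treated as our reconstruction rather than the
definitive rule.

    # Sketch: one reading of the Schelling rule, eqn (2.4).  The sign
    # encodes mood (positive = happy, negative = unhappy); 0 is empty space.
    def R(x, y):
        if x <= 0 and y <= 0:     # unhappy agents and empty space may swap
            return y
        if (x > 0 and x != abs(y)) or x == -abs(y):
            return -x             # mood flips: happy near unlike, unhappy near like
        return x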
2.4 Excitable Media & Spiral Waves
An excitable medium is a material or system that allows waves of uniform amplitude to
propagate without dissipation; waves completely annihilate when they collide. Excitable
media can produce “spiral waves” in 2-D lattices or facilitate synchronization in complex
Figure 2.3: Segregation of blue and red phases is due to the white regions acting as a buffer.
Due to mass conservation, the amount of this buffer determines the total perimeter, and
thus the length scale of the features seen.
networks [68]. Such behavior has been observed in a wide variety of fields including biology,
chemistry, cosmology, ecology, sociology, and epidemiology. The Greenberg-Hastings model
[51] is a simple model that is also a good starting point for a qualitative understanding
of excitable media. This model is a cellular automaton with three states that represent
excited, refractory, and resting phases. In each time step, resting cells neighboring excited
cells become excited; excited cells become refractory; and refractory cells become resting.
There have been many adaptations of this basic model, including the forest fire model [11],
cyclic particle systems [21], and cyclic cellular automata [41]. Many PDE’s exist to model
excitable media, such as the Barkley model [15]. PDE’s can be difficult to implement and
slow to run, motivating the development of simpler, faster, and more accessible models. An
interesting approach is to discretize the phase space of the PDE, essentially converting it into
a cellular automaton [45, 82]. Randomness was added to a discretization of the Belousov-
Zhabotinsky phase space to produce isotropic spirals without using large neighborhoods, as
this is more efficient [85].
A simple dimer automaton rule that reproduces the qualitative behavior of the Belousov-
Zhabotinsky rotating spiral waves is
    R(x, y) =
        x + 1    if 1 ≤ y − x ≤ z
        x        else
    (mod n),    (2.5)

where x, y, z ∈ {0, 1, 2, ..., n − 1}. This rule is very closely related to the rule for cyclic
particle systems [21]. This rule allows a vertex's state to rotate by one unit mod n if the
state of a neighboring vertex is between 1 and z units further ahead. States move in one direction
only, resembling the phase trajectories of classical models for excitable media, although there
are no clearly designated resting or excited states. In experiments we observed that increasing
n reduces the granularity of the simulation by increasing the length scale of the spiral waves,
or equivalently, decreasing their curvature. Increasing z also stabilizes the wave fronts by
decreasing the likelihood that a wave can pass by a given vertex without altering that vertex's
state.
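A Python sketch of rule (2.5) (ours; the modular comparison reflects our reading of "1 to z
units further ahead" on the cyclic state space):

    # Sketch: the cyclic excitable-media rule of eqn (2.5).
    def make_rule(n, z):
        def R(x, y):
            # advance one phase (mod n) if the neighbor is 1..z phases ahead
            return (x + 1) % n if 1 <= (y - x) % n <= z else x
        return R

    R = make_rule(n=8, z=2)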
The spirals, which can be seen in Figure 2.4, do not show artifacts of the underlying lattice,
as is the case with cyclic cellular automata, where spirals take on square shapes when using
a von Neumann lattice. In general, the quality of these results is surprising since the model
does not make use of large neighborhoods or deterministic updates. The dimer automaton
rule discussed here could serve as the basis for simple, robust, and decentralized algorithms
for the synchronization of multi-agent systems and other challenging real-world problems.
Figure 2.4: Spiral waves generated from the excitable media rule are surprisingly smooth
and circular, and do not reveal the underlying structure of the lattice.
2.5 Epidemics
A simple way to model how disease spreads through populations uses compartmental mod-
els [56]. Such models partition (or compartmentalize) a population into distinct sets and
model the movement of individuals between these sets. These models are formulated using
differential equations under the assumption that the population is sufficiently large as to
be considered continuous. Another, less realistic assumption is that the population is well
mixed, meaning that every individual in the population interacts equally with every other
member. In other words, there is no spatial component to the basic compartmental epidemi-
ological models. Nevertheless, these models provide a good starting point in understanding
how to model the spread of disease.
The simplest compartmental model, the SIR model, is developed in detail below. First we
assume that all individuals in the population are either susceptible to, infected with, or
recovered from a specific disease. The variables S, I, and R correspond to the number of
individuals in each category, and S(t), I(t), and R(t) denote the number of susceptible,
infected, and recovered individuals at time t. Assume that S(t) + I(t) + R(t) = N, i.e., that
the total population remains constant, and that S(0), I(0), R(0) ≥ 0. The system is modeled by
considering the rate at which individuals move into and out of each compartment. The rate
of infection depends on the populations of susceptible and infected individuals, hence the βIS
term. The recovery rate only depends on the infected population, so it is simply νI. Individuals
move from susceptible to infected, and then from infected to recovered, thus the complete model
is
    dS/dt = −βIS,
    dR/dt = νI,
    dI/dt = βIS − νI.
These equations form the basic SIR model. When these three differential equations are
numerically solved for reasonable starting conditions (i.e. S(0) = N, I(0) = 1, R(0) = 0 for
some large N) the following behavior is observed. As time passes, the infected population
gradually increases in a bell shaped curve, then decreases. Additionally the susceptible
population decreases while the recovered population increases monotonically. Eventually
the population consists entirely of recovered individuals. This is not realistic since epidemics
often exhibit more complex phenomena, including oscillatory behavior.
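A forward-Euler integration makes this behavior easy to reproduce; the parameters below are
illustrative only (ours), chosen so that the basic reproduction number βN/ν = 5.

    # Sketch: forward-Euler integration of the basic SIR model.
    N = 10_000
    beta, nu = 0.5 / N, 0.1
    S, I, R, dt = N - 1.0, 1.0, 0.0, 0.01
    for _ in range(int(200 / dt)):
        dS = -beta * I * S
        dI = beta * I * S - nu * I
        dR = nu * I
        S, I, R = S + dt * dS, I + dt * dI, R + dt * dR
    print(round(S), round(I), round(R))  # I rises in a bell shape, then falls to ~0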
More complex compartmental ODE’s have been constructed to introduce additional epidemi-
ological behaviors, but there are also simple individual based models that are interesting to
study. Individual based models can make use of the spatial distribution of individuals to
produce behaviors difficult for ODE’s to recreate [68]. The following individual based dimer
automaton rule for epidemics,

    R(x, y) =
        x − 1    if (x = 0 ∧ y > nβ) ∨ x > 0
        x        else
    (mod n),    (2.6)

has several interesting behaviors. An individual becomes infected if they are susceptible (i.e.,
their state goes from 0 to n − 1) and if they are neighboring an infectious individual (i.e.
y > nβ). Thus n − nβ − 1 defines the duration an individual remains infectious, and nβ defines
the duration an individual remains removed (i.e., immune to infection). Exploring the effect
of β on the dynamics of the system results in some interesting observations, which can be
seen in Figure 2.5(a). The most important observation is that there appears to be a critical
value of β that causes the outcome of the system to be very unpredictable. This is due to the
random long range edges allowing the system to globally synchronize; the system oscillates
smoothly between high and low populations of infected individuals rather than converging
to a fixed ratio of infection. This is visible in the time series shown in Figure 2.5(b), where
the mean state of the dimer automaton varies smoothly between extremes.
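A Python sketch of rule (2.6) as we read it (the printed equation is partially garbled in the
source, so the thresholds here are our reconstruction):

    # Sketch: the epidemic rule of eqn (2.6).  State 0 is susceptible;
    # states above n*beta are infectious; states 1..n*beta are removed.
    def make_epidemic_rule(n, beta):
        def R(x, y):
            if (x == 0 and y > n * beta) or x > 0:
                return (x - 1) % n   # infection (0 -> n-1) or countdown toward 0
            return x
        return R

    R = make_epidemic_rule(n=16, beta=0.5)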
Figure 2.5: (a) Bifurcation diagram of the average outcome and (b) an example time series
for the epidemic model.
2.6 Graph Algorithms
Dimer automata provide a generic way to describe computation on graphs. It is natural to
ask what traditional graph algorithms can be implemented as dimer automata. Therefore,
we discuss dimer automata rules for several classical problems:
1. component labeling,
2. shortest path,
3. vertex coloring, and
4. topological sorting.
The efficiency of these algorithms is dependent on the topology of the graphs used as input.
The component labeling and shortest path algorithms depend on front propagation to com-
pute the result. Ideally the front will cover a significant portion of the vertices at any given
time and will move across the entire graph quickly. This is the case for high dimensional
networks, especially small world networks. However, for low dimensional networks, the di-
ameter can be large, causing the algorithm to converge more slowly. In this case, since the
front represents only a small fraction of the total vertices, a large fraction of the vertices are
inactive. Algorithm 2 is an effective serial approach to this problem.
Given a graph, we may wish to know whether there is a path from a to b for all pairs of
vertices. This is equivalent to labeling the components; if a path exists between a and b
then these two vertices must belong to the same component. A simple way to label all the
components in a graph is to first assume each vertex belongs to its own unique component.
Then, pick an edge in the graph and check if the endpoints of this edge are labeled as
belonging to different components. If this is so, then one of the endpoints can be re-labeled to
place both endpoints in the same component. This is accomplished by the dimer automaton
rule

    R(x, y) = min(x, y),    (2.7)

where x, y ∈ V, and x⁰ᵢ = i. Eventually all vertices within the same component are labeled
with the smallest vertex index in that component. Component labeling is a precursor to a
slightly harder problem, computing the shortest path distance from a source vertex to all
other vertices in the graph (SSSP). This is accomplished by

    R(x, y) = min(x, y + 1),    (2.8)

where x, y ∈ {0} ∪ ℤ⁺, which is surprisingly similar to the rule for component labeling. The
dimer automaton is initialized such that x⁰ᵢ = |V| except for the source vertex, which is
initialized to 0. Often we may not want to compute the distance from a source vertex to all
other vertices, but for all pairs of vertices (APSP). This is necessary if we wish to know the
diameter of the graph, for example. One approach is to break up the APSP problem into
|V| independent SSSP problems. This approach is discussed as a case study in §3.2.3.
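On a connected graph, repeated random edge updates with rule (2.8) converge to the true
breadth-first distances; the Python sketch below (ours) demonstrates this on a ring, using the
convergence test described in §3.2.3 (stop after a full pass over the edges with no change).

    # Sketch: SSSP via the dimer automaton rule R(x, y) = min(x, y + 1).
    import random

    V = 20
    edges = [(i, (i + 1) % V) for i in range(V)]
    X = [V] * V
    X[0] = 0                                     # the source vertex
    while True:
        changed = False
        for u, v in random.sample(edges, len(edges)):
            xu, xv = min(X[u], X[v] + 1), min(X[v], X[u] + 1)
            changed |= (xu, xv) != (X[u], X[v])
            X[u], X[v] = xu, xv
        if not changed:                          # full pass, no change: converged
            break
    print(X)  # ring distances 0, 1, 2, ..., 10, ..., 2, 1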
Another classic graph problem is to compute a graph coloring. To color a graph, we assign
a color to each vertex such that no two endpoints of any edge in that graph have the same
color. A graph is k-colorable if k (or fewer) colors can be used to color the graph. A
simple randomized algorithm to color a graph picks an edge at random and examines the
colors of the two endpoints. If the colors are equal, then it changes one of the colors of the
endpoints. This is repeated until there are no edges whose endpoints have the same color.
This algorithm can be easily implemented as a symmetry breaking dimer automaton. The
rule is

    R(x, y) = (x + δ(x, y), y) mod k,    (2.9)

where x, y ∈ ℤ and δ is the Kronecker delta. If x and y are equal, then only x is increased
by 1, and y remains the same. Otherwise both x and y remain the same. If the graph is not
k-colorable, then the algorithm will not converge. If the graph is k-colorable then the algorithm
can converge, but may take a very long time to do so. Experimental results show that, in the
case of the uniform lattice, the algorithm converges quickly when k is greater than or equal
to the average degree of the network.
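A Python sketch of the coloring dynamics (ours, for illustration; the dynamics are absorbing
once a proper coloring is reached):

    # Sketch: randomized coloring via the symmetry-breaking rule of eqn (2.9);
    # when the endpoint colors collide, bump one of them mod k.
    import random

    def color(edges, V, k, max_updates=1_000_000):
        X = [random.randrange(k) for _ in range(V)]
        for _ in range(max_updates):
            u, v = random.choice(edges)
            if X[u] == X[v]:                 # delta(x, y) = 1 exactly when x = y
                X[u] = (X[u] + 1) % k
        return X

    edges = [(i, (i + 1) % 6) for i in range(6)]   # a 6-cycle is 2-colorable
    X = color(edges, 6, 2)
    print(X, all(X[u] != X[v] for u, v in edges))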
A common graph problem is topological sorting, where a unique label is assigned to each
vertex such that, for every edge, the source vertex’s label is less than the destination vertex’s
label [61]. A randomized algorithm to solve this problem can be written as a dimer automaton
with symmetry breaking rules as follows,
    R(x, y) =
        (y, x)    if y < x    (2.10)
        (x, y)    else
Of course, perfect topological sorting is only possible if the graph is directed and acyclic, and
if a fast exact answer is desired, then a deterministic serial algorithm should probably be
used. However, the steady state distribution of the dimer automaton version might provide
useful information in a social network analysis context. This may be a simple way of ranking
each vertex or of measuring core-periphery structure, for example [20].
Chapter 3
Implementations
Not surprisingly, modeling complex systems can become computationally intensive when
the desire for more physical realism or accuracy necessitates using larger and larger models.
Sometimes parallelization can be a useful tool in reducing long run times by a constant factor
to reasonable levels. The inherent locality of interactions and the spatially extended nature
of many models for complex systems make them good candidates for effective parallelization.
In the past, parallelization has been a tool most beneficial to those who had access to a
supercomputer or, more recently, a cluster of desktop computers. However, the low cost,
high performance, and ubiquitous presence of the GPU on the average computer today have
essentially delivered supercomputing to the masses. A decade ago, the power of the GPU was
only being tapped by experts in the graphics community. However, the recent development
of general purpose GPU (GPGPU) SDK’s such as CUDA and OpenCL has opened up GPU
computing to a much larger audience. Therefore, it is no surprise that we have recently seen
an explosion in the number of scientific computing applications implemented for the GPU
[87].
However, these SDK’s are no magic pill; developing efficient applications for the GPU is still
a difficult task requiring a significant investment in time. There are many GPU architecture
specific considerations that one must make in order to see any benefit from GPU paralleliza-
tion; fine-tuning code is a difficult, but rewarding process. For these reasons it is desirable
to maximize the reusability and applicability of any code developed for the GPU. Though
cellular automata are very popular, they are not very flexible; GPU implementations fix the
lattice and neighborhood in order to produce efficient, but ultimately ad-hoc code. Thus,
in the context of using GPGPU programming for scientific computing, we find many such
ad-hoc solutions. Researchers waste time by re-learning the lessons of others while solving
only slightly different problems. If our desire is to develop an efficient GPU implementa-
tion of a spatially extended dynamical system, we ought to choose a very flexible modeling
framework from the start.
We have shown that dimer automata have a good balance of robustness, generality, and
simplicity, and are capable of modeling several types of interesting physical phenomena.
These qualities make parallelization of dimer automata a rewarding, but nontrivial task, as
two major challenges exist:
1. general graphs result in random memory access patterns, and
2. asynchronous updating creates serial dependencies.
Our contribution is two efficient algorithms for GPU-parallel simulations of dimer automata
that overcome these challenges based on different, but realistic assumptions. The next section
provides a brief overview of the relevant literature and GPGPU programming issues as they
relate to the development of our solutions. Then we present the first GPU algorithm, which
is designed to accelerate the simulation of many independent mid-sized dimer automata.
The second GPU algorithm is designed to accelerate the simulation of a single large dimer
automaton. We define “mid-sized” to mean dimer automata that are too large to fit in the
shared memory of the GPU, but many copies can fit in the GPU’s global memory. “Large”
dimer automata should take up nearly all of the GPU’s global memory.
3.1 Background & Related Work
GPGPU programming has become a popular tool for simulations of physical phenomena,
especially spatially extended phenomena [87]. The uniform lattice and parallel nature of
updates make cellular automata highly amenable to GPU parallelization [12]. Given the
popularity of cellular automata, it is not surprising that there are many examples of GPU
accelerated cellular automata in the literature. However, GPU accelerated computation on
general graphs is much more challenging, seeing only modest speedups for simple algorithms
such as breadth first search, shortest path, and component labeling [53, 55]. In these al-
gorithms, the graph is represented as an adjacency list that has been packed into a single
contiguous array. To process a vertex, it is necessary to first read the adjacency information
for a vertex and then potentially read in the state information of that vertex’s neighbors.
This can be costly since it requires multiple random accesses to the GPU’s global memory,
which has high latency.
Asynchronous updating of dimer automata creates serial dependencies that can make par-
allelization difficult. One must ensure that no two updates are performed concurrently on
edges that share the same vertex. Otherwise this would create a conflict that potentially
affects the accuracy of the simulation. There has been recent research towards efficient par-
allelization of asynchronous cellular automata [62]. Notably, it was found that converting
asynchronous cellular automata to block synchronous automata, when possible, allows for an
efficient parallel implementation [13]. This is similar to the checkerboard update scheme for
the Ising model, which has an efficient GPU implementation [88]. On a square lattice with
von Neumann neighborhoods, each lattice cell can be labeled as black or white resembling
a checkerboard pattern. The algorithm alternates between updating all black cells and all
white cells, because cells of a given color are neighbored only by cells of the opposite color.
Thus, the neighbors of any cell will never change while that cell is being updated.
The checkerboard and block synchronization algorithms are reasonable approaches to par-
allelize systems that rely on one-at-a-time updates, but these approaches cannot be directly
applied to non-uniform lattices. Therefore, we believe our algorithm, which is designed to
simultaneously address the challenges of asynchronous updates and non-uniform lattices, is
a significant addition to the aforementioned research.
Modern GPGPU SDK’s (e.g., OpenCL and CUDA) abstract the architecture of the GPU
as a massively parallel single instruction multiple data (SIMD) machine. All GPGPU code
contains one or more "kernels" that perform the main work of the algorithm. Given a
problem of size n, at a conceptual level, one can imagine that n identical copies of the kernel
code are run on n sub-problems simultaneously. The application of a kernel to a specific
sub-problem is called a work item in OpenCL. Ideally, one work item does not depend on
the result of another so tasks can be executed in an embarrassingly parallel manner. The
most important consideration in GPGPU programming is determining what task the work
items should perform, and thus, how to code the kernel(s) to accomplish this.
Optimizing the memory access patterns of the work items is another important consideration
in effective GPGPU programming. Just like the CPU, the GPU contains several different
types of memory whose speeds and capacities depend on their physical distances from the
processing units. Unlike modern CPU’s, the GPU does not have as much sophisticated paging
and cacheing to reduce memory latency associated with random accesses. However, the GPU
can context switch between threads waiting for global memory to be retrieved, potentially
hiding some latency if enough threads are active. Memory latency can also be amortized
by reading or writing large, sequential chunks of data at a time. This is accomplished by
ensuring that memory access in the kernel is coalesced. For example, if work item i reads
data from global_array[i] for all i ∈ {1, ..., n}, the GPU will optimize this code by reading in
a large swath of memory at once instead of performing many small reads. The need to
coalesce memory access on the GPU was an important factor in the design of the kernels of
the parallel dimer automaton algorithms.
3.2 Many Mid-Sized Dimer Automata
Often it is necessary to perform many independent trials (e.g., Monte Carlo sampling) to
understand systems, especially stochastic complex systems. These independent trials can
be characterized by the following cases:
1. varying the random seed to observe a distribution of behaviors;
2. varying the initial condition to observe how it affects the outcome;
3. varying the model or its parameters to observe a variety of different behaviors; or
4. some reasonable combination of the above cases.
An example of the first case is in well mixed systems of reacting chemicals in low concentration,
which are modeled exactly by Gillespie's Algorithm [49]. In these systems sometimes the
same initial condition results in two drastically different outcomes. A famous example of this
is the "developmental bifurcation pathway in phage lambda-infected e. coli cells" [9]. In this
case just a handful of chemical species with low population randomly determines whether
the phage enters lysis or lysogeny. Lysis and lysogeny are two global attractors that are both
reachable from the same initial condition. The second case, varying the initial condition,
can provide useful information about a system's dependence on its initial conditions (i.e., is
the system chaotic?).
Wolfram’s classical search of elementary cellular automata is a famous example of the third
case [110]. He ran a simulation for each of the 256 different rules starting from simple initial
conditions and observed four classes of behaviors. Automata belonging to two of these
classes exhibited surprisingly complex behaviors despite their simple formulation. Since this
original work, it has been popular to perform various similar types of searches for other
complex systems [69], often aided by genetic algorithms [30, 78, 34, 59, 111, 104]. In all of
these cases, the simulation of many independent complex systems is crucial.
Clearly, any technology that improves the efficiency of one or more of the above cases is of
general interest. Therefore, we present an algorithm for the GPU-acceleration of many con-
current dimer automata simulations that handles any combination of cases 2 and 3 mentioned
above. Dimer automata are discrete (in state, space, and time), stochastic, asynchronous
dynamical systems that can be used to model a number of useful phenomena [95, 5, 7].
One of the useful properties of dimer automata is that they operate on arbitrary graphs,
updating both endpoints of a single randomly chosen edge simultaneously. So, in addition to
providing a nice modeling and simulation framework for complex systems, dimer automata
can be used to design randomized, decentralized, and robust algorithms for computing on
graphs. We benchmark and discuss the algorithm we present, and we also consider what
circumstances this GPU acceleration of dimer automata can be effectively used for general
graph computations.
Many examples of GPU accelerated simulations of complex systems can be found in the
literature. Many of these models are designed to speed up the execution of a single large
model. However, when the size of the model decreases, the speedup diminishes. There are
only a few cases where researchers have focused on the GPU acceleration of a large number of
independent simulations. One approach that deserves mention due to its similarity to dimer
automata is the GPU acceleration of Gillespie’s Algorithm for stochastic chemical kinetics
[71]. In this case, each thread on the GPU performs Gillespie’s algorithm independently
by firing randomly chosen reactions (firing a reaction is similar to an edge update in dimer
automata). This algorithm is tailored towards very small simulations (e.g., one case study
had just three species and four reaction channels). It works best assuming the number of
chemical species, channels, etc. is low enough that all information for all threads in a thread
block fits into the limited shared memory space. This assumption reduces the cost of the
random memory access inherent to simulations of stochastic chemical kinetics.
Instead, our algorithm is designed to operate on many mid-sized simulations. We define
mid-sized as having too many vertices to fit in shared memory, but few enough that (at
least) several copies fit in global memory. The main challenge of coalescing memory access
is resolved by assuming each simulation sees the same order of edge updates. However, this
prevents our GPU parallelization from being used in case 1, where only the random seed
is varied. Instead, one must vary the initial input or the model itself across simulations to
benefit from our GPU parallelization. We discuss the implementation details of our algorithm
next.
3.2.1 Methods
Our goal is to develop an efficient GPU parallelization of many independent, mid-sized
dimer automata. An embarrassingly parallel approach would work by running many dimer
automata across as many processors and collecting the result. Since GPU’s are adept at such
embarrassingly parallel tasks, we might try to distribute a large number of dimer automata
on the GPU in a similar manner. However, one would not see much speedup because of
the random memory access patterns resulting from each dimer automaton having its own
unique and random order of updates. To resolve this, we assume that every dimer automaton
undergoes the same order of edge updates. This allows memory access to be coalesced, which
greatly improves the efficiency of the GPU parallelization. Of course, this assumption comes
with a cost; it cannot be applied to case 1 from §3.2. Under the same rule, initial condition,
and (from the assumption) the same order of updates, a dimer automaton will always produce
the same outcome. This would defeat the purpose of repeated trials as any single trial would
be sufficient. Fortunately, the assumption is still reasonable for the remaining cases, so we
develop a simple yet efficient GPU parallelization building on this idea.
We assume a direct mapping between threads and dimer automata simulations on the GPU.
Since we are also assuming that each dimer automaton will update the same edge at the same
time, we can arrange the memory layout of all experiments’ states so that memory access
is coalesced. This is accomplished by arranging each x_u for each automaton contiguously
in memory. This is the transpose of the more common approach, where the states of adjacent
vertices are arranged contiguously. To make the GPU algorithm as robust and widely
applicable as possible, the entirety of E (the connectivity information) is not stored on the
GPU. Instead, a small number of edges are sampled from E by the CPU and written to the
GPU's local memory in small chunks. These edges are then updated sequentially for each
experiment during a single kernel invocation. There are several advantages to this technique.
Not storing E on the GPU frees up memory that can be used for additional experiments,
which increases throughput. Furthermore, if the graph is very dense, there may not even be
enough memory on the GPU, since |E| = O(|V|²). This approach is also compatible with
sampling edges from a non-uniform distribution, despite the fact that this can require complex
data structures [99]. The complex sampling algorithm can be kept on the CPU since the edges
sampled are simply copied to the GPU afterwards.
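A NumPy sketch of this transposed layout (ours, under our reading of the text): with the
state array indexed as X[vertex, experiment], threads t = 0..τ−1 touch consecutive addresses
when they all read the same vertex, which is exactly what coalescing requires.

    # Sketch: vertex-major ("transposed") state layout for coalesced access.
    import numpy as np

    T, V = 2048, 16384                    # experiments (threads) and vertices
    X = np.zeros((V, T), dtype=np.int32)  # X[u] is one contiguous row

    u, v = 7, 12                          # one sampled edge, shared by all experiments
    xu, xv = X[u].copy(), X[v].copy()     # contiguous rows: coalesced reads
    X[u] = np.minimum(xu, xv + 1)         # e.g. the SSSP rule applied elementwise
    X[v] = np.minimum(xv, xu + 1)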
However, interleaving kernel execution with edge sampling results in the CPU and GPU
being idle while the other works, negatively impacting performance. Fortunately, a simple
solution is found by dividing the computation into blocks. The GPU portion of a block is
dependent on the edges sampled by the CPU in the previous block. The CPU portion of the
current block computes and writes the edges needed by the next GPU block. This reduces
the amount of time the CPU and GPU are idle, which improves performance. An illustration
of this host/device scheduling optimization is shown in Figure 3.1, and its pseudocode and
OpenCL kernel code is outlined in Algorithm 5. We note that, at best, this approach will
speed up the GPU parallelization by a factor of 2. This occurs when the execution times of
the GPU and CPU tasks are equal.
Algorithm 5: Pseudocode to call the kernel in a blocked fashion to allow concurrent
CPU/GPU execution.
Input: n (number of edges to update); N (edges per block); block_size
    steps := ⌈n / block_size / N⌉;
    sample edges for block 0;
    copy block 0 to GPU;
    sample edges for block 1;
    for i = 0..steps do
        copy block i + 1 to GPU (asynchronously);
        wait for block i to finish copying;
        for j = 0..block_size do
            call DimerAutomatonKernel on block i with offset j · N;
        end
        sample edges for block i + 2;
        finish command queue;
    end
Function DimerAutomatonKernel(uint n, uint offset, global uint* Eu, global
uint* Ev, global T* R, global T* X, local uint* u_cache, local uint* v_cache,
global int* counts)
    uint gsi = get_global_size(0);
    uint gid = get_global_id(0);
    int count = 0;
    // copy the edges to local memory
    event_t events[2];
    events[0] = async_work_group_copy(u_cache, &Eu[offset], (size_t)n, 0);
    events[1] = async_work_group_copy(v_cache, &Ev[offset], (size_t)n, 0);
    wait_group_events(2, events);
    // begin updating edges
    for (uint i = 0; i < n; i++) {
        // read the edge (u, v)
        uint u_idx = mad24(gsi, u_cache[i], gid);
        uint v_idx = mad24(gsi, v_cache[i], gid);
        // read the state of (u, v)
        T xu = X[u_idx];
        T xv = X[v_idx];
        // compute the next state
        T xup = apply_rule(xu, xv, R);
        T xvp = apply_rule(xv, xu, R);
        // write the new state of (u, v)
        X[u_idx] = xup;
        X[v_idx] = xvp;
        // count the number of changed edges
        count += (isnotequal(xu, xup) | isnotequal(xv, xvp));
    }
    counts[gid] += count; // optional
Figure 3.1: Comparison of interleaved and blocked schedules; arranging kernel calls into
blocks as shown helps the GPU avoid becoming idle.
3.2.2 Results
To evaluate the effectiveness of our GPU parallelization, we consider two different cases
corresponding to two common dimer automaton rule representations:
1. eqn, an algebraic representation of the rule where each simulation uses the same equa-
tion with a different initial configuration; and
2. mat, a lookup table representation of the rule where each simulation uses a different
rule and, potentially, a different initial configuration.
The matrix representation results in random memory access patterns in the kernel. This is
not the case with the algebraic representation, which simply requires computing an expression
(whose constants can be accessed in a coalesced manner). Thus we would expect the algebraic
representation to have better performance.
Dustin Lockhart Arendt Chapter 3. Implementations 49
We carefully compare the performance of the GPU algorithm (run on an NVIDIA GeForce
9400M with 256MB RAM and 32 cores) to the CPU algorithm (run on an Intel 2.8GHz Core 2
Duo with 6MB L2 cache) in the following manner. The size of the graph |V| and the number
of threads τ are varied simultaneously so that τ|V| = M, where M is a constant close to the
memory capacity of the GPU. For simplicity, the graph's edges are chosen randomly. The
CPU algorithm essentially
executes the same loop as DimerAutomatonKernel, but simulates each dimer automaton in
its entirety before moving on to the next. Figure 3.2 shows the comparison of the four cases
(GPU/CPU and eqn/mat). For the algebraic case, the shortest path equation was used, and
for the matrix case, each automaton uses a different random matrix for its rule. The GPU
algorithm is clearly faster in both cases, and a summary of the results is shown in Table 3.1.
The algebraic GPU case is the fastest, as expected, being 37 times faster than the CPU
equivalent and having a maximum throughput of 313 million edge updates per second.
Figure 3.2: Comparison of throughput for algebraic and finite state machine representations
of dimer automata rules for GPU and CPU implementations.
Table 3.1: Optimal effectiveness of matrix/equation GPU acceleration

          threads   vertices   throughput      speedup   max. throughput
          (#)       (#)        (updates/sec)             (updates/sec)
    mat   1024      32768      1.45 × 10⁸      23.84     1.57 × 10⁸
    eqn   2048      16384      2.74 × 10⁸      37.10     3.13 × 10⁸
3.2.3 Discussion
Several notable effects are seen in Figure 3.2 from the original experiment. As the number of
threads τincreases, the throughput of the matrix GPU algorithm increases quickly, but then
begins to decrease after 2048 threads. This is most likely due to the increasing number of
un-coalesced memory accesses. When τis small, the GPU algorithm is less efficient because
there is not enough inherent concurrency for the GPU to take advantage of. When τis large,
the number of random memory accesses is large, resulting in more latency. This produces
the optimal point between these extremes that we observe. The decrease towards the tail
of the algebraic case is less pronounced, and is likely due to the slight overhead incurred in
scheduling such a large number of threads.
A final observation worth noting here is that while the GPU algorithm’s performance tends
to decrease after a certain number of threads, the CPU algorithm has the opposite behavior.
In our experiment, the size of the graph is inversely proportional to the number of threads so
that the memory footprint remains constant. When a large number of threads are used, the
graph becomes very small. There appears to be a certain threshold (around 8192 threads
and 4096 vertices, roughly 16KB) where the CPU algorithm suddenly becomes more efficient.
This is most likely due to the working set of the CPU algorithm becoming small enough for
memory caching and paging to become optimal. For this reason, the biggest GPU speedup
(in both cases) does not occur with the highest throughput (see Table 3.1).
Block and Rule Sizes
Here we examine how other aspects of our GPU algorithm affect its performance. An impor-
tant aspect of the GPU algorithm is the consolidation of kernel invocations into blocks. This
helps prevent the CPU and GPU from remaining idle while the other is working. Figure 3.3
shows how the block size affects the throughput of the algorithm (the number of threads is
varied in the same manner as the previous experiment). The results show that with enough
concurrency, any choice of block size reaches peak throughput. However, it appears that
increasing the number of kernels called per block causes this threshold to be reached more
quickly.
We also consider the effect of adding additional states to Σ, which increases the rule size.
Previously we showed that increasing the concurrency can decrease throughput after a certain
point by increasing the total number of un-coalesced memory reads. It is also the case that
the total memory footprint of the rule affects GPU performance. This is evidenced by
Figure 3.4, where the original experiment is repeated for different sized rules. Even while
keeping the number of threads constant, the throughput is negatively affected as the rule
size |Σ| increases from 3 to 8.
Figure 3.3: Increasing the number of kernel invocations per block causes the GPU algorithm
to reach maximum throughput sooner (with fewer threads).
Figure 3.4: Increasing the number of states (the rule size) also has a detrimental impact on
performance.
Throughput vs Efficiency
Dimer automata can be used to implement algorithms for computations on graphs. However,
their efficiency may be dependent on the topology of the graph; in other words, throughput
is not equivalent to algorithmic efficiency. Dimer automata compute on graphs via front
propagation, much like a breadth first search, to find the result. Ideally the front will cover
a significant portion of the vertices at any given time and will move across the entire graph
quickly. This is the case for high dimensional graphs, especially small world graphs. However,
for low dimensional graphs, the diameter can be large, causing the algorithm to converge
more slowly. Small world graphs are the best case scenario in terms of the efficiency of using
simple dimer automata for graph computation. The Watts-Strogatz small world model
provides a simple way to interpolate between fully random and uniform graphs based only
on the rewiring probability p [106].
We use this classic approach to quantify how the efficiency of our shortest path rule de-
pends on the structure of the graph. We measure efficiency by considering the fraction of
edges updated that actually changed the state of the dimer automaton before the algorithm
converges. In other words, efficiency is the amount of useful work the dimer automaton is
actually performing. We use the shortest path rule R(x, y) = min(x, y + 1) with the initial
configurations designed so that each simulation solves the shortest path problem for a dif-
ferent source vertex. Edge updates are broken up into sets where each edge is updated at
least once (any order can be used). If no updates changed the system after updating a set
(i.e. every edge), then the algorithm has converged. Figure 3.5 shows how the efficiency is
affected by the structure of the graph (the graph had 16,384 vertices and 2,048 threads,
because this configuration produced the highest speedup in the original experiment). As the
graph transitions from ordered to random, the efficiency improves predictably, with maximum
efficiency close to 20%. This is one reason why the GPU algorithm's performance was not
better than a more sophisticated serial algorithm (the igraph diameter function [29]) for the
same problem.
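This efficiency measurement is easy to re-create; the Python sketch below (ours, using
networkx, with illustrative parameters) applies the SSSP rule over Watts-Strogatz graphs for
several rewiring probabilities.

    # Sketch: efficiency (fraction of useful updates) of the SSSP rule
    # R(x, y) = min(x, y + 1) versus the rewiring probability p.
    import random
    import networkx as nx

    def efficiency(G, source=0):
        X = {v: len(G) for v in G}
        X[source] = 0
        useful, total = 0, 0
        while True:
            changed = False
            for u, v in random.sample(list(G.edges()), G.number_of_edges()):
                xu, xv = min(X[u], X[v] + 1), min(X[v], X[u] + 1)
                if (xu, xv) != (X[u], X[v]):
                    useful, changed = useful + 1, True
                X[u], X[v] = xu, xv
                total += 1
            if not changed:
                break
        return useful / total

    for p in (0.0, 0.01, 0.1, 1.0):
        G = nx.watts_strogatz_graph(1024, 4, p, seed=1)
        print(p, round(efficiency(G), 3))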
Figure 3.5: As the structure of the graph transitions from uniform to random, efficiency
improves.

3.2.4 Conclusions

Many GPU parallelizations of cellular automata are designed to take advantage of the regular
structure of the underlying lattice. Larger lattices provide more concurrency, allowing the
GPU parallelization to have better performance. So, these parallelizations are best suited
for simulating a single large system. However, there are clear cases where simulating many
(not necessarily large) systems is warranted. Furthermore, there are many cases where non-
uniform topologies and/or asynchronous updating is also warranted. Each of these issues
presents unique challenges. Therefore, we have designed a GPU parallelization of the dimer
automaton framework tailored towards concurrent simulation of mid-sized simulations. The
parallelization is made effective by assuming each automaton sees the same order of updates,
but this reduces the applicability of our approach somewhat.
Fortunately, there are many cases where this assumption is acceptable, and the GPU par-
allelization results in significant performance gains. In the best case, our algorithm has a
throughput of over 313 million edge updates per second and produces a speedup of 37 over
the CPU implementation. Additionally we note that the hardware used to evaluate our algo-
rithm was fairly modest by today's ever-improving standards, having only 32 cores. High-end
models can increase this number by an order of magnitude, which, ideally, would also result
in an order of magnitude improvement in performance.
Because of the ease with which many graph algorithms can be expressed as simple dimer
automaton rules, we also tested our algorithm’s usefulness in this respect. However, we
found that the applicability can be diminished when the graph has low dimension and the
algorithm (i.e., the dimer automaton rule) relies on a breadth first search. Additionally, the
GPU algorithm’s performance suffered when each dimer automaton used a different rule,
represented as a matrix. Future work to convert matrix representations to algebraic ones
would have a significant positive impact on performance. Finally we note that our approach
can be extended to asynchronous variants of complex systems (e.g., asynchronous cellular
automata, random boolean networks, etc.) with only trivial modification to the kernel and
algorithm.
3.3 One Large Dimer Automaton
We parallelize a single large dimer automaton by converting it from an asynchronous to a
synchronous automaton, akin to the block synchronous cellular automata approach [13]. In our
approach, the parallel step simultaneously updates all edges belonging to a randomly chosen
maximal matching of G. A matching is a set of non-overlapping edges in a graph, and a
matching is maximal if it cannot be further increased by adding edges. Each edge in a matching
can be updated correctly by the dimer automaton in any order, since no two edges share
the same vertex. The matching for any synchronous update step is chosen randomly from a
small set of pre-computed matchings we call M. Our hypothesis is that this approximation is
valid when the graph has low dimension (e.g., when representing 2-D space). Subsequently,
we show how to generate this matching set, and how to efficiently update edges in parallel
on the GPU. Finally, we validate our approach by comparing the performance of the GPU
algorithm to optimized ad-hoc serial algorithms.
3.3.1 GPU Implementation
It is important that any given matching is large to ensure the GPU parallelization is efficient
(for reasons discussed later). Furthermore, each edge should be present in roughly the same
number of matchings, otherwise this incorrectly biases the probability of that edge being
updated. Thus, both the performance and accuracy of the parallelization are dependent
on the matching set. For this reason we have developed an algorithm that, given a graph,
carefully generates a set of large and balanced maximal matchings, shown in Algorithm 7.
Maximal matchings are found by testing if a given edge can be added to the matching. If it
can, then that edge’s endpoints are marked as covered and that edge is added to the current
matching. The order in which edges are tested is determined by the number of matchings
that edge already belongs to. In the inner loop, the algorithm tests the remaining edge that
belongs to the fewest matchings, making it a greedy algorithm. The algorithm is O(n · |E| log |E|), where n is the number of desired matchings. This cost can be amortized over a large number of experiments, as M needs only to be computed once per graph.
The advantages and limitations of the GPU architecture provide clear guidelines for imple-
menting the GPU parallel dimer automaton algorithm. Since the sole mechanism for iterating the system is updating edges, the obvious choice is for one work item to be responsible for updating one edge. Recall that a maximal matching on the graph is the largest set
of edges that can be updated simultaneously, and we can balance the updating of all edges
by sampling from a set of different matchings. It follows that every vertex has at most one
neighbor in a matching, because edges in matchings do not overlap. This naturally allows
a matrix to be used to represent an arbitrary set of matchings. Thus, let the matching matrix M be an n × |V| matrix (where n is the number of matchings) such that m_ij = k if the edge (j, k) belongs to matching i, and m_ij = j otherwise.
Algorithm 7: A greedy algorithm for computing a set of balanced maximal matchings.
Input: A graph G(V, E); the number of matchings n
Output: The matching set matchings
    matchings := |E| × n matrix set to FALSE
    freq := array of length |E| set to 0
    for i = 1..n do                          // for each matching
        cover := array of length |V| set to FALSE
        order := (1, 2, ..., |E|)
        sort order ascending by freq
        for j = 1..|E| do                    // for each edge
            k := order(j)
            uv := E(k)                       // uv is the k-th edge in E
            if FALSE = cover(u) ∧ FALSE = cover(v) then
                cover(u) := cover(v) := TRUE
                matchings(i, k) := TRUE
                freq(k) := freq(k) + 1
            end
        end
    end
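The following is a minimal Python sketch of Algorithm 7, assuming the graph is given as a list of vertex-pair edges; the function name and data representation are illustrative assumptions rather than the implementation benchmarked here.

    def balanced_matchings(num_vertices, edges, n):
        # Greedily build n maximal matchings, testing least-used edges first
        # so that every edge appears in roughly the same number of matchings.
        freq = [0] * len(edges)               # matchings each edge belongs to
        matchings = []
        for _ in range(n):
            cover = [False] * num_vertices    # vertices covered by this matching
            matching = set()
            for k in sorted(range(len(edges)), key=lambda e: freq[e]):
                u, v = edges[k]
                if not cover[u] and not cover[v]:
                    cover[u] = cover[v] = True
                    matching.add(k)
                    freq[k] += 1
            matchings.append(matching)
        return matchings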
This matrix can be computed directly from the matching set and E. An example of a matching on a small network and the corresponding row of M are shown in Figure 3.6. It should be noted that the kernel can accommodate fully separable rules as is. When the rule is fully separable, one may change M so that it no longer encodes matchings, but so that m_ij is equal to a randomly chosen neighbor of j. This allows the GPU algorithm to become more efficient, as more than double the number of edges can be updated per parallel iteration.
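As a sketch of how M might be assembled from the matching set (again in illustrative Python, with edges stored as vertex pairs):

    def matching_matrix(matchings, edges, num_vertices):
        # Row i maps each vertex j to its partner in matching i, or to itself
        # when j is uncovered (a self-loop the kernel treats as a no-op).
        M = []
        for matching in matchings:
            row = list(range(num_vertices))   # default: m_ij = j
            for k in matching:
                u, v = edges[k]
                row[u], row[v] = v, u
            M.append(row)
        return M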
The kernel for the GPU-parallel dimer automaton algorithm is shown below, and is invoked repeatedly by Algorithm 9. The inputs are N, a row of the matching matrix M, and Xin, the current state of
the system. The output is written to Xout, a separate buffer. It is more efficient to compute
the next state of X one vertex at a time, even though the result is known for both endpoints of that edge. This allows reads from and writes to global memory to be coalesced. However, because a kernel's execution is not completely parallel in the architecture,
the result of an edge update cannot be written back to the input, Xin. Instead, it is copied
to an output buffer, Xout. In the next kernel call, the output buffer of the previous call is
passed as input, and the input is passed as output. Each row of the matching matrix M is stored in separate buffers on the GPU so that a random row can be chosen by the CPU and passed to the kernel during the kernel call, as shown in Algorithm 9.

Figure 3.6: Row i of the matching matrix M on a small network corresponding to the matching on the network shown.
Function DimerAutomatonKernel(global uint* N, global T* Xin, global T* Xout, global T* R)
1   uint u = get_global_id(0);
2   uint v = N[u];                                       // coalesced read
3   T Xu = Xin[u];                                       // coalesced read
4   T Xv = Xin[v];                                       // un-coalesced read
5   T Xup = R[Xu, Xv];
6   Xout[u] = isequal(u,v)*Xu + isnotequal(u,v)*Xup;     // coalesced write
The main bottleneck in the kernel is the un-coalesced read from global memory on line 4,
which accesses the state of a neighboring vertex. Unlike square lattices, we cannot guarantee
a predictable memory access pattern for arbitrary graphs. However, we can permute the
vertices of the graph to reduce the average in-memory distance between neighboring vertices.
Because our vertices are embedded in a two-dimensional grid, we simply need a function of the form f : \mathbb{Z}^2 \to \mathbb{Z} with the property that

|(x, y) - (z, w)| \sim |f(x, y) - f(z, w)|,    (3.1)
Algorithm 9: Repeatedly invokes the kernel to advance the dimer automaton for the given rule, matching matrix, and initial input.
Input: M, Xin
Output: Xout
    write Xin to GPU
    write each row of M to GPU
    while ... do
        DimerAutomatonKernel(M[rand()%n], Xin, Xout)
        DimerAutomatonKernel(M[rand()%n], Xout, Xin)
    end
    read Xin from GPU into Xout on CPU
read Xin from GPU into Xout on CPU;7
where |·| denotes Euclidean distance. The Hilbert curve [22] is precisely such
a function, and is commonly used in similar situations. The points produced by random
sphere packing are traversed in Hilbert order to determine the actual ordering of the vertices
in the lattice. The net result of this effort is to decrease the likelihood that random memory
accesses result in a cache miss, improving performance.
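A sketch of this reordering in Python, using the standard bit-twiddling conversion from grid cell to Hilbert index; the binning of vertex coordinates into a power-of-two grid is assumed to have been done already.

    def hilbert_index(n, x, y):
        # Position of grid cell (x, y) along a Hilbert curve covering an
        # n x n grid, where n is a power of two (standard xy-to-d form).
        d = 0
        s = n // 2
        while s > 0:
            rx = 1 if x & s else 0
            ry = 1 if y & s else 0
            d += s * s * ((3 * rx) ^ ry)
            if ry == 0:                  # rotate/flip the quadrant
                if rx == 1:
                    x, y = n - 1 - x, n - 1 - y
                x, y = y, x
            s //= 2
        return d

    # Permute vertices so in-memory order follows the curve; coords[v] is
    # the assumed integer grid cell of vertex v:
    # perm = sorted(range(len(coords)), key=lambda v: hilbert_index(n, *coords[v]))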
3.3.2 Performance Benchmarks
To validate our parallel algorithm, we consider dimer automaton models for flocculation,
grain growth, excitable media, and Schelling segregation. Examples of each phenomenon are
found in the Applications chapter. The algorithm we presented parallelizes dimer automata
for arbitrary graphs and rules. However, this approach is not equally suitable for all situa-
tions, as ad-hoc implementations can make model-specific optimizations to greatly improve
performance. Thus, for the purpose of meaningful comparison, we evaluate our GPU algo-
rithm operating on the four models discussed in the introduction. Each model presents a
unique matchup between the GPU algorithm and the serial algorithm.
Several of the ad-hoc serial algorithms for these phenomena become more efficient as the simulation progresses, whereas the efficiency of the GPU algorithm is constant, regardless of the rule or state of the system. Thus, we compare the GPU algorithm⁵ to the various serial algorithms⁶ over time. Table 3.2 summarizes the benchmarking of the GPU algorithm
against various serial algorithms for the four phenomena discussed in this paper. For these
⁵ Measured using an NVIDIA GeForce 9600M GT graphics card with 512MB memory.
⁶ Measured using a 2.8GHz Intel Core 2 Duo processor with 6MB L2 cache and 4GB memory.
Figure 3.7: Rendering of the Hilbert curve on a 32 ×32 grid. Line segments are color coded
according to their position in the curve.
comparisons we used a lattice with 1,375,783 vertices and 4,127,301 edges. The GPU algorithm is most effective in simulating excitable media, with a speedup of over 80 relative to the serial implementation. For the flocculation and grain growth serial algorithms,
rithms, we adjust the number of edge updates performed in order to account for “inactive”
edges that are ignored.
Table 3.2: GPU-CPU comparisons and observed throughput and speedup.

phenomenon              CPU             GPU         throughput (updates/sec)   speedup (times faster)
flocculation            particles       matching    1.63 × 10^8                8.88
grain growth            active edges    matching    1.61 × 10^8                29.8
excitable media         sample u.a.r.   separable   2.84 × 10^8                80.1
Schelling segregation   sample u.a.r.   matching    1.51 × 10^8                40.8
Figure 3.8 shows the speedup measured over time for each comparison. Here it is immediately
apparent that the serial algorithm used by the grain growth simulation performs very poorly
at first. However, as the domains increase in size, the advantage of the serial optimization
increases. Approximately halfway through the simulation, the serial algorithm for grain
growth becomes more efficient than the serial algorithm for Schelling segregation. However,
it would still take a very long time for this algorithm to outperform the GPU algorithm due
to the manner in which the grain growth phenomenon scales over time. The speedup for the
flocculation simulation was the least impressive; the serial algorithm is very efficient, with a
minimal memory footprint.
We measured the effectiveness of the Hilbert ordering for the GPU algorithm by comparing
the runtime using graphs with their vertices ordered using the Hilbert curve to graphs with
their vertices in random order. Figure 3.9 shows that for large graphs the Hilbert ordering
more than doubles the performance. To create the ordering, vertices in the lattice are
projected into a grid and then traversed in Hilbert order. The default grid used had cell
widths just small enough that at most one vertex could fit into each cell. We also measured
the effectiveness of using other sizes, but found that they performed no better than using
the default for large graphs.
Figure 3.8: Speedup attained by the GPU implementation. For the excitable media model
the GPU algorithm performs 80x faster than the serial algorithm.
Figure 3.9: The Hilbert ordering contributes to a speedup of a factor greater than 2x.
3.3.3 Discussion
The observed performance of our GPU implementation is very promising, as the algorithm
is a significant improvement over most of our ad-hoc serial algorithms. Determining the
state of the neighboring vertex requires an un-coalesced memory access, which may slow
down the kernel. However, given the observed performance, we can conclude that the GPU
does an excellent behind-the-scenes job hiding the memory latency of this access. Our
speedup is greater than that reported by Bandman [13]. We believe the high throughput
(edge updates/second) produced by our algorithm is largely due to minimizing un-coalesced
memory access patterns. Our algorithm performs only |V| un-coalesced reads per kernel call. If the rule needed the entire neighborhood of each vertex per kernel call (as is the case with cellular automata and boolean networks, for example), this would require a total of 2·|E| un-coalesced reads. For a tree, which has |V| − 1 edges, this is still twice as many un-coalesced reads; for a typical square lattice, it is eight times as many. Thus, one could
argue it is the simplicity of dimer automata that is the basis for the effectiveness of our GPU
algorithm.
However, our efficient GPU implementation of dimer automata depends on several assump-
tions about the nature of the graph to be used:
1. it is possible to generate a small set of matchings such that each matching covers most
of the vertices in the graph;
2. sampling from this set of matchings and updating these edges synchronously is approx-
imately equivalent to updating edges asynchronously; and
3. edges have constant and uniform propensity (i.e., no edge is more or less likely to be
chosen than another).
Subsequently we discuss the implications of these assumptions as they relate to the correct-
ness, efficiency, accuracy, and general applicability of the techniques we have presented.
Correctness & Efficiency
Though the dimer automaton requires asynchronous updating, any two edges can be updated simultaneously as long as they do not share any vertices. Since our algorithm only updates the edges in a matching concurrently, it meets this requirement.
Furthermore, for correctness, the union of all the matchings must be equal to the edge set of
the graph (i.e., each edge must be present in at least one matching). Otherwise an edge that
is left out will never be updated during the simulation. The number of matchings needed
to guarantee that each edge is present in at least one matching is lower bounded by the
maximum degree of the graph. Other properties like density and degree distribution will
also contribute to the number of matchings needed. Scale-free networks, for example, tend
to have several high-degree vertices because the degree distribution follows a power law. In this case, many matchings would be required to cover each edge, potentially exceeding the memory capacity of the GPU. This is one reason for our assumption of a low-dimensional graph: the number of matchings required is more manageable.
If MEM is the memory capacity of the GPU (in 32-bit words), and k is the number of matchings we wish to use, then

k < \frac{MEM - 2|V|}{|V|}.    (3.2)
This is the case because two copies of the state vector are stored on the GPU in addition to k rows of the matching matrix, each of length |V|. The largest graph used in our experiments had 1,375,783 vertices and a maximum degree of 23, but 99.8% of vertices had a degree of 8 or less. Thus, k could not exceed 91, but this was far greater than the minimum of 8 necessary to cover each edge. For practical purposes we let k = 64 to facilitate simple and efficient storage of the matchings on disk.
For the GPU algorithm to perform efficiently, the average fraction of the vertex set that
can be covered by a maximal matching should be close to 1. Vertices that are not covered
by a matching result in unnecessary work by executing a read and write without applying
the rule. We found that the fraction of covered vertices was over 0.9 for the graph we used
in the experiment. This allowed the GPU algorithm to perform efficiently, and we would
expect this to be the case for other spatially extended graphs as well. However, it is easy
to imagine graphs where this fraction is low, such as a graph where one vertex is linked to
every other vertex. Any matching on this graph would only be able to cover two vertices in
total. While this may seem like a contrived example, graphs that model real-world data can
have similar hub-like properties. We leave as future work investigating whether the GPU
algorithm presented is also applicable to small-world, scale free, and other types of realistic
graphs.
Accuracy
An asynchronous cellular automaton on a square lattice can be effectively approximated
by a block synchronous cellular automaton [13]. From our results, it appears that this
insight extends to parallelizing dimer automata as well. However, it is still important to
measure the accuracy of this approximation. Assuming uniform propensity, the probability that an arbitrary edge has been updated k times after n reactions has a probability mass function (PMF) that follows the binomial distribution B(n, 1/|E|). Figure 3.10 compares
the PMF resulting from Algorithm 7 with the PMF that results from using independent
random maximal matchings. Algorithm 7 is much closer to the binomial distribution. We
examined the error incurred when using exactly k matchings for k ∈ {1, 2, ..., 64}. The error was measured as the sum squared difference between the observed PMF and the theoretical (i.e., binomial) PMF. We found that the error follows a power law as a function of k (i.e., \alpha \cdot k^{\beta}) with \beta \approx -1.5, as seen in Figure 3.11. We would expect the number of matchings
needed for an accurate approximation to be dependent on the properties of the graph being
used. Therefore, we are not assuming that these results hold for graphs that are significantly
dissimilar from low dimensional lattices.
If we consider the following simple dimer automaton, we can understand the effect of the GPU approximation differently, on an analytical level. Assume a dimer automaton with a one-dimensional graph G = (V, E) such that V = {1, 2, ...}, E = {(i, i + 1) | i ∈ V}, Σ = {0, 1}, and initial condition X_{t=0} = (1, 0, 0, ..., 0), with the rule

R(x, y) = \begin{cases} 1 & \text{if } y = 1 \\ x & \text{otherwise} \end{cases}.    (3.3)
Note that |E| = |V| − 1. This system will result in a simple form of front propagation in one dimension and one direction. By comparing the statistics of this phenomenon for the serial and GPU algorithms we can analytically quantify the differences between the two. The change in the number of 0's or 1's in X over time follows a binomial distribution in each case, but with different parameters. A binomial distribution B(n, p) has expectation E[B] = np and variance Var[B] = np(1 − p).
First we can consider P, the number of states added to the front after k iterations by the parallel algorithm.
Figure 3.10: Sampling from sets of maximal matchings compared to the true binomial prob-
ability mass function. Algorithm 7 (i.e., with sorting) is much closer to the true distribution
than sampling from independently generated maximal matchings.
Figure 3.11: The error of sampling from a finite set of matchings is measured in relation to
the number of matchings used. Error is determined as the sum squared difference between
the binomial distribution and the distribution resulting from sampling from a finite set of
matchings.
Suppose the GPU algorithm uses exactly two matchings m_1, m_2 such that m_1 ∪ m_2 = E. The front will only grow when the matching containing the edge at the front boundary is chosen. By the previous assumption, this edge must be in exactly one of the two matchings. Since the matchings are chosen u.a.r., the probability of growing the front by one unit during a parallel update step is 1/2. Thus, P = B(k, 1/2), so

E[P] = k/2, \qquad Var[P] = k/4.    (3.4)
Now we will consider S, the PMF corresponding to the serial algorithm. The probability of picking an edge that grows the front is 1/|E|, as there is always exactly one edge that may do so. In a single update step, the parallel algorithm updates |E|/2 edges on average. Therefore, if the parallel algorithm has performed k steps, this corresponds to updating k|E|/2 edges in the serial algorithm. Thus, S = B(k|E|/2, 1/|E|), so

E[S] = k/2, \qquad Var[S] = \frac{k}{2}\left(1 - \frac{1}{|E|}\right) = \frac{k}{2} \cdot \frac{|V| - 2}{|V| - 1}.    (3.5)
As the size of the lattice becomes infinitely large, the variance is

\lim_{|V| \to \infty} Var[S] = k/2.    (3.6)
Therefore, we can conclude that, in the limiting case, the parallel and serial algorithms have
the same expected values, but the serial algorithm has double the variance of the parallel
algorithm. Repeating this analysis on higher dimensional lattices is outside the scope of this paper, but it is reasonable to expect a similar result. Therefore we believe that, qualitatively speaking, the GPU algorithm produces slightly smoother fronts than the serial algorithm.
Knowing this, it would be wise to test one’s model on a smaller scale using the serial algorithm
to help identify any artifacts that may result from the GPU algorithm.
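This prediction is easy to check numerically; the following Monte Carlo sketch in Python (using NumPy, with an assumed lattice size) samples the two binomial distributions directly and compares their moments:

    import numpy as np

    k, E = 1000, 10**4                                      # parallel steps, edge count (assumed)
    P = np.random.binomial(k, 0.5, size=5000)               # parallel front growth: B(k, 1/2)
    S = np.random.binomial(k * E // 2, 1.0 / E, size=5000)  # serial front growth: B(k|E|/2, 1/|E|)
    print(P.mean(), P.var())   # approx k/2 = 500 and k/4 = 250
    print(S.mean(), S.var())   # approx k/2 = 500 and k/2 ~ 500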
General Applicability
A less realistic assumption our GPU algorithm makes is that all edges have equal propensity,
causing edges to be updated uniformly at random. One can easily imagine situations where
the propensity at each edge is not uniform. This may occur in fine-tuning the rule, or in
applying the model to non-uniform space. Therefore, here we discuss an extension of the
original GPU algorithm that can be used to address these scenarios. The approach is to
store the states of many independent pseudo-random number generators (PRNG’s) on the
GPU and to either accept or reject an edge update with probability

\Pr[\text{accept update } \overline{uv}] = \frac{P(u, v)}{\max_{(x,y) \in E} P(x, y)},    (3.7)

where P(u, v) is the propensity of the edge \overline{uv}. The main challenge of this approach is in
maintaining the correctness of the algorithm while performing the rejection sampling. Recall
that in the GPU kernel the rule is applied to each endpoint of the edge separately in order
to maintain coalesced reads and writes to global memory. We must be very careful, as it
would be possible to update one endpoint but not the other, leading to an incorrect result.
If the rule is partially separable, then rejection sampling can be used on those transitions
that are separable without affecting the correctness of the simulation. This technique can
be applied to each of the models discussed in this paper under the right circumstances. For
example, the aggregation of particles to the cluster in the DLA model, the absorption of
states into a domain in the domain coarsening and grain growth model, and all transitions
in the excitable media model are separable. Performing rejection sampling on a non-
separable transition is more cumbersome. We must store the state of one PRNG in the
GPU’s global memory for each edge in the matching set, which doubles the GPU memory
requirement. During the initialization of the PRNG seeds, we must ensure that the state
of the PRNG for adjacent vertices in each matching is equal so that updates are identically
accepted or rejected for both endpoints of an edge.
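One way to keep the two endpoints consistent is to derive their random draws from a common, deterministic seed. The Python fragment below is an illustrative sketch of this idea only; the seeding scheme and names are assumptions, not the kernel's actual PRNG.

    import random

    def accept_update(edge_seed, step, propensity, max_propensity):
        # Both endpoints of an edge construct the same PRNG stream from the
        # shared edge seed, so they make identical accept/reject decisions.
        rng = random.Random(edge_seed * 1_000_003 + step)
        return rng.random() < propensity / max_propensity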
3.3.4 Conclusions
Dimer automata are a simple, powerful, and extensible way to model complex systems.
However, parallelization of dimer automata on the GPU is a nontrivial task for two reasons.
First, updates are asynchronous; edges are picked randomly in an intrinsically serial process.
Second, the use of general graphs causes memory access patterns to be random and difficult
to optimize for the GPU architecture. We have presented a GPU parallel algorithm for dimer
automata that effectively addresses these two issues.
The problem of asynchronous updates is addressed by approximating the asynchronous au-
tomaton as a synchronous one. At any given time step, edges from a randomly chosen
maximal matching are updated in parallel. The matching guarantees that the correctness
of the simulation is preserved. The cost of random memory access due to computing over
a graph is mitigated by limiting un-coalesced global memory access to a single read in the
kernel. Additionally, using the Hilbert curve to order the vertices in the graph further im-
proves performance by reducing the in-memory distance between neighboring vertices. Using
these techniques, our GPU algorithm gives a speedup of a factor of 80 over the best serial
algorithm. This peak performance occurred with the excitable media model, when the most
efficient GPU algorithm was matched with the least efficient serial algorithm. Finally, our
techniques need not be limited to just modeling and simulating complex systems. The tech-
niques we presented could also be applied towards GPU acceleration of algorithms for general
purpose graph computation, assuming the problem can be framed in a manner similar to a
dimer automaton.
Chapter 4
Investigations
The elegant simplicity of modern models for complex systems such as cellular automata is
a double edged sword. While these models can provide compelling explanations for how
certain complexity arises, they often cannot be used to make accurate predictions about
particular phenomena without ad hoc refinement. Mapping observations of a particular
system to a simple rule that governs its behavior is a difficult task, often involving intuition
and guesswork. Some progress has been made in automating this task through the use
of evolutionary algorithms to search for rules that perform specific tasks such as density
classification, parity checking, synchronization, and pseudo-random number generation [30,
78, 34, 59, 111, 104].
Instead of searching for the best rule for a particular task, one could broadly search for rules
that have some form of interesting behavior. With this approach, Wolfram discovered several
elementary one dimensional cellular automata exhibiting complex behaviors beyond simple
homogenous and oscillating patterns [110]. These automata, which he labeled as Class III
and Class IV, exhibit complex and chaotic behavior with long transients. Later, Langton
built on this idea by measuring the relationship between a statistical property of the rule
table and the average behavior exhibited by the automaton [69]. He suggested that rules
capable of computation (e.g., Class IV automata) exist in a narrow region between automata
exhibiting periodic and fully chaotic behaviors (i.e., the “Edge of Chaos”). However the
correlation between the measured properties of the rule and the Class IV behavior was later
criticized as being too weak and the statistical measurements of the rules too simplistic [79].
Furthermore, Wolfram’s classifications are neither rigorous nor absolute, and Wolfram also
admits that not all automata can be assigned strictly to any single group [110]. The thrusts
of these works are towards understanding the relationships between the rule and the behavior
it produces and discovering rules that produce new and interesting behavior. Clearly, these
tasks remain open problems in complex systems.
These problems are also the focus of the contributions in this chapter, but we narrow the
scope so that the interesting behavior we consider is self-organization, and the complex
systems we consider are dimer automata [95]. To date, self-organization has a myriad of
context-specific definitions; furthermore, not all of these definitions are consistent with each
other. However, one can generally say that self-organization is the transformation of a
disordered system to an ordered one without any authority globally coordinating the system
[10]. Therefore, we can suppose that dimer automata (and similar models including cellular
automata) that transform a random configuration into some significant pattern show self-
organization. In this chapter we develop and evaluate methods for finding the building
blocks of simple dimer automaton rules that are most correlated with self-organization (i.e.
dimer automaton motifs), and for using those building blocks to discover more complicated
self-organizing dimer automaton rules (i.e. the evolutionary motifs algorithm).
The chapter is organized as follows. First, we introduce how we measure self-organization
in dimer automata, and conduct a simple exhaustive search in a small dimer automaton
rule space. This search revealed three different rules exhibiting self-organization, which
motivates exploring larger search spaces due to the possibility of making further useful dis-
coveries. However, the difficulties associated with searching in larger rule spaces necessitate an approach that is more sophisticated than an exhaustive search. Thus, we present our
evolutionary motifs algorithm in detail, followed by the results of a broader search for self-
organizing dimer automaton rules. Finally, we discuss the effectiveness of the evolutionary
algorithm and potential future challenges and directions.
4.1 Detecting Self-Organization
Since the goal is to search for dimer automata exhibiting self-organization, it is important to
have a way to quantify self-organization for a number of reasons. Presumably, in the search
we will be considering a large number (thousands or more) of dimer automata. The least
sophisticated search is one in which a human views each output, and manually classifies
them. This was the approach initially taken and still endorsed by Wolfram in his search
of elementary cellular automata [110]. However, this approach quickly becomes impractical
when the human classifier must consider even a moderate number of images. So, at the
very least, a measurement of self-organization can be used to organize these images to make
this task easier. For example, results can be sorted according to this value so that the top n images are presented, or a threshold can be set so that only images whose self-organization exceeds it are presented. More sophisticated techniques build on this by replacing the role
of the human with a computer algorithm.
Over the years, there have been many approaches towards understanding and measuring
self-organization [16, 17, 38, 47, 97, 96]. This is partially due to the term “self-organization”
itself being overly ambiguous, and partially due to the many different disciplines interested in
the phenomena (e.g., physics, mathematics, computer science) [47]. It is generally accepted
that a self-organizing system can transform an initially unstructured configuration into one
with increasing structure over time. There is no clear consensus on what good definitions
of structure are, and how its increase over time should appear. However, there is consensus
that entropy is not directly related to self-organization, because such systems have elements
of both order and disorder that depend on scale [38]. Neither a purely homogenous configuration (with zero entropy) nor a purely random configuration (with maximal entropy) is very interesting. Thus, an appropriate measurement will have large values for structured
configurations and small values for fully ordered as well as fully disordered configurations
(see Figure 4.1).
Figure 4.1: Structure exists between pure uniformity and randomness, making entropy a poor measurement of this quantity.
Our measurement of self-organization makes several assumptions for simplicity and utility.
Almost all measurements consider how the structure of the system changes continuously over time; we, however, only consider the final configuration of the dimer automaton. If the dimer automaton is initialized in a fully random state (which contains zero structure), then
we can infer that any structure observed is the result of self-organization. We measure this
final structure and assume this also measures the self-organization of the dimer automaton.
Furthermore, a measurement of self-organization in dimer automata should take into account
the topological distribution of states. The configuration should be locally organized, but
globally disorganized; we call this local structure. Knowing a vertex's state should tell us more about vertices nearby than about vertices far away.
So, to measure local structure, we start by considering the probability that two randomly
chosen vertices’ states are identical given d(i, j), the shortest path distance between those
two vertices, thus

L(k) = \Pr[x^t_i = x^t_j \mid d(i, j) = k].    (4.1)
If there is local structure within a configuration, then generally we would expect L(a) > L(b) if a < b. L(1) is straightforward to compute; it is the probability that the endpoints of an edge have the same state, and can be computed by traversing the edge set:

L(1) = \frac{1}{|E|} \sum_{(u,v) \in E} \delta(x^t_u, x^t_v).    (4.2)
When k is large enough, vertices should act as though their states are independent. For this case we define

\mu = \Pr[x^t_i = x^t_j] = \sum_{\sigma \in \Sigma} \Pr[x^t_i = \sigma]^2,    (4.3)
as the probability that any two vertices have the same state, without knowing their topo-
logical separation. L(k) can be reduced to a single quantity by measuring the sum squared difference between L(k) and µ, thus

L = \sum_k [L(k) - \mu]^2.    (4.4)
This quantity measures the total discrepancy between the local and global statistics of a
particular configuration; relatively large values of L indicate the presence of local structure. This satisfies the basic requirement for measuring structure since L vanishes in the presence of uniformity as well as randomness. In the uniform case L(k) = µ = 1, since all vertices at all distances are equal. In the random case, if there are b different states then L(k) = µ = 1/b, since the probability of picking vertices with the same state is 1/b, and this probability does not change as a function of their distance. In either case L(k) − µ = 0 for all k, so L = 0.
Measuring L exactly is computationally expensive because fully evaluating L(k) requires computing the shortest path d(i, j) for all pairs of vertices. This requires O(|V|^3) time and O(|V|^2) space using the Floyd-Warshall algorithm, for example [43]. For large graphs (e.g., |V| > 10^4) this becomes prohibitive. Instead, d(i, j) can be sampled using Algorithm 10 for a Monte Carlo approximation of L. We denote the measurement of L using this technique as ⟨L⟩ in order to differentiate between the exact quantity and its approximation.
Algorithm 10: Samples the distance matrix d(i, j) by performing a breadth first search about random vertices.
Input: G(V, E), nsamples, depth
Output: nsamples × depth tuples (i, j, d(i, j)) where i, j ∈ V and d(i, j) is uniformly distributed in {1, 2, ..., depth}
    centers := nsamples random vertices from V
    foreach i ∈ centers do
        for dj = 1..depth do
            front := {j ∈ V : d(i, j) = dj}    // from breadth first search about i
            jr := random vertex from front
            yield (i, jr, dj)
        end
    end
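A compact Python sketch of this Monte Carlo estimate, assuming the graph is an adjacency dict and the configuration a vertex-to-state dict (names and interfaces are illustrative):

    import random
    from collections import Counter, deque, defaultdict

    def approx_L(adj, state, nsamples=200, depth=20):
        # Sample distance fronts by BFS around random centers (Algorithm 10)
        # and tally how often states match at each distance.
        hits, totals = Counter(), Counter()
        for i in random.sample(list(adj), nsamples):
            dist, fronts, queue = {i: 0}, defaultdict(list), deque([i])
            while queue:
                u = queue.popleft()
                if dist[u] >= depth:
                    continue
                for w in adj[u]:
                    if w not in dist:
                        dist[w] = dist[u] + 1
                        fronts[dist[w]].append(w)
                        queue.append(w)
            for d, front in fronts.items():
                totals[d] += 1
                hits[d] += (state[i] == state[random.choice(front)])
        # mu from the global state histogram (Equation 4.3), then Equation 4.4
        n = len(state)
        mu = sum((c / n) ** 2 for c in Counter(state.values()).values())
        return sum((hits[d] / totals[d] - mu) ** 2 for d in totals)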
4.2 Exhaustive Search of |Σ| = 3
We start with an exhaustive search of a small dimer automaton rule space with the purpose
of finding rules exhibiting local structure. For |Σ| = 4 there are 4,294,967,296 rules and at least 178,956,970 isomorphism classes, which we currently find to be computationally intractable. For |Σ| = 3, however, there are 19,683 rules and at least 3,280 isomorphism classes. Experimentally, we found there were actually 3,330 isomorphism classes, only slightly higher than the lower bound predicted by Equation 1.4. We ran a simulation for each rule among each of the 3,330 isomorphism classes where |Σ| = 3 in order to determine which of these rules exhibited local structure. In all simulations, the graph G = (V, E) used was an isotropic two-dimensional mesh created by Delaunay triangulation of points generated with random sequential addition (see §1.7.2). The initial condition for X was chosen such that Pr[x^0_i = σ] = 1/|Σ|. For each simulation, the system was iterated c·|V| time steps, which is roughly equivalent to performing c iterations of a cellular automaton whose lattice has |V| cells. We found experimentally that c ≈ 200 was large enough for transient patterns to develop, but small enough that experiments finished quickly. At the end of each simulation we measured L(1), µ, and L.
Figure 4.2: Scatter plot of L(1) versus µ for all 3,330 isomorphically unique rules in the search space defined by |Σ| = 3.
Figure 4.2 shows L(1) versus µ for all 3,330 rules. From this we can see that the vast majority of rules lie along the diagonal with L(1) ≈ µ. Recall that µ is the probability that any two vertices have the same state, and L(1) is the probability that two adjacent vertices have the same state. For local structure to exist, we would expect vertices to often have the same state as their neighbors. So, L(1) ≈ µ implies the absence of any meaningful structure. From this figure we can see that the vast majority of rules where |Σ| = 3 do not produce local structure. The outliers, however, do exhibit significant local structure, and are shown in Figure 4.3. In addition to being outliers of the L(1) vs. µ distribution, they also have the three highest values of ⟨L⟩.
Figure 4.3: Rules and corresponding outputs for the three outliers in Figure 4.2. Note that the rules are shown here diagrammatically as finite state machines. An edge (x, y) in this diagram means that the rule matrix has the form R(x, ℓ) = y, where ℓ is each of the labels of that particular edge.
4.3 Searching with Evolutionary Motifs
Considering the results presented so far, we can conclude that the exhaustive search was
fruitful. However, its limitations are obvious: spaces larger than |Σ| = 3 are too large to search completely. Also, less than 0.1% of the rules in the search space contained what
we considered to be local structure, leading us to believe that this property is extremely
rare. Assuming the rarity of rules producing local structure is no worse in larger search
spaces, simple random sampling would be too inefficient. This motivates the need for a
more sophisticated technique for searching larger spaces. Our solution comes in two parts.
First, using the results from the exhaustive search we find the basic components of the dimer
automaton rules that are strongly correlated with producing local structure. In other words,
we find the dimer automaton motifs. Next, we use an evolutionary algorithm to evolve a
diverse population of rules exhibiting local structure. The rules are created by different
combinations of the motifs found in the previous step.
For generality, we describe our evolutionary motifs algorithm in terms of two black box functions: a fitness function F : rule ↦ {0, 1}; and a behavior function B : rule ↦ ℝ^n. The
evolutionary motifs algorithm can potentially be made to search for rules with any desirable
property, and not just local structure, which is the focus of this paper. The fitness function F
determines whether or not a rule has this desirable property. It is binary, as opposed to real
valued, because we do not wish to find a single rule that produces the best result. Instead,
we want to find a whole population of rules exhibiting the desirable property in unique ways.
The user must provide B to quantify the features of the rule's behavior. Ideally, if two rules are similar in behavior, their coordinates in the feature space defined by B will be close. F is used in both parts of the algorithm (i.e., to find motifs and to evolve the rules), whereas B is only used in the evolutionary portion of the algorithm.
4.3.1 Finding Motifs
Researchers examining real world complex networks in a variety of domains discovered that
these networks contained a statistically significant number of small, repeated subgraphs
compared to what is expected to occur at random [76]. These subgraphs are referred to
as network motifs, and they were discovered in genetic networks, ecological networks, the
worldwide web, and information processing networks. Since then, network motifs have also
been found in many other areas [4]. It is reasonable to consider network motifs for rules
for complex systems. The concept of a motif is easily extended to dimer automata, as each
rule is a finite state machine (i.e., a directed network with the edges labeled). To find the
motifs strongly correlated with some desirable property, the fitness function F is used to partition a given set of rules into those rules that have the desirable property and those that do not. We assume that motifs occurring with a statistically significant frequency in one partition relative to the other contribute significantly to a rule's presence in that partition. So, if a set of motifs can be found that strongly correlate with a desirable property, then those motifs can hopefully be used to build other rules that also have that desired property.
For any dimer automaton having a discrete state space Σ, a rule is a finite state machine (FSM), and can be represented as a |Σ| × |Σ| matrix R. Each element of R represents a single transition in the FSM. A motif is a subgraph of an FSM, or equivalently, a subset of the elements of the matrix R. A motif can be represented in several ways (e.g., diagrammatically, as a list, as a matrix). For example,

\begin{array}{c|ccc} x \backslash y & 0 & 1 & 2 \\ \hline 0 & * & 1 & * \\ 1 & * & * & 0 \\ 2 & * & * & * \end{array} = \{(0, 1, 1), (1, 2, 0)\}.

The asterisks in the matrix representation denote that any value from Σ may be chosen for that element. The choice of representation is dependent on how the motifs are to be used, as one representation can, at times, be more convenient or efficient than others.
In order to find motifs we must be able to determine what motifs are contained within a given
rule R. Algorithm 11 accomplishes this by enumerating the power set of all elements of R. Enumerating a power set has complexity O(2^n), so this algorithm has a time complexity of O(2^{|Σ|^2}); for larger Σ this eventually becomes intractable. However, if we are only interested in motifs containing a few edges in total, we can terminate the algorithm early, potentially saving extra work.
The set of motifs generated using Algorithm 11 should be further filtered to remove motifs with uninteresting or unwanted properties. In our experiments we omitted motifs that contained self-loops (i.e., any motif containing an element of the form (i, j, i)) and any motifs that were not at least weakly connected (these would be two or more separate motifs).
Algorithm 11: Enumerate every possible motif contained within some rule R. P(·) denotes the power set.
Input: R: the rule; Σ: the state space
Output: all motifs within R
    foreach {(i_1, j_1), (i_2, j_2), ..., (i_n, j_n)} ∈ P(Σ²) do
        yield {(i_1, j_1, R(i_1, j_1)), (i_2, j_2, R(i_2, j_2)), ..., (i_n, j_n, R(i_n, j_n))}
    end
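An illustrative Python version of Algorithm 11 with early termination, representing the rule as a nested list and a motif as a frozenset of (i, j, R(i, j)) triples; the self-loop and connectivity filters described above are omitted for brevity.

    from itertools import combinations

    def motifs(R, sigma, max_edges=3):
        # Enumerate every motif of R with up to max_edges transitions
        # by taking subsets of the rule matrix's cells.
        cells = [(i, j) for i in range(sigma) for j in range(sigma)]
        for size in range(1, max_edges + 1):
            for subset in combinations(cells, size):
                yield frozenset((i, j, R[i][j]) for (i, j) in subset)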
For a given set of rules and motifs, we can create a database of the form

D = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} & F(1) \\ x_{21} & x_{22} & \cdots & x_{2n} & F(2) \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} & F(m) \end{pmatrix},    (4.5)
where x_{ij} is the number of times an isomorphism of motif j is found in rule i and F(i) is the fitness of that rule. The strength of motif j's correlation with a rule's fitness is s_j, and is measured by comparing that motif's frequency in both the fit and unfit partitions, thus

s_j = \frac{\sum_{i=1}^{m} F(i) \cdot D(i, j)}{\sum_{i=1}^{m} F(i)} - \frac{\sum_{i=1}^{m} (1 - F(i)) \cdot D(i, j)}{\sum_{i=1}^{m} (1 - F(i))}.    (4.6)
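Equation 4.6 translates directly into Python; a sketch with D as a list of rows and the fitness column kept as a separate list (names illustrative):

    def motif_strengths(D, fitness):
        # s_j = mean frequency of motif j among fit rules minus its mean
        # frequency among unfit rules (Equation 4.6).
        fit = [i for i, f in enumerate(fitness) if f == 1]
        unfit = [i for i, f in enumerate(fitness) if f == 0]
        return [sum(D[i][j] for i in fit) / len(fit)
                - sum(D[i][j] for i in unfit) / len(unfit)
                for j in range(len(D[0]))]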
Once s_j is computed for each motif, we can find the solution in a number of different ways. If s_j > 0 then that motif is positively correlated with the desirable property we are searching for. However, if there are many such motifs, then it may be wise to omit some of the less important ones. One technique is to omit dominated motifs. A motif m_i is dominated by another motif m_j if an isomorphism of m_i is a subgraph of m_j and s_i < s_j. If this is the case then we can presume that the dominated motif's strength is due mostly to the fact that it is a subgraph of another, stronger motif. Another technique could be to omit all the motifs whose strengths are not statistically significant.
4.3.2 Evolving Rules
Evolutionary algorithms are often used to find solutions to global optimization problems
when the fitness landscape is too rough for other easier techniques to be effective [77]. An
evolutionary algorithm generally operates by measuring the fitness of a population, removing
the worst performing individuals, and creating offspring to replace those lost. Offspring
are created through mutation and crossover of the chromosome (the information that fully
encodes an individual). This process is repeated a number of times, or until a good solution
is found. Cellular automata work well with evolutionary algorithms because the rule can
be used directly as the chromosome. Thus, evolutionary algorithms have been used with
some success to find cellular automaton rules to perform a specific task such as density
classification, parity checking, synchronization, and pseudo-random number generation [30,
78, 34, 59, 111, 104]. Researchers have also developed algorithms to evolve automata that
form spatially extended patterns [60, 102, 103]. However, the quality, usefulness, and variety
of these results leave much room for improvement.
Two major aspects of evolutionary algorithms are selection and transformation [77]. Selec-
tion determines which subset of the population continues to the next iteration, and trans-
formation determines how to generate new members of the population from current ones.
It is important for selection and transformation to preserve elitism and promote diversity
in order for the search to be effective. Elitism ensures that top performing members of the
population continue to the next generation. Diversity means that the population contains
a wide variety of individuals, which helps prevent the search from converging to a locally
optimal solution. The search strategy we propose is based on the NSGA-II algorithm for
multi-objective optimization, as both searches have several similar goals [31].
Our algorithm can be described in terms of (P, T, S), where P_t is the population (a set of individuals) at round t, T is a function that transforms the population to create offspring, and S is a function that selects a subset of the population. The evolution of the system is described by the recurrence relation

P_{t+1} = S(P_t \cup T(P_t)).    (4.7)
For practical purposes, we wish for the size of the population to remain constant, so, by design, |T(P)| = |P| and |S(P)| = |P|/2, so that |P_{t+1}| = |P_t|. We use B to improve the diversity of the population over time by removing individuals whose behavior is similar to other individuals. This can be done by iteratively removing the individual whose minimum distance to all other individuals is the smallest. Selecting for fitness and then diversity in this manner essentially splits selection into two cases. Early in the search, all the fit individuals and some of the unfit will survive. Eventually, the entire population will be fit and selection will depend entirely on B. We postpone selecting for diversity until the entire population is fit. Until then, selecting a subset of the unfit individuals is done randomly.
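A sketch of this two-phase selection in Python; the distance metric, function names, and interfaces are assumptions for illustration, not the experimental implementation.

    import random

    def euclid(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def select(population, fitness, behavior, target_size):
        # Phase 1: all fit individuals survive, padded with random unfit ones.
        fit = [p for p in population if fitness(p)]
        unfit = [p for p in population if not fitness(p)]
        if len(fit) <= target_size:
            return fit + random.sample(unfit, target_size - len(fit))
        # Phase 2: the whole pool is fit; thin it for diversity by repeatedly
        # dropping the individual closest to any other in behavior space.
        survivors, b = list(fit), [behavior(p) for p in fit]
        while len(survivors) > target_size:
            crowded = min(range(len(survivors)),
                          key=lambda i: min(euclid(b[i], b[j])
                                            for j in range(len(survivors)) if j != i))
            survivors.pop(crowded)
            b.pop(crowded)
        return survivors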
We perform transformation by cut and splice crossover combined with random point muta-
tions. As crossover is the primary mechanism for producing offspring, it is important that
individuals are encoded in a manner that allows this transformation to produce viable off-
spring that inherit properties from both parents. Performing transformations directly on the
rule matrix is not amenable to meaningful crossover because this can significantly alter the
structure of the rule. Instead, individuals are represented in an intermediate format that
can be transcribed to produce the rule. We will refer to this simply as the chromosome C
such that each c_i is a motif. The chromosome is transcribed by starting with the fully quiescent rule R_Q(x, y) = x for all (x, y) and applying each motif to this rule in sequence. The population at each generation is transformed by selecting a random partner for each individual in the population and performing crossover. Following crossover, each element of each motif in the offspring's chromosome has a chance to be mutated according to the point mutation rate. Mutation simply replaces that element with one chosen completely at random from Σ³.
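Transcription itself is straightforward; a minimal Python sketch (illustrative names):

    def transcribe(chromosome, sigma):
        # Start from the quiescent rule R(x, y) = x and overlay each motif's
        # transitions in sequence; later motifs overwrite earlier ones.
        R = [[x] * sigma for x in range(sigma)]
        for motif in chromosome:
            for (i, j, k) in motif:
                R[i][j] = k
        return R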
4.3.3 Experimental Setup & Results
Previously we showed that L was effective in measuring the local structure exhibited by dimer automata, and consequently, the self-organization. Therefore, we design the fitness function so that

F = \begin{cases} 1 & \text{if } \langle L \rangle > \epsilon \\ 0 & \text{otherwise} \end{cases}.    (4.8)

We chose ε = 0.1 as this value was between the ⟨L⟩ values for the three outliers and the remaining rules from the |Σ| = 3 search. The non-dominated motifs most strongly correlated
with local structure from this search space are shown in Table 4.1.
Table 4.1: Non-dominated motifs positively correlated with local structure in |Σ| = 3

motif                                                    strength
(0,2,1), (1,0,0), (1,2,2)                                1.112
(0,1,1), (1,2,0), (2,1,0)                                1.112
(1,2,2), (2,0,0)                                         0.6637
(0,1,1), (0,2,2), (1,2,0), (2,1,0)                       0.6294
(0,2,1), (1,0,0), (1,1,2), (1,2,2), (2,0,1), (2,1,1)     0.3255
(0,1,1), (1,2,2), (2,0,0)                                0.2579
The behavior function consists of three different measurements,

B = \left( \langle L \rangle, \frac{1}{\mu|\Sigma|}, H \right),    (4.9)

where L and µ are the same quantities from the fitness function, and

H = \frac{1}{|E|} \sum_{(i,j) \in E} J(x^t_i, x^t_j).    (4.10)

J is a function that describes the configuration energy of the dimer automaton. This concept is taken from statistical mechanics. However, we modify the pairwise energy function J to depend on whether the rule R will change the state of either endpoint of an edge, thus

J(x, y) = [\delta(x, R(x, y)) + \delta(y, R(y, x))]/2.    (4.11)
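For reference, Equations 4.10 and 4.11 also translate directly into Python; a sketch assuming an edge list, a vertex-to-state dict, and a nested-list rule matrix:

    def configuration_energy(edges, state, R):
        # J(x, y) = [delta(x, R(x, y)) + delta(y, R(y, x))] / 2, averaged
        # over the edge set (Equations 4.10-4.11).
        def J(x, y):
            return ((x == R[x][y]) + (y == R[y][x])) / 2
        return sum(J(state[u], state[v]) for (u, v) in edges) / len(edges)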
Using F, B, and the top three motifs previously found, we ran the genetic algorithm for 60 generations with an initial population of 1,536. The chromosomes of the initial population were initialized so that chromosome i contains 1 + ⌊i/10⌋ permutations of randomly chosen motifs. For computer-architecture-specific reasons we consider rules in the space where |Σ| = 8. The final behavior of the population is shown in Figure 4.4. Because the population is large, we use k-means clustering to partition it into a small number of groups, and pick a representative sample (e.g., the cluster center) from each group. These centers are shown in Figure 4.5.
4.3.4 Discussion
The results presented thus far show that the evolutionary algorithm we detailed is capable
of finding a wide variety of rules that exhibit interesting behavior. However, we should
determine if this evolutionary algorithm is more effective than a random search, at the very
least. Furthermore, we should determine the advantage of point mutations in speeding up
the search. We can address these two issues by modifying the mutation rate for a number
of independent runs of the evolutionary algorithm. In order to objectively quantify the
fitness of the entire population, we measure the diversity of the population. The diversity
is measured as the average distance to the k nearest neighbors (using the behavior function, B) of each individual in the population. We let k = 3 for our experiments. We only consider the diversity of the population once the population reaches 100% fitness. The results of this experiment are shown in Figure 4.6 and Figure 4.7.

Figure 4.4: K-means clustering of behavior space after 60 generations. Each data point measures the behavior of a dimer automaton rule.

Figure 4.5: K-means cluster centers.

Figure 4.6: The effects of point mutation on the diversity of the population over time.
To understand the effects of point mutation, consider the extreme cases of very high and
very low mutation rates. When the mutation rate is high, this is nearly equivalent to a
random search without the aid of motifs. Such a search must rely on blind luck to find
a rule that produces local structure. Because such rules are rare, we would expect high
mutation rates to have poor performance. Lower rates of point mutation allow motifs to
improve the efficiency of the search. Similarly, we expect extremely low rates of point
mutation to also be problematic. If crossover is the only mechanism for generating new
chromosomes, the potential range of the search is fixed when the search is initiated. There
may be significant regions of the search space that cannot be explored. Point mutation
provides a way to potentially generate every possible rule, thereby broadening the capabilities
of the search.

Figure 4.7: The effects of point mutation on the diversity of the final generation.

However, Figure 4.7 only supports the first claim, that high rates of point
mutation hinder the search. It appears that below a certain point mutation rate, there
is no significant improvement or detriment to the performance of the algorithm. In other
words, point mutation appears to be unnecessary. We should note that this claim holds only
for this particular experiment, including the specific motifs, fitness function, and behavior
function used. It is plausible that point mutation could be more important under different
circumstances, or if the fitness of the population is measured differently.
4.4 Conclusions
Searching for self-organizing dimer automaton rules (e.g., those that transform randomness
into local structure) is challenging. The search space grows at such an extreme rate that
only the smallest cases can be exhaustively searched. An exhaustive search of the rule space
where |Σ|= 3 found three basic rules exhibiting self-organization. This motivates searching
in even larger spaces for other interesting rules, but this is challenging because of the rarity
of interesting rules and the rate at which the search space grows as a function of Σ. To aid
the search, we found the dimer automaton motifs that were most strongly correlated with
producing local structure in the original exhaustive search. We designed an evolutionary
algorithm to search various combinations of these motifs to discover interesting rules in a
much larger space (i.e., where |Σ| = 8). The use of dimer automaton motifs in this manner
allowed the search to be significantly more efficient than a random search of the space, and
many more rules exhibiting local structure were discovered. Future work in this area can
consist of applying the evolutionary motifs algorithm to different contexts and modeling
frameworks.
Chapter 5
Generalizations
The discrete nature of cellular automata [110, 92], boolean networks [65], and other models
is both a blessing and a curse. Discrete models have simpler implementations on a digital
computer compared to their continuous counterparts, such as PDE’s. The rules describing
the dynamics of discrete systems are often easy to understand and implement, and there is
no cause to worry about round off error or numerical stability. In many cases, the state of
the system can be represented with 8-bit integers, using less memory and CPU resources
compared to the floating point representations needed by continuous models. Unfortunately,
cellular automata and other models typically lack an important feature, a tunable level of
detail, common to numerical solutions to PDE’s. One can maximize the accuracy of a nu-
merical solution of a PDE by appropriately adjusting the spatial and temporal discretization
based on memory and CPU constraints. This is not the case when we consider classical cellu-
lar automata. Increasing the lattice size and number of iterations creates a larger simulation
that runs longer, but this only equates to additional accuracy in special circumstances.
So, is there a way to reconcile the simple and stable nature of discrete models with the
tunable nature of continuous ones? There are many ad-hoc examples of this where cellular
automata rules are accompanied by a parameter that affects the size of the state space.
This effectively tunes the model to exhibit smoother, more continuous behavior. For exam-
ple, the Greenberg-Hastings [51] and cyclic cellular automata [39, 40] models for excitable
media allow an arbitrary number of integer states that affects the size of the spirals in the
simulation. Other approaches have focused directly on cellular automata as direct discretiza-
tions of various PDE’s. For example, a coarse discretization of the Belousov-Zhabotinsky
reaction was very effective in reproducing the important qualitative aspects of that phenomenon, including wave curvature and dissipation [45]. Weimar developed a technique for a
quantitatively accurate cellular automaton discretization of reaction diffusion systems [107].
A similar technique based on operator splitting is used by Narbel for cellular automaton
discretization of the Fitzhugh-Nagumo equation [82]. Roughly speaking, these techniques
carefully transform the continuous phase space of various PDEs into a discrete map to be
used as the cellular automaton rule.
These techniques map well known continuous models into cellular automata, but is it possible
to accomplish the reverse? Can we start with an arbitrary, coarse-grained, discrete rule and
develop it into a continuous model? This has been accomplished for elementary cellular
automata using the “inverse ultradiscretization technique” [67]. However, this technique
essentially maps the cellular automaton rule into a set of continuous functions that exactly
reproduce the discrete behavior of the rule. A more useful and intuitive approach would be
to increase the automaton’s state space one state at a time so that in the limiting case, the
behavior appears continuous.
In §4.2 an exhaustive search of the dimer automaton rule space where |Σ| = 3 revealed three
rules exhibiting self-organization. Through further analysis we determined that the first and
third rules shown in Figure 4.3 could be grown to arbitrary sizes. By this, we mean that
these rules could be modified to handle additional states in a manner consistent with their
original behavior. Figure 5.1 shows the progression to larger state spaces for these rules.
In the first case, generalization preserves the star shape; in the second case, generalization
preserves the cycle shape. The edges were labeled in a manner consistent with the semantics
of the original rules. The star shaped rule models grain growth, and the cycle shaped rule
models spiral waves (i.e. excitable media). Further discussion of these rules can be found in
§2.2 and §2.4.
These generalized rules were engineered by studying the behavior of the original rules; we
intuited a way to modify the rules to produce their appropriate behaviors. However, a
general technique to produce continuous behavior from any seed rule would be very useful.
For example, this would enable searching a space of continuous behaviors with the potential to
discover simple new explanations for complex phenomena, which is the main focus of this
work. We propose a generic tunable discrete model for complex systems called elastic dimer
automata that we use to search for simple rules exhibiting self-organization.
Figure 5.1: Generalization of the domain coarsening and excitable media rules to increasingly
large state spaces. As states are added, the resulting behavior is “sharpened.”
Elastic dimer automata allow stretching (i.e., increasing the state space) to produce smooth,
continuous behavior in a generic manner, overcoming the issues discussed above. We also
develop a simple statistical method to measure self-organization in dimer automaton config-
urations having many states. Together, elastic dimer automata and this measurement let us
perform an exhaustive search of the resulting behavior space, yielding several simple inter-
esting rules. The rules discovered included several variants of excitable media phenomena
with various wave behaviors and labyrinthine patterns that repair dislocations over time.
5.1 Measuring Structure in Large Rules
Measuring local structure with $L$ using the method from §4.1 may not be sufficient for rules
with many states. For example, the excitable media rule produces spatial gradients instead of
homogeneous regions. This causes configurations to have a low $L$ despite the visually apparent
spatial patterns. Gradients are characterized by adjacent states being close in value, but not
necessarily equal. The original definition of $L$ is better suited to measuring structures
produced by coarsening phenomena. It may be more useful to measure structure by asking:
how much information does the spatial distance between $i$ and $j$ provide about the finite
state machine distance between $x_i^t$ and $x_j^t$?
Let $D_R$ and $D_G$ be random variables representing the distance between two vertices' states
and spatial locations, respectively. If $i$ and $j$ are vertices chosen uniformly at random from
the graph $G$, then
$$D_R = d_R(x_i^t, x_j^t), \qquad D_G = d_G(i, j). \tag{5.1}$$
The functions $d_R$ and $d_G$ refer to the finite state machine and graph shortest path distances¹,
respectively. We propose measuring local structure by the mutual information of these two
distributions,
$$L_I = I(D_R; D_G). \tag{5.2}$$
This measurement of local structure agrees well with our intuition about structure in the
¹Since $G$ is potentially very large, it is not wise to compute $d_G(i, j)$ for all $(i, j) \in V^2$. Instead, $(i, j)$ pairs can
be sampled by repeatedly picking a random vertex, performing a breadth first search of fixed depth, and
picking a random vertex from each depth.
uniform and random cases. In either of these two cases, $D_R$ and $D_G$ are statistically independent,
so the mutual information is 0. This measurement is similar to the concept of long
range mutual information [17], and is a simplified and adapted version of Shalizi's “light
cone” approach [96]. Also, we note that our measurement is very general, as it applies to
any configuration represented by a graph where vertices take discrete states. This means
it can potentially be used for other complex systems models as long as $d_R$ is appropriately
defined.
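To make the measurement concrete, a minimal Python sketch of Equations 5.1 and 5.2 is
given below. It assumes the spatial graph is stored as an adjacency dictionary adj, the
configuration x maps vertices to states, and d_R is a callable returning the finite state
machine distance; the function names and sampling parameters are illustrative rather than
part of any particular implementation, and the pair sampling follows the footnote above.

import math
import random
from collections import Counter, deque

def bfs_depths(adj, source, max_depth):
    """Map each vertex reachable within max_depth of source to its BFS depth."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if depth[v] == max_depth:
            continue
        for w in adj[v]:
            if w not in depth:
                depth[w] = depth[v] + 1
                queue.append(w)
    return depth

def sample_joint(adj, x, d_R, num_sources, max_depth):
    """Estimate the joint distribution of (D_R, D_G) by sampling vertex pairs."""
    joint = Counter()
    vertices = list(adj)
    for _ in range(num_sources):
        i = random.choice(vertices)
        by_depth = {}
        for w, d in bfs_depths(adj, i, max_depth).items():
            if d > 0:                        # skip the source itself
                by_depth.setdefault(d, []).append(w)
        for d_G, frontier in by_depth.items():
            j = random.choice(frontier)      # one random vertex per depth
            joint[(d_R(x[i], x[j]), d_G)] += 1
    total = sum(joint.values())
    return {pair: c / total for pair, c in joint.items()}

def mutual_information(joint):
    """L_I = I(D_R; D_G) in bits, from the sampled joint distribution."""
    p_r, p_g = Counter(), Counter()
    for (r, g), p in joint.items():
        p_r[r] += p
        p_g[g] += p
    return sum(p * math.log2(p / (p_r[r] * p_g[g]))
               for (r, g), p in joint.items())

The estimate sharpens as num_sources grows, while max_depth bounds the range of $D_G$
considered.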
A visualization of this distribution is shown in Figure 5.2, where the intensity of each pixel
$(i, j)$ in the image is proportional to $\Pr[D_R = i, D_G = j]$. This compares the joint probability
distributions before and after the creation of spiral waves from an excitable media model. The
first joint probability distribution shows no overall structure, which is intuitive since the
initial condition is totally random. However, the second distribution has several interesting
features. Most notably, there is a dark triangle in the upper left corner, implying that
$\Pr[D_R > D_G] \approx 0$. Roughly speaking, this means that in this case the topological distance
is an upper bound on the distance between any pair of states. Finally, using Equation 5.2,
the mutual information is 0.021 in the initial configuration and 0.574 in the final configuration.
5.2 Elastic Dimer Automata
Our primary goal is to formally define a technique for creating arbitrary generalizable rules;
we refer to the technique developed here as elastic dimer automata. These rules are
created through a two-step process in which an initial graph is stretched and its edges are
labeled to produce the finite state machine for a dimer automaton rule. In the first step, an
initial graph $G_0$ is stretched $s$ times through an edge rewriting process to produce a
sequence of graphs $\{G_0, G_1, G_2, \ldots, G_{s-1}, G_s\}$. Each edge is rewritten according to
$$ij \rightarrow \{ik, kj\}, \tag{5.3}$$
where $k$ is a new vertex added to $V$. All edges are rewritten in parallel so that the graphs grow uniformly,
and in a manner that preserves (qualitatively) the original shape. Thus, these operations
result in $G_s$ being, qualitatively speaking, a stretched version of the initial graph $G_0$.
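As an illustration, the following minimal Python sketch performs the stretching of Equation
5.3, assuming the finite state machine is stored as a set of directed (i, j) pairs over integer
states; the name stretch and this representation are illustrative only.

def stretch(edges, num_states):
    """Rewrite every edge ij into {ik, kj} through a fresh state k (Eq. 5.3)."""
    new_edges = set()
    k = num_states
    for (i, j) in sorted(edges):             # parallel rewrite of all edges
        new_edges.add((i, k))
        new_edges.add((k, j))
        k += 1                               # one new state per original edge
    return new_edges, k

# Stretching the 3-cycle seed of the excitable media rule: a 3-cycle becomes
# a 6-cycle, then a 12-cycle, doubling the cycle length each time.
G, n = {(0, 1), (1, 2), (2, 0)}, 3
for _ in range(2):
    G, n = stretch(G, n)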
We let $G_s$ define the set of allowable transitions in the finite state machine for each state.
Figure 5.2: Joint probability distribution before (top) and after (bottom) simulation of
excitable media.
Let $N[i]$ be the inclusive neighborhood of vertex $i$ defined by $G_s$; thus, $N[i]$ is the set of
states that $i$ may transition to (including remaining $i$). Deciding what causes each transition
allows the edges of the finite state machine to be labeled, creating the dimer automaton
rule $R$. Suppose that for each $(i, j) \in \Sigma^2$ there is a configuration energy defined by $J(i, j)$.
The rule $R$ is defined by picking the pair of states from the set of allowable transitions that
minimizes the configuration energy, thus
$$(R(x, y), R(y, x)) = \operatorname*{arg\,min}_{(i,j) \in N[x] \times N[y]} J(i, j). \tag{5.4}$$
However, this equation creates some ambiguity when more than one pair of transitions is
tied for the minimum. There are a number of potential ways to handle such a tie. One way is
to simply disallow transitions when there is not a unique minimum. Another option is
to swap $i$ and $j$ (we will refer to this as zero-swapping). The logic is that swapping
$i$ and $j$ neither increases nor decreases that edge's configuration energy, yet swapping injects
enough energy into the system to avoid totally frozen configurations, where no transitions
occur. Yet another approach is to pick the $(i, j)$ corresponding to the smallest unique
energy. This is analogous to a ball placed on the top of a hypothetical mountain: if
both slopes of the mountain are equally steep, the ball will roll down the ridge between the
two slopes, even though this route is suboptimal. We use a combination of this approach
and swapping, where swapping is performed if there is no better unique energy configuration.
Let $J$ be a function of the distance between $i$ and $j$ in $G_s$ such that
$$J(i, j) = F(\min[d_R(i, j), d_R(j, i)]). \tag{5.5}$$
Note that $J$ is symmetric, which is important since we are still assuming space is undirected.
For simplicity we let $F(d) = d$, so that the configuration energy of $(i, j)$ increases as $i$ and
$j$ move farther apart in $G_s$. Under the above assumptions, the rule $R$ for an elastic dimer
automaton is fully defined by $G_0$ and $s$. The original generalized excitable media rule is a
special case of the elastic dimer automaton, where $G_0 = (\{0, 1, 2\}, \{(0, 1), (1, 2), (2, 0)\})$ and
$F(d) = \min[d, z]$.
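The construction above can be made concrete with a short sketch. The code below assembles
Equations 5.4 and 5.5 into a rule table under one reading of the tie-breaking discussed
earlier (take the smallest unique energy; fall back to zero-swapping when no unique energy
exists); the names are illustrative, and the all-pairs state machine distances are computed
with a simple Floyd-Warshall pass, which assumes $G_s$ is strongly connected.

from collections import Counter, defaultdict

def shortest_paths(edges, n):
    """All-pairs distances over the stretched state machine (Floyd-Warshall)."""
    INF = float('inf')
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for (i, j) in edges:
        d[i][j] = 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

def build_rule(edges, n, F=lambda dist: dist):
    """Label the rule R: map each state pair (x, y) to (R(x, y), R(y, x))."""
    d = shortest_paths(edges, n)
    J = lambda i, j: F(min(d[i][j], d[j][i]))        # Equation 5.5 (symmetric)
    nbr = defaultdict(set)
    for (i, j) in edges:
        nbr[i].add(j)
    N = {i: nbr[i] | {i} for i in range(n)}          # inclusive neighborhoods
    R = {}
    for x in range(n):
        for y in range(n):
            cand = [(J(i, j), (i, j)) for i in N[x] for j in N[y]]
            counts = Counter(energy for energy, _ in cand)
            unique = sorted(e for e in counts if counts[e] == 1)
            if unique:                               # smallest *unique* energy
                R[(x, y)] = next(p for e, p in cand if e == unique[0])
            else:
                R[(x, y)] = (y, x)                   # zero-swapping fallback
    return R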
5.3 Searching Elastic Dimer Automata
The previous assumptions give us a good starting point to search for interesting elastic
dimer automata, with the search space being transformed to the set of all directed graphs.
However, this space grows very quickly, since there are $2^{|V|^2}$ possible adjacency matrices for
a directed graph with $|V|$ vertices. Fortunately, the search space can be further reduced by
assuming that each graph
1. contains no self loops,
2. is strongly connected, and
3. is isomorphically unique.
It is not useful to allow graphs with self loops, because a self loop would allow states
to transition to themselves, which is already permitted by Equation 5.4, since $N[x]$ includes $x$.
A strongly connected graph is one in which there is a path of finite length between every pair
of vertices. In other words, every vertex is recurrent, so no vertices are absorbing or transient
(i.e., with out-degree or in-degree of zero, respectively). Because transient states have no incoming edges, the
dimer automaton's configuration may eventually contain no such states. On the other hand,
once part of the dimer automaton's configuration is absorbing, it can never change to another
state. Finally, similar to the exhaustive search from §4.2, it is useful to discard rules that
are isomorphically equivalent to others since they perform identical jobs.
Algorithm 12 enumerates all of the isomorphically unique, weakly connected, directed or
undirected graphs with no self loops using a dynamic programming approach. Each graph
produced by this algorithm can then be checked for strong connectedness. This check cannot be
done inside the loop, however, because a subgraph of a strongly connected graph may be
only weakly connected, or worse, disconnected. Testing for isomorphism in the innermost loop of the algorithm
is $O(|V|^2 \cdot |V|!)$. This causes the algorithm to run very slowly when $|V| > 6$ in the undirected
case and when $|V| > 5$ in the directed case. The 89 unique rules under the above assumptions
are shown in Figure 5.3. Without any assumptions, this space would have up to $2^{16} = 65{,}536$
graphs.
First, we conducted an exhaustive search of the space of elastic dimer automata where
$|V_0| = 4$. For each of these rules, we start with random initial conditions and run a simulation
Algorithm 12: Enumerates all isomorphically unique single component (un)directed
graphs.
Input: n, directed
Output: C
C := [∅, ∅, ..., ∅];                            // C[i] contains unique graphs with i vertices
C[1] := {[0]};                                   // base case: a graph with 1 vertex and no edges
for i = 2..n do
    if directed then
        edges := {0, 1}^{2(i−1)};                // Cartesian power
    else
        edges := {0, 1}^{i−1} ‖ {0, 1}^{i−1};    // element-wise tuple concatenation
    end
    edges := edges \ {(0, 0, ..., 0)};           // must add at least one edge
    foreach G ∈ C[i−1] do
        foreach (e_1, ..., e_{i−1}, e_i, ..., e_{2(i−1)}) ∈ edges do
            H := G bordered by the new column (e_1, ..., e_{i−1})ᵀ, the new row
                 (e_i, ..., e_{2(i−1)}), and 0 on the new diagonal entry;
            if no permutation of H is in C[i] then
                C[i] := C[i] ∪ {H};
            end
        end
    end
end
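The listing below is a hypothetical Python translation of Algorithm 12; the names
enumerate_graphs, permute, and is_strongly_connected are ours, but the dynamic
programming structure is the same, with the brute-force permutation test responsible for the
$O(|V|^2 \cdot |V|!)$ cost noted above. Filtering the enumerated graphs for strong connectedness
afterwards should recover the graphs of Figure 5.3.

from itertools import product, permutations

def permute(H, perm):
    """Apply a vertex permutation to an adjacency matrix (tuple of tuples)."""
    n = len(H)
    return tuple(tuple(H[perm[r]][perm[c]] for c in range(n)) for r in range(n))

def enumerate_graphs(n, directed):
    """C[i] holds the isomorphically unique i-vertex graphs, as in Algorithm 12."""
    C = [None, {((0,),)}]                    # base case: 1 vertex, no edges
    for i in range(2, n + 1):
        if directed:
            borders = product([0, 1], repeat=2 * (i - 1))
        else:                                # mirror the column onto the row
            borders = (half + half for half in product([0, 1], repeat=i - 1))
        C.append(set())
        for e in borders:
            if not any(e):                   # must add at least one edge
                continue
            for G in C[i - 1]:
                # border G with a new column, a new row, and a 0 diagonal entry
                H = tuple(G[r] + (e[r],) for r in range(i - 1))
                H += (tuple(e[i - 1:]) + (0,),)
                if not any(permute(H, p) in C[i]
                           for p in permutations(range(i))):
                    C[i].add(H)
    return C

def is_strongly_connected(H):
    """Vertex 0 must reach every vertex along edges and along reversed edges."""
    n = len(H)
    for reverse in (False, True):
        seen, stack = {0}, [0]
        while stack:
            v = stack.pop()
            for w in range(n):
                if (H[w][v] if reverse else H[v][w]) and w not in seen:
                    seen.add(w)
                    stack.append(w)
        if len(seen) < n:
            return False
    return True

# Example: strongly connected directed graphs with 4 or fewer vertices.
C = enumerate_graphs(4, directed=True)
rules = [H for i in range(2, 5) for H in C[i] if is_strongly_connected(H)]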
Figure 5.3: The 89 isomorphically unique, strongly connected, directed graphs with 4 or
fewer vertices. These graphs are then stretched twice according to the rewriting rules.
long enough² for the system to exhibit its long-term behavior. In each simulation, the
spatial topology used was an isotropic planar graph, roughly equivalent to a 600 × 600
square lattice. From the set of 89 rules, we identified four interesting and unique forms of
self-organizing behavior: rules 13, 18, 55, and 85. A rule's number refers to the order in
which Algorithm 12 generates its graph. Figure 5.4 shows the information $L_I$ and energy $H$
of these four rules over time (with and without swapping), and Figure 5.5 shows the topology
of each rule and its configuration after a long time. The configuration energy is the average
energy of each edge in the graph using the above definition of $J$, thus
$$H = \frac{1}{|E|} \sum_{ij \in E} J(x_i, x_j). \tag{5.6}$$
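Equation 5.6 amounts to a one-line average over the edges of the spatial graph; a sketch
with illustrative names follows, where J is the energy function from Equation 5.5.

def configuration_energy(spatial_edges, x, J):
    """H: the average edge energy over the spatial graph (Equation 5.6)."""
    return sum(J(x[i], x[j]) for (i, j) in spatial_edges) / len(spatial_edges)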
Rule 13 produces fast moving spiral waves (see §2.4 on excitable media) and swapping
produces slightly larger waves in this case, but there is not a significant change in the
qualitative behavior or in the information-energy curve. Rule 18 shows a rapid decrease in
energy relative to the more gradual trends seen in the other rules. Furthermore, in both
cases, there is a spontaneous increase in the configuration energy eventually following the
initial sudden drop. This increase is then followed by yet another drop, where the system
eventually settles down. With swapping, rule 18 produces a combination of quickly moving
wave fronts that leave a wake of slowly decaying uniform regions. Without swapping, the
behavior is characterized by rounded homogeneous regions that spontaneously transition to
a different state. Rule 55 appears to be a frustrated version of rule 13. The length scale
of the waves is much larger, and the spirals are less fully developed than those seen in rule
13. Swapping allows the waves to propagate more quickly, but does not appear to affect
the qualitative behavior. Rule 85 is perhaps the most interesting, producing labyrinthine
patterns without zero-swapping (an example of this is shown in Figure 5.6). Its configurations
are characterized by dislocations, which gradually disappear over time, resulting in a low
energy, high information configuration. Swapping allows the dislocations to be fixed more
quickly, but interestingly, causes the information to decrease eventually, perhaps because
of the creation of large, nearly uniform regions. Once the labyrinthine pattern is formed,
removing dislocations drives the system closer to a uniform state, decreasing its information
content.
²Experimentally, we found that $300 \cdot |V|$ edge updates were sufficient.
Figure 5.4: Information-Energy time series of rules 13, 18, 55, and 85 with (top) and without
(bottom) swapping.
Figure 5.5: Rules 13, 18, 55, and 85 (left) and the corresponding elastic dimer automaton
output without (middle) and with (right) swapping at local energy minima.
5.4 Discussion
Our intention in creating elastic dimer automata is to couple the tunable level of detail of
continuous models such as PDEs with the simplicity, speed, and stability of discrete models
such as cellular automata. The results thus far are promising, with elastic dimer automata
producing a variety of interesting phenomena with continuum-like behavior. The stretching
operations have the intended effect of tuning the level of detail in the automaton. Stretching
the initial graph increases the characteristic length scale of the structures produced by the
elastic dimer automaton. An example of this is shown in Figure 5.6 for rule 85 without
swapping. Stretching causes structures to take up more space, and to require more time to
develop. Finally, we note that random asynchronous updating is a closer approximation of
continuous time than the synchronous method used by cellular automata, since state changes
are discrete. Synchronous updating in cellular automata moves the system forward in one
large leap because the state is discrete, as opposed to time stepping in PDEs, where the
state is continuous and is updated in very small increments. So we conclude that stretching
elastic dimer automata is analogous to simultaneously tuning their space and time step,
which provides the tunable level of detail we have sought in their design.
(a) $|\Sigma| = 24$  (b) $|\Sigma| = 48$  (c) $|\Sigma| = 96$  (d) $|\Sigma| = 192$
Figure 5.6: Stretching increases the level of detail without qualitatively affecting the rule's
behavior (rule 85 without swapping is shown).
5.4.1 Rendering
Although the mutual information is useful in identifying configurations with structure, it is
still important to render the configurations so they can be interpreted visually. The naïve
approach of using a grayscale value directly proportional to each state will not work as
it has in previous experiments. The reason is that there may be little to no correlation
between a state’s value and its context in the finite state machine. So, it is necessary to
develop an alternative way of rendering configurations. Recall that elastic dimer automata
are constructed from an initial seed graph $G_0$, and that the indices of this original graph
map directly to their corresponding indices in the final stretched graph $G_s$. Therefore, we
let the color $c(i)$ corresponding to a state $i$ in $G_s$ depend on that state's distance to
the nearest of the original vertices, such that
$$c(i) = \min_{j \in V_0} d_R(i, j). \tag{5.7}$$
In other words, a state's color will be proportional to that state's shortest distance to one
of the original vertices from $V_0$. The effect is to accentuate regions of near-equal state, and
to make smooth state transitions in space appear as gradients. An example of the usefulness
of this transformation is shown in Figure 5.7, compared to the naïve case where $c(i) = i$.
(a) naïve rendering with $c(i) = i$
(b) better rendering with $c(i) = \min_{j \in V_0} d_R(i, j)$
Figure 5.7: A comparison of two rendering techniques for dimer automata.
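A minimal sketch of this coloring is given below. It assumes the stretched state machine is
a set of directed (i, j) transitions and computes Equation 5.7 for all states at once with a
multi-source breadth first search from $V_0$ along reversed transitions; the names are
illustrative, and states that cannot reach $V_0$ are left uncolored.

from collections import deque

def state_colors(fsm_edges, num_states, seed_states):
    """c[i] = min over j in V0 of d_R(i, j), via multi-source BFS (Eq. 5.7)."""
    # BFS from V0 along *reversed* transitions finds, for every state i, the
    # length of its shortest directed path into the seed set.
    radj = [[] for _ in range(num_states)]
    for (i, j) in fsm_edges:
        radj[j].append(i)
    c = [None] * num_states                  # None: state cannot reach V0
    queue = deque(seed_states)
    for v in seed_states:
        c[v] = 0
    while queue:
        v = queue.popleft()
        for w in radj[v]:
            if c[w] is None:
                c[w] = c[v] + 1
                queue.append(w)
    return c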
5.4.2 Distribution of Behaviors
An interesting question to ask for this system is, what is the distribution of elastic dimer
automata behaviors in the information-energy space? To approach this, we extended the
search from the previous section by one, so that $|V_0| \leq 5$. This resulted in a total of 5137
suitable unique rules. The information and energy of the final configuration of each rule is
plotted in Figure 5.8. Additionally, we used k-means clustering to pick 9 characteristic rules;
the configurations of each of these rules are also shown in the plot. The information-energy
distribution has several interesting features that become apparent as a result of the larger
search space. First, there appears to be a clear upper bound relating information and energy,
so we hypothesize that
$$H = O(L_I^{-1}), \tag{5.8}$$
as evidenced by the emptiness of the upper right portion of the plot. Furthermore, there
appears to be another empty region where $L_I \in [0, 0.1]$ and $H > 1$. This suggests, at least
for this experiment, a correlation between very low information and very low energy. In
other words, the configurations in this region are mostly uniform, and not mostly random.
The tail of the distribution where $L_I \geq 2.5$ corresponds to low energy, high information
configurations that result from spiral waves (e.g., rule 13). The remaining rules that generate
high information configurations in the distribution appear to be somewhat similar to rule
18.
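As an illustration of the selection step, the sketch below assumes each rule is summarized by
a point $(L_I, H)$ and uses scikit-learn's KMeans (one choice among many clustering tools) to
pick, as each cluster's representative, the rule nearest its centroid.

import numpy as np
from sklearn.cluster import KMeans

def characteristic_rules(points, k=9, seed=0):
    """Indices of the k rules nearest their cluster centroids in (L_I, H)."""
    X = np.asarray(points)
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    reps = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(X[members] - km.cluster_centers_[c], axis=1)
        reps.append(int(members[np.argmin(dists)]))
    return reps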
5.5 Conclusions
We have presented a technique, elastic dimer automata, that, when given a small (i.e.,
granular) state transition graph, creates a dimer automaton rule exhibiting successively
more continuous-like behavior. Additionally, we have developed a measurement of
self-organization tailored to these rules, based on the mutual information between the state
and space distance distributions. Using these tools, we performed an exhaustive search of
a simple class of elastic dimer automata, which revealed several interesting rules and a rich
behavior space.
Future work can take several directions. One question we may ask is: can we derive the PDE
corresponding to a given elastic dimer automaton directly by considering the limit
Figure 5.8: Clustering of behavior resulting from the 5137 elastic dimer automata with
$|V_0| \leq 5$.
of infinitely many states? This may be simple for cases such as rules 13 and 85, where the transition graph is
simply a cycle, allowing a direct mapping between the discrete state and its continuous phase
angle. However, in the general case, deriving the PDE may be less trivial; perhaps other
variants of elastic dimer automata would facilitate this better. Furthermore, we consider
only a very simple energy function to minimize; future work may consider other classes of
energy functions or even domain-specific functions to model a narrower range of physical
phenomena. Finally, it is useful and frequently challenging to demonstrate the connection
between the abstract rules for complex systems and the corresponding physical phenomena
(if such a connection exists).
Chapter 6
Conclusions
It is common to find among physicists and other scientists a hopefulness for elegance and
simplicity in the laws governing the universe. Thus, the rapid growth in complex systems
research is not surprising, as its central dogma states that simplicity is at the heart of com-
plexity. Complex systems theory and applications have been able to provide new insight in
economics, sociology, epidemiology, neuroscience, ecology, and many other fields. Today’s
models for complex systems can be elegant and provide compelling explanations for how cer-
tain systems are driven. Unfortunately, it remains fairly difficult to design truly simple rules
describing natural phenomena. Additionally, the complexity seen in deterministic models
for complex systems such as cellular automata may be unique to those models, making them
less widely applicable than initially hoped. This dissertation has focused on the question, do
simple rules for self-organization exist in stochastic complex systems, and how can we find
them?
This work has considered dimer automata exclusively towards answering this question. Their
unique simplicity allows them to operate on arbitrary topologies, something that is cumber-
some with cellular automata and other related models. A multitude of different real world
applications of dimer automata were examined, a testament to their surprising utility de-
spite their simplicity. In fact, this point deserves emphasis in its own right. Dimer automata
operate by examining two random components of the system at a time, and updating each
according to a simple rule. The fact that this procedure so convincingly reproduces the
phenomena of spiral waves, coarsening, aggregation, segregation, dislocation, etc., is a fun-
damental statement about complex systems in general. It suggests that universal phenomena,
that which we see repeated over and over at a multitude of scales and domains, are extremely
robust. These phenomena can be produced by many different systems and by many different
models, in spite of noise, and with lack of central control and fine tuning. Dimer automata
capture the essence of these phenomena with one of the simplest models imaginable.
However, even simple models can be impractical, and the two aspects of dimer automata that
distinguish them from other classical models make dimer automata challenging to implement
efficiently. Dimer automata rules operate on the edges of a graph (not on single vertices), and
this operation is intended to be asynchronous to avoid conflicts. However, this introduces
two major challenges towards parallelization: operating on arbitrary graphs makes memory
access random (and therefore less efficient), and conflict avoidance is costly and nontrivial. Thus, the
second important contribution of this work is the development of two GPU parallelizations
in §3 that make dimer automata efficient and scalable under very reasonable assumptions.
The first approach uses the inherent concurrency of many experiments to mitigate the prob-
lems with memory access patterns and serial updates by assuming all experiments see the
same order of updates. This allows a simple restructuring of the memory to facilitate coa-
lesced memory access, leading to a throughput of over 300 million updates/sec assuming at
least 1,000 concurrent experiments. The second approach parallelizes a dimer automaton
assuming a large low dimensional graph. The possibility of concurrency arises from the fact
that updates in one region of this graph do not immediately affect other regions sufficiently
far away. It is straightforward to compute a set of edges that can be updated in parallel
without conflict; these are maximal matchings. The GPU parallelization samples from a
balanced set of maximal matchings for a massively parallel approximation of serial updat-
ing. This approach also saw very high throughputs with little cost to accuracy for the cases
examined. This work shows that parallelism is not unique to cellular automata and similar
synchronous lattice based models; under a few reasonable assumptions it can be accom-
plished effectively for dimer automata as well. Because of the wide range of applications of
dimer automata, this work lays the foundation for many new GPU-parallel applications in
a variety of domains.
The discrete nature of models like cellular automata and dimer automata allows the rule
governing the behavior of the system to be treated as a parameter that is easily replaced,
instead of as an integral component of the model. This should be contrasted with parameter
space exploration for PDEs, where adjusting the constants of a single set of equations
produces several different outcomes. With discrete models, exploring the rule space is analogous
to considering every possible PDE. Wolfram and others have suggested this approach
can reveal explanations for physical phenomena that are simpler than any we are likely to
engineer ourselves [110]. Unfortunately, while this search is a simple idea in theory, it is quite
difficult in practice. The challenges include automated detection and classification of inter-
esting patterns, effectively sampling rule spaces that grow extremely quickly, and matching
rules with their appropriate physical phenomena.
The evolutionary motifs approach discussed in §4 approaches the first two challenges with
the intention of discovering rules exhibiting self-organization. Identifying rules with this
property was effectively accomplished by considering the amount of spatial structure added
by the dimer automaton when starting from a random initial configuration. Using this mea-
surement, an exhaustive search of a small dimer automaton rule space revealed three rules
that exhibit self-organization. From these three rules, a small set of motifs were found to be
strongly correlated with that self-organization. The motifs were then used to build larger,
more complicated rules. Using these motifs in an evolutionary algorithm proved to be orders
of magnitude faster than a pure random search, verifying that the motifs are an effective
way to produce self-organizing rules. However, the success of the evolutionary motifs algo-
rithm in creating a diverse population of self-organizing rules also presented an unforeseen
challenge. The manner in which the algorithm uniformly fills the behavior space makes it
difficult to perform classification; there are too many different uniformly distributed behav-
iors, making cluster analysis fairly ineffective. This highlights an important area for future
work: identification of different behavior features that allow for more effective classification.
Many of the applications discussed in §2 are designed so that the number of states $|\Sigma|$ in the
dimer automaton is a parameter of the model. In other words, these models are generalized
in the sense that the rule can grow and change shape in a manner that remains consistent
with the phenomena it is modeling. Furthermore, increasing $|\Sigma|$ in these examples tends
to reduce the granularity of the model, improving the overall quality. However, the models
presented in §2 are ad hoc, so §5 develops elastic dimer automata as a way to generalize
arbitrary rules. The insight behind elastic dimer automata is to assume the generalized rule
is defined fully by a small initial graph. This graph is then stretched to an arbitrary size,
and the edges are labeled according to an energy minimization technique in order to produce
the dimer automaton rule. This generally results in a tunable level of detail with phenomena
often exhibiting smooth, continuous behavior. We also developed an improved measurement
of local structure more amenable to the patterns produced by elastic dimer automata. An
exhaustive search of elastic dimer automata revealed several noteworthy phenomena and a
rich behavior space. However, the class of elastic dimer automata searched was only the
most basic imaginable, with the energy function being the shortest path length between
two states. This function may be modified to produce an entirely different set of behaviors.
Furthermore, elastic dimer automata may provide the groundwork for novel ways to learn
the underlying rule for a system, given a set of observations.
One overarching goal of this work has been to revisit dimer automata not merely as an
elegant theoretical framework, but as a highly practical method for the simulation and modeling
of complex phenomena. The abundance of real world applications using highly simplified
rules reinforces the central dogma of complex systems research: that complexity arises from
the interactions of simple components over time. Additionally, this work has shown that
the interactions themselves can be quite elementary, involving just pairs of components, and
occurring randomly and one at a time. The tools developed in this work for efficient
parallelization, for measuring self-organization and spatial structure, for determining motifs and
constructing more complex rules from these components, and for creating arbitrary discrete
rules that behave continuously lay the foundation for an abundance of worthwhile future
research.
Bibliography
[1] Susumu Adachi, Ferdinand Peper, and Jia Lee. Computation by asynchronously up-
dating cellular automata. Journal of Statistical Physics, 114(1-2):261–289, JAN 2004.
[2] A.I. Adamatzky. Voronoi-like partition of lattice in cellular automata. Mathematical
and Computer Modelling, 23(4):51 – 66, 1996.
[3] Maximino Aldana. Boolean dynamics of networks with scale-free topology. Physica
D: Nonlinear Phenomena, 185(1):45–66, OCT 15 2003.
[4] Uri Alon. Network motifs: theory and experimental approaches. Nature Reviews
Genetics, 8(6):450–461, JUN 2007.
[5] Dustin Arendt and Yang Cao. Effective GPU acceleration of large scale, asynchronous
simulations on graphs. Advances in Complex Systems, 2012.
[6] Dustin Arendt and Yang Cao. Elastic dimer automata: Discrete, tunable models
for complex systems. In 4th Cellular Automata, Theory and Applications Workshop
(*A-CSC’12), 2012.
[7] Dustin Arendt and Yang Cao. Evolutionary motifs for the automated discovery of
self-organizing dimer automata. Advances in Complex Systems, 2012.
[8] Dustin Arendt and Yang Cao. GPU acceleration of many independent mid-sized sim-
ulations on graphs. In 4th Cellular Automata, Theory and Applications Workshop
(*A-CSC’12), 2012.
[9] Adam Arkin, John Ross, and Harley H. McAdams. Stochastic kinetic analysis of devel-
opmental pathway bifurcation in phage lambda-infected escherichia coli cells. Genetics,
149(4):1633–1648, AUG 1998.
[10] W. Ross Ashby. Principles of the self-organizing system. Principles of Self-organization,
pages 255–278, 1962.
[11] P Bak, K Chen, and C Tang. A forest-fire model and some thoughts on turbulence.
Physics Letters A, 147(5-6):297–300, JUL 16 1990.
[12] J. Balasalle, M.A. Lopez, and M.J. Rutherford. Optimizing memory access patterns
for cellular automata. GPU Computing Gems Jade Edition, page 67, 2011.
[13] Olga Bandman. Parallel simulation of asynchronous cellular automata evolution. Cel-
lular Automata, pages 41–47, 2006.
[14] C. Bradford Barber, David P. Dobkin, and Hannu Huhdanpaa. The Quickhull algo-
rithm for convex hulls. ACM Transactions on Mathematical Software, 22(4):469–483,
1996.
[15] Dwight Barkley. A model for fast computer-simulation of waves in excitable media.
Physica D, 49(1-2):61–70, APR 1991.
[16] Charles H. Bennett. On the nature and origin of complexity in discrete, homogeneous,
locally-interacting systems. Foundations of Physics, 16(6):585–592, 1986.
[17] Charles H. Bennett. How to define complexity in physics, and why. Complexity,
Entropy, and the Physics of Information, 8:137–148, 1990.
[18] Hugues Bersini and Vincent Detours. Asynchrony induces stability in cellular automata
based models. In Artificial Life IV, pages 382–387. MIT Press, MA, 1994.
[19] Ofer Biham, A. Alan Middleton, and Don Levine. Self-organization and a dynamical
transition in traffic-flow models. Physical Review A, 46(10):6124–6127, 1992.
[20] Stephen P. Borgatti and Martin G. Everett. Models of core/periphery structures.
Social networks, 21(4):375–395, 2000.
[21] Maury Bramson and David Griffeath. Flux and fixation in cyclic particle systems. The
Annals of Probability, pages 26–45, 1989.
[22] Arthur R. Butz. Alternative algorithm for Hilbert’s space-filling curve. IEEE Trans.
Comput., 20:424–426, April 1971.
[23] Yang Cao, Hong Li, and Linda Petzold. Efficient formulation of the stochastic sim-
ulation algorithm for chemically reacting systems. Journal of Chemical Physics,
121(9):4059 – 4067, 2004.
[24] Peter Clifford and Aidan Sudbury. A model for spatial conflict. Biometrika, 60(3):581–
588, 1973.
[25] John Conway. The game of life. Scientific American, 223(4):4, 1970.
[26] Matthew Cook. Universality in elementary cellular automata. Complex Systems,
15(1):1–40, 2004.
[27] Douglas W. Cooper. Random-sequential-packing simulations in three dimensions for
spheres. Phys. Rev. A, 38(1):522–524, Jul 1988.
[28] David Cornforth, David G. Green, David Newth, and Michael Kirley. Ordered asyn-
chronous processes in multi-agent systems. Physica D: Nonlinear Phenomena, 204(1-
2):70–82, MAY 1 2005.
[29] G. Csárdi and T. Nepusz. The igraph software package for complex network research.
InterJournal Complex Systems, 1695, 2006.
[30] Rajarshi Das, James P. Crutchfield, Melanie Mitchell, and James E. Hanson. Evolving
globally synchronized cellular automata. In Proceedings of the Sixth International
Conference on Genetic Algorithms, San Mateo, CA. Citeseer, 1995.
[31] Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and T. Meyarivan. A fast and
elitist multiobjective genetic algorithm: NSGA-II. Evolutionary Computation, IEEE
Transactions on, 6(2):182–197, 2002.
[32] B. Derrida and Y. Pomeau. Random networks of automata: a simple annealed ap-
proximation. EPL (Europhysics Letters), 1:45, 1986.
[33] Andreas Deutsch and Sabine Dormann. Cellular Automaton Modeling of Biological
Pattern Formation. Birkhauser, 2005.
[34] Ezequiel A. Di Paolo. Rhythmic and non-rhythmic attractors in asynchronous random
boolean networks. Biosystems, 59(3):185–195, MAR 2001.
[35] Nazim Fatès and Michel Morvan. An experimental study of robustness to asynchronism
for elementary cellular automata. Complex Systems, 16(1):1–28, 2005.
[36] Nazim Fatès, Damien Regnault, Nicolas Schabanel, and Éric Thierry. Asynchronous
behavior of double-quiescent elementary cellular automata. LATIN 2006: Theoretical
Informatics, pages 455–466, 2006.
[37] Nazim Fatès, Éric Thierry, Michel Morvan, and Nicolas Schabanel. Fully asynchronous
behavior of double-quiescent elementary cellular automata. Theoretical Computer Sci-
ence, 362(1-3):1–16, 2006.
[38] D.P. Feldman and J.P. Crutchfield. Measures of statistical complexity: Why? Physics
Letters A, 238(4-5):244–252, 1998.
[39] Robert Fisch. Cyclic cellular automata and related processes. Physica D: Nonlinear
Phenomena, 45(1-3):19–25, 1990.
[40] Robert Fisch, Janko Gravner, and David Griffeath. Cyclic cellular automata in two
dimensions. Spatial Stochastic Processes: A Festschrift in Honor of Ted Harris on His
Seventieth Birthday, K. Alexander and J. Watkins, eds, pages 171–188, 1991.
[41] Robert Fisch, Janko Gravner, and David Griffeath. Threshold-range scaling of ex-
citable cellular automata. Statistics and Computing, 1(1):23–39, 1991.
[42] T. Fischer. Coupled map lattices with asynchronous updatings. Annales de l’Institut
Henri Poincar´e-Probabilites et Statistiques, 37(4):421–479, JUL-AUG 2001.
[43] R.W. Floyd. Algorithm 97: shortest path. Communications of the ACM, 5(6):345,
1962.
[44] Harold J. Frost and Carl V. Thompson. Computer simulation of grain growth. Current
Opinion in Solid State & Materials Science, 1(3):361–368, JUN 1996.
[45] Martin Gerhardt, Heike Schuster, and John J. Tyson. A cellular automaton model
of excitable media including curvature and dispersion. Science, 247(4950):1563–1566,
MAR 30 1990.
[46] Carlos Gershenson. Introduction to Random Boolean Networks. eprint
arXiv:nlin/0408006, August 2004.
[47] Carlos Gershenson and Francis Heylighen. When can we call a system self-organizing?
Advances in Artificial Life, pages 606–614, 2003.
[48] Michael A. Gibson and Jehoshua Bruck. Efficient exact stochastic simulation of chem-
ical systems with many species and many channels. Journal of Physical Chemistry A,
104(9):1876–1889, MAR 9 2000.
[49] Daniel T. Gillespie. Exact stochastic simulation of coupled chemical reactions. The
Journal of Physical Chemistry, 81(25):2340–2361, 1977.
[50] James A. Glazier and Denis Weaire. The kinetics of cellular patterns. Journal of
Physics-Condensed Matter, 4(8):1867–1894, FEB 24 1992.
[51] J. M. Greenberg and S. P. Hastings. Spatial patterns for discrete models of diffusion
in excitable media. SIAM Journal on Applied Mathematics, 34(3):515–523, 1978.
[52] A. Å. Hansson, H. S. Mortveit, and C. M. Reidys. On asynchronous cellular automata.
Advances in Complex Systems, 8(4):521–538, 2005.
[53] Pawan Harish and P. Narayanan. Accelerating large graph algorithms on the GPU
using CUDA. In Srinivas Aluru, Manish Parashar, Ramamurthy Badrinath, and Viktor
Prasanna, editors, High Performance Computing – HiPC 2007, volume 4873 of Lecture
Notes in Computer Science, pages 197–208. Springer Berlin / Heidelberg, 2007.
[54] T.E. Harris. Contact interactions on a lattice. The Annals of Probability, pages 969–
988, 1974.
[55] K. A. Hawick, A. Leist, and D. P. Playne. Parallel graph component labelling with
GPUs and CUDA. Parallel Computing, 2010.
[56] Herbert W. Hethcote. The mathematics of infectious diseases. SIAM Review,
42(4):599–653, DEC 2000.
[57] Bernardo A. Huberman and Natalie S. Glance. Evolutionary games and computer
simulations. Proceedings of the national academy of sciences of the United States of
America, 90(16):7716, 1993.
[58] T. E. Ingerson and R. L. Buvel. Structure in asynchronous cellular automata. Physica
D: Nonlinear Phenomena, 10(1-2):59 – 68, 1984.
[59] Francisco Jiménez-Morales, Melanie Mitchell, and James P. Crutchfield. Evolving
one dimensional cellular automata to perform a non-trivial collective behavior task:
One case study. In Peter Sloot, Alfons Hoekstra, C. Tan, and Jack Dongarra, editors,
Computational Science — ICCS 2002, volume 2329 of Lecture Notes in Computer
Science, pages 793–802. Springer Berlin / Heidelberg, 2002.
[60] Francisco Jiménez-Morales. An evolutionary approach to the study of non-trivial col-
lective behavior in cellular automata. In S Bandini, B Chopard, and M Tomassini,
editors, Cellular Automata, Proceedings, volume 2493 of Lecture Notes in Computer
Science, pages 32–43. Springer-Verlag Berlin, 2002.
[61] A. B. Kahn. Topological sorting of large networks. Communications of the ACM,
5(11):558–562, 1962.
[62] Konstantin Kalgin. Comparative study of parallel algorithms for asynchronous cel-
lular automata simulation on different computer architectures. In Stefania Bandini,
Sara Manzoni, Hiroshi Umeo, and Giuseppe Vizzari, editors, Cellular Automata, vol-
ume 6350 of Lecture Notes in Computer Science, pages 399–408. Springer Berlin /
Heidelberg, 2010.
[63] A. R. Kansal, S. Torquato, G. R. Harsh IV, E. A. Chiocca, and T. S. Deisboeck.
Cellular automaton of idealized brain tumor growth dynamics. Biosystems, 55(1-3):119
– 127, 2000.
[64] Stuart Kauffman, Carsten Peterson, Björn Samuelsson, and Carl Troein. Random
boolean network models and the yeast transcriptional network. Proceedings of The
National Academy of Sciences, 100(25):14796–14799, DEC 9 2003.
[65] Stuart A. Kauffman. Metabolic stability and epigenesis in randomly constructed ge-
netic nets. Journal of Theoretical Biology, 22(3):437–467, 1969.
[66] Henry Kaufman, Alessandro Vespignani, Benoit B. Mandelbrot, and Lionel Woog.
Parallel diffusion-limited aggregation. Physical Review E, 52(5, Part B):5602–5609,
NOV 1995.
[67] W. Kunishima, A. Nishiyama, H. Tanaka, and T. Tokihiro. Differential equations
for creating complex cellular automaton patterns. Journal of the Physical Society of
Japan, 73:2033, 2004.
[68] Marcelo Kuperman and Guillermo Abramson. Small world effect in an epidemiological
model. Physical Review Letters, 86(13):2909–2912, MAR 26 2001.
[69] Chris G. Langton. Computation at the edge of chaos - phase-transitions and emergent
computation. Physica D, 42(1-3):12–37, JUN 1990.
[70] Fangting Li, Tao Long, Ying Lu, Qi Ouyang, and Chao Tang. The yeast cell-cycle
network is robustly designed. Proceedings of the National Academy of Sciences of the
United States of America, 101(14):4781–4786, 2004.
[71] Hong Li and Linda Petzold. Efficient parallelization of the stochastic simulation algo-
rithm for chemically reacting systems on the graphics processing unit. International
Journal of High Performance Computing Applications, 24(2):107–116, MAY 2010.
[72] Thomas M. Liggett. Stochastic models for large interacting systems and related cor-
relation inequalities. Proceedings of the National Academy of Sciences of the United
States of America, 107(38):16413–16419, SEP 21 2010.
[73] Aristid Lindenmayer. Mathematical models for cellular interactions in development I.
Filaments with one-sided inputs. Journal of Theoretical Biology, 18(3):280–299, 1968.
[74] Tommaso Toffoli and Norman Margolus. Cellular automata machines: a new environment
for modeling. The MIT Press, 1987.
[75] Mitaxi Mehta and Sudeshna Sinha. Asynchronous updating of coupled maps leads to
synchronization. Chaos, 10(2):350–358, JUN 2000.
[76] R Milo, S Shen-Orr, S Itzkovitz, N Kashtan, D Chklovskii, and U Alon. Network
motifs: Simple building blocks of complex networks. Science, 298(5594):824–827, OCT
25 2002.
[77] Melanie Mitchell. An Introduction to Genetic Algorithms. The MIT press, 1998.
[78] Melanie Mitchell, James P. Crutchfield, and Rajarshi Das. Evolving cellular automata
with genetic algorithms: A review of recent work. In In Proceedings of the First In-
ternational Conference on Evolutionary Computation and its Applications (EvCA’96).
Russian Academy of Sciences, 1996.
[79] Melanie Mitchell, James P. Crutchfield, and Peter T. Hraber. Dynamics, computation,
and the “edge of chaos”: A re-examination. In Santa Fe Institute Studies in the
Sciences of Complexity, volume 19, pages 497–497. Addison-Wesley Publishing Co,
1994.
[80] H. S. Mortveit and C. M. Reidys. Discrete, sequential dynamical systems. Discrete
Mathematics, 226(1-3):281–295, JAN 6 2001.
[81] Kai Nagel and Michael Schreckenberg. A cellular automaton model for freeway traffic.
Journal de Physique I, 2(12):2221–2229, 1992.
[82] Philippe Narbel. Qualitative and quantitative cellular automata from differential equa-
tions. In Samira El Yacoubi, Bastien Chopard, and Stefania Bandini, editors, Cellular
Automata, volume 4173 of Lecture Notes in Computer Science, pages 112–121. Springer
Berlin / Heidelberg, 2006.
[83] Chrystopher L. Nehaniv. Asynchronous automata networks can emulate any syn-
chronous automata network. International Journal of Algebra and Computation,
5(6):719–739, 2004.
[84] M. E. J. Newman and D. J. Watts. Scaling and percolation in the small-world network
model. Physical Review E, 60(6):7332, 1999.
[85] A. Nishiyama, H. Tanaka, and T. Tokihiro. An isotropic cellular automaton for ex-
citable media. Physica A: Statistical Mechanics and its Applications, 387(13):3129–
3136, 2008.
[86] Martin A. Nowak and Robert M. May. Evolutionary games and spatial chaos. Nature,
359(6398):826–829, 1992.
[87] John D. Owens, David Luebke, Naga Govindaraju, Mark Harris, Jens Krüger,
Aaron E. Lefohn, and Timothy J. Purcell. A survey of general-purpose computation
on graphics hardware. Computer Graphics Forum, 26(1):80–113, 2007.
[88] Tobias Preis, Peter Virnau, Wolfgang Paul, and Johannes J. Schneider. GPU acceler-
ated Monte Carlo simulation of the 2D and 3D Ising model. Journal of Computational
Physics, 228(12):4468 – 4477, 2009.
[89] Przemyslaw Prusinkiewicz and Aristid Lindenmayer. The algorithmic beauty of plants
(The Virtual Laboratory). Springer, 1991.
[90] Daniel H. Rothman and Stephanie Zaleski. Lattice-Gas Cellular Automata. Cambridge
University Press, 1997.
[91] Thomas C. Schelling. Dynamic models of segregation. The Journal of Mathematical
Sociology, 1(2):143–186, 1971.
[92] Joel L. Schiff. Cellular Automata: a Discrete View of the World. Wiley-Interscience,
Hoboken, NJ, 2008.
[93] Birgitt Schönfisch. Anisotropy in cellular automata. Biosystems, 41(1):29–41, 1997.
[94] Birgitt Schönfisch and André de Roos. Synchronous and asynchronous updating in
cellular automata. Biosystems, 51(3):123–143, SEP 1999.
[95] Birgitt Schönfisch and K. P. Hadeler. Dimer automata and cellular automata. Physica
D, 94(4):188–204, JUL 15 1996.
[96] Cosma R. Shalizi and Kristina L. Shalizi. Quantifying self-organization in cyclic cellular
automata. Arxiv preprint nlin/0507067, 2005.
[97] Cosma R. Shalizi, Kristina L. Shalizi, and Robert Haslinger. Quantifying self-
organization with optimal predictors. Physical Review Letters, 93(11):118701, 2004.
[98] O. Shanker. Defining dimension of a complex network. Modern Physics Letters B,
21(6):321–326, 2007.
[99] Alexander Slepoy, Aidan P. Thompson, and Steven J. Plimpton. A constant-time
kinetic Monte Carlo algorithm for simulation of large biochemical reaction networks.
Journal of Chemical Physics, 128(20), MAY 28 2008.
[100] Andr´e Stauffer and Moshe Sipper. On the relationship between cellular automata and
l-systems: The self-replication case. Physica D: Nonlinear Phenomena, 116(1-2):71–80,
1998.
[101] Steven H. Strogatz. Exploring complex networks. Nature, 410(6825):268–276, MAR 8
2001.
[102] Tomoaki Suzudo. Searching for pattern-forming asynchronous cellular automata - an
evolutionary approach. In PMA Sloot, B Chopard, and AG Hoekstra, editors, Cellular
Automata, Proceedings, volume 3305 of Lecture Notes in Computer Science, pages
151–160. Springer-Verlag Berlin, 2004.
[103] Tomoaki Suzudo. Spatial pattern formation in asynchronous cellular automata with
mass conservation. Physica A, 343:185–200, NOV 15 2004.
[104] Syn Kiat Tan and Sheng-Uei Guan. Evolving cellular automata to generate nonlinear
sequences with desirable properties. Applied Soft Computing, 7(3):1131 – 1134, 2007.
[105] Dejan Vinković and Alan Kirman. A physical analogue of the Schelling model. Pro-
ceedings of the National Academy of Sciences, 103(51):19261–19265, 2006.
[106] Duncan J. Watts and Steven H. Strogatz. Collective dynamics of ‘small-world’ net-
works. Nature, 393(6684):440–442, JUN 4 1998.
[107] Jörg R. Weimar. Cellular automata for reaction-diffusion systems. Parallel Computing,
23(11):1699–1715, 1997.
[108] T. A. Witten and L. M. Sander. Diffusion-limited aggregation, a kinetic critical phe-
nomenon. Physical Review Letters, 47(19):1400–1403, 1981.
[109] Stephen Wolfram. Cellular automata as models of complexity. Nature, 311(5985):419–
424, 1984.
[110] Stephen Wolfram. A New Kind of Science. Wolfram Media, Inc., 2002.
[111] Dietmar Wolz and Pedro P. B. de Oliveira. Very effective evolutionary techniques for
searching cellular automata rule spaces. Journal of Cellular Automata, 3(4):289–312,
2008.
[112] Yukio Gunji. Pigment color patterns of molluscs as an autonomous process gen-
erated by asynchronous automata. Biosystems, 23(4):317–334, 1990.