NAOSITE: Nagasaki University's Academic Output SITE
http://naosite.lb.nagasakiu.ac.jp/

Title: FPGA implementation of a data-driven stochastic biochemical simulator with the next reaction method
Author(s): Yoshimi, Masato; Iwaoka, Yow; Nishikawa, Yuri; Kojima, Toshinori; Osana, Yasunori; Funahashi, Akira; Hiroi, Noriko; Shibata, Yuichiro; Iwanaga, Naoki; Yamada, Hideki; Kitano, Hiroaki; Amano, Hideharu
Citation: 2007 International Conference on Field Programmable Logic and Applications, pp. 254-259
Issue Date: 2007-08
URL: http://hdl.handle.net/10069/19233
Description: 2007 International Conference on Field Programmable Logic and Applications: Amsterdam, Netherlands, 2007.08.27-2007.08.29
Rights: (c)2007 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Version: publisher
FPGA IMPLEMENTATION OF A DATA-DRIVEN STOCHASTIC BIOCHEMICAL SIMULATOR WITH THE NEXT REACTION METHOD
Masato Yoshimi, Yow Iwaoka, Yuri Nishikawa
Toshinori Kojima, Yasunori Osana
Keio University
Yokohama, Japan
email: bio@am.ics.keio.ac.jp
Akira Funahashi, Noriko Hiroi
Kitano Symbiotic Systems Project,
ERATO-SORST, JST
Tokyo, Japan
Yuichiro Shibata, Naoki Iwanaga,
Hideki Yamada
Nagasaki University
Nagasaki, Japan
Hiroaki Kitano
Kitano Symbiotic Systems Project,
ERATO-SORST, JST
Tokyo, Japan
Hideharu Amano
Keio University
Yokohama, Japan
ABSTRACT
This paper introduces a scalable FPGA implementation of a stochastic simulation algorithm (SSA) called the Next Reaction Method. Some earlier hardware implementations of SSAs obtained high throughput on reconfigurable devices such as FPGAs, but they lacked scalability. The design in this work can accommodate the increasing size of target biochemical models and can make use of the increasing capacity of FPGAs. An interconnection network between arithmetic circuits and multiple simulation circuits performs a data-driven multi-threading simulation. Approximately 8 times speedup was obtained compared to execution on a Xeon 2.80 GHz.
1. INTRODUCTION
The emergence of the academic field called systems biology has brought a new challenge for both computer scientists and biologists: simulating cellular systems using computational resources. Simulation algorithms and modeling techniques have therefore been refined continuously in this field.
A stochastic biochemical simulation algorithm (SSA) [1] is a variation of the Monte Carlo methods, known for the large amount of calculation involved. Consequently, the stochastic approach could only be applied to extremely small test models, and it was difficult to simulate large-scale models with complex reaction systems.
The significant performance enhancement of general-purpose microprocessors since the late 1990s was one of the main reasons for the stochastic approach to be reappraised as a feasible method to simulate large-scale biochemical models, and many refinements were made to the simulation algorithms. Following these trends, hardware implementations of stochastic simulators on reconfigurable devices began to appear around 2004 [2][3]. These works suggest that a 10 to 100 times performance improvement over stochastic simulation on general-purpose microprocessors is attainable, at a much lower development cost than dedicated hardware.
Recent biochemical simulation software adopts an algorithm called the Next Reaction Method (NRM). It is widely known as the SSA that scales best with the size of target models [4]. This work addresses an FPGA implementation of the NRM, which has not yet been reported.
In this paper, we propose a new framework for implementing NRM by connecting several calculation units with an interconnection network. This design aims to maintain scalability with the size of biochemical models, while remaining flexible with respect to each target model and circuit size. As a prototype implementation, we designed the interconnection using multiplexers, and investigated the performance and scalability of the design.
2. STOCHASTIC BIOCHEMICAL SIMULATION
2.1. Stochastic Simulation Algorithm (SSA)
Gillespie proposed stochastic modeling techniques for chemically reacting systems, the aim of which is to obtain the "state" variations of a model from moment to moment [1]. A biochemical model is defined as a list of reactions, and the model's state as the quantity of each species that appears in these reactions. Thus, a stochastic simulation algorithm (SSA) is a method to calculate these quantities over time. An example of a biochemical model which has N reactions is defined in (1a)-(1c).
$$R_1:\quad S_1 + S_2 \xrightarrow{k_1} S_3 \tag{1a}$$
$$R_2:\quad S_2 + S_3 \xrightarrow{k_2} S_4 \tag{1b}$$
$$\vdots$$
$$R_N:\quad S_N + S_1 \xrightarrow{k_N} S_2 \tag{1c}$$
$S_1, S_2, \cdots$ in the equations above represent chemical species, whose quantities are integers. Species on the left-hand side are called "reactants", and those on the right-hand side are "products". Each reaction $R_j$ has an event probability $k_j$ with which its reactants bring about a chemical reaction. Once the initial number of each species is given, one computational cycle can be executed, which yields the time evolution of the model. However, because the algorithm is a variation of the Monte Carlo method, many trials of computational cycles are required to obtain accurate results.
Since Gillespie's First Reaction Method (FRM) and Direct Method (DM) were proposed [1], several improved versions of SSA have been presented [4][5]. One notable proposal was made in 2000 by Gibson and Bruck, who presented a new algorithm called the Next Reaction Method (NRM). It reduced the time complexity from O(N) of the original version to O(log(N)), while mathematically proving the statistical equivalence of the simulation results. NRM is used in representative software biochemical simulators such as E-Cell3 [6]. The details of the algorithms are described in the following subsections.
2.1.1. First Reaction Method
The idea of SSA is to obtain the state change of the model by repeatedly selecting the reaction that is "most likely to occur in the next reaction cycle".
One reaction cycle of FRM consists of the following steps. First, a predicted time of occurrence $\tau_j$ is obtained for every reaction by (3), which gives the vector (2). Then, the reaction with the smallest $\tau_j$ among all reactions in the system is chosen as the one that occurs [1].

$$\vec{\tau} = (\tau_1, \cdots, \tau_N) \tag{2}$$
$$\tau_j = \ln(1/r)/a_j \tag{3}$$

The value $r$ is a uniform random number between 0 and 1. $a_j$ is called a "propensity", which is the product of the event probability $k_j$ and the number of combinations of all the reactants in $R_j$.
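As a minimal sketch of one FRM reaction cycle built on (2) and (3), the Python fragment below draws $\tau_j$ for every reaction and picks the smallest. The function name `frm_step`, the encoding of a reaction as a `(k, reactants)` tuple, and the propensity computed as $k_j$ times the product of the reactant counts (distinct reactants, one molecule each) are illustrative assumptions, not the authors' implementation.

```python
import math
import random

def frm_step(reactions, state, rng):
    """One FRM cycle: evaluate Eq. (3) for all N reactions, return (j, tau_j) of the smallest.

    reactions: list of (k, reactants), reactants being a tuple of species indices (assumed encoding).
    state:     list of molecule counts per species.
    """
    best_j, best_tau = None, math.inf
    for j, (k, reactants) in enumerate(reactions):
        a = k
        for s in reactants:          # propensity a_j = k_j * product of reactant counts
            a *= state[s]
        if a <= 0:
            continue                 # reaction cannot fire in this state
        r = 1.0 - rng.random()       # uniform in (0, 1], avoids ln(1/0)
        tau = math.log(1.0 / r) / a  # Eq. (3)
        if tau < best_tau:
            best_j, best_tau = j, tau
    return best_j, best_tau

# Usage: three reactions over three species; only reaction 1 has all reactants present.
rng = random.Random(1)
j, tau = frm_step([(1.0, (0, 1)), (0.5, (1, 2)), (2.0, (2, 0))], [0, 3, 4], rng)
```

Every call evaluates (3) once per reaction, which is the O(N) cost per cycle that NRM avoids.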
2.1.2. Next Reaction Method
NRM obtains $\vec{\tau}$ according to (3) just like FRM, but two new ideas are introduced to reduce the time complexity [4]: the Indexed Priority Queue (IPQ) and the Dependency Graph (DG). The calculation steps in one reaction cycle are as follows. Once $\vec{\tau}$ is calculated, each pair of a value $\tau_j$ and its reaction ID $j$ is stored in a heap tree called an IPQ. With this modification, NRM only requires a calculation of $\tau_\mu$ for the reaction $R_\mu$ that occurred in a reaction cycle, without recalculating the whole $\vec{\tau}$. The root node of the IPQ points to the next reaction and its time of occurrence. Afterward, (4) modifies several values in $\vec{\tau}$.

$$\tau_{j,new} = (a_{j,old}/a_{j,new})(\tau_{j,old} - \tau_\mu) + \tau_\mu \tag{4}$$

Meanwhile, biochemical reactions may depend on each other; a change in the quantity of certain species due to one biochemical reaction may affect the others. Thus, every predicted time $\tau_j$ of a reaction that is influenced by the current occurrence needs to be modified. In order to clarify these causal relationships, NRM uses a list called a DG. For each reaction, the DG enumerates the other reactions whose related species' numbers would be modified. For instance, the DG of $R_1$ in (1a) is given as (5).

$$DG(R_1) = \{R_2, R_3, R_N\} \tag{5}$$

Finally, the heap tree is updated to maintain its order. The update is necessary whenever a value $\tau_j$ has been changed with (3) or (4).
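The reaction cycle described above can be sketched in software as follows. This is a hedged illustration, not the circuit described later in the paper: the class `IndexedPriorityQueue`, the helpers `propensity`, `draw_tau` and `nrm_step`, and the `(k, reactants, products)` encoding are hypothetical names. Times are kept as absolute values, so (3) is applied as the current time plus $\ln(1/r)/a$, and reactions whose propensity is or was zero are simply redrawn, which is a simplification.

```python
import math
import random

class IndexedPriorityQueue:
    """Binary min-heap over tau[j] with a position index, so one entry updates in O(log N)."""
    def __init__(self, taus):
        self.tau = list(taus)
        # A list sorted by key already satisfies the heap property.
        self.heap = sorted(range(len(self.tau)), key=lambda j: self.tau[j])
        self.pos = [0] * len(self.tau)
        for i, j in enumerate(self.heap):
            self.pos[j] = i

    def top(self):
        j = self.heap[0]
        return j, self.tau[j]

    def _key(self, i):
        return self.tau[self.heap[i]]

    def _swap(self, a, b):
        h = self.heap
        h[a], h[b] = h[b], h[a]
        self.pos[h[a]], self.pos[h[b]] = a, b

    def update(self, j, new_tau):
        self.tau[j] = new_tau
        i = self.pos[j]
        while i > 0 and self._key(i) < self._key((i - 1) // 2):   # sift up
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2
        while True:                                               # sift down
            c = 2 * i + 1
            if c + 1 < len(self.heap) and self._key(c + 1) < self._key(c):
                c += 1
            if c >= len(self.heap) or self._key(i) <= self._key(c):
                break
            self._swap(i, c)
            i = c

def propensity(reaction, state):
    k, reactants, _ = reaction
    a = k
    for s in reactants:
        a *= state[s]
    return a

def draw_tau(a, now, rng):
    """Eq. (3) as an absolute time; infinity if the reaction cannot fire."""
    if a <= 0:
        return math.inf
    return now + math.log(1.0 / (1.0 - rng.random())) / a

def nrm_step(model, dg, state, a, ipq, rng):
    mu, t = ipq.top()                          # root of the IPQ: next reaction and its time
    _, reactants, products = model[mu]
    for s in reactants:
        state[s] -= 1
    for s in products:
        state[s] += 1
    for j in dg[mu]:                           # dependents of R_mu: Eq. (4)
        a_old, a[j] = a[j], propensity(model[j], state)
        if a_old > 0 and a[j] > 0:
            new_tau = a_old / a[j] * (ipq.tau[j] - t) + t
        else:
            new_tau = draw_tau(a[j], t, rng)   # simplification for zero propensities
        ipq.update(j, new_tau)
    a[mu] = propensity(model[mu], state)
    ipq.update(mu, draw_tau(a[mu], t, rng))    # only R_mu needs a fresh Eq. (3)
    return mu, t

# Usage: an 8-reaction ring in the style of (1a)-(1c), 0-indexed.
N = 8
model = [(1.0, (j, (j + 1) % N), ((j + 2) % N,)) for j in range(N)]
dg = {j: [(j - 1) % N, (j + 1) % N, (j + 2) % N] for j in range(N)}
rng = random.Random(0)
state = [100] * N
a = [propensity(r, state) for r in model]
ipq = IndexedPriorityQueue(draw_tau(x, 0.0, rng) for x in a)
trace = [nrm_step(model, dg, state, a, ipq, rng) for _ in range(50)]
```

Each step touches only the fired reaction and its dependents, so the work per cycle is governed by the heap updates rather than by N.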
Table 1 compares the operations performed by FRM and NRM in one reaction cycle. For NRM, Gibson and Bruck also introduced a measure D that represents the complexity of a biochemical model. Let $d_j$ be the dependency of reaction $R_j$, that is, the total number of reactions that are updated due to the occurrence of $R_j$. This equals the number of executions of (3) or (4) when $R_j$ occurs. The measure D (≪ N) is the average of $d_j$ over the model. The number of calculations increases in proportion to the model size N for FRM, while for NRM it is proportional to log(N). The table thus indicates the higher scalability of NRM compared to FRM.
In this paper, the NRM circuit design was evaluated with a model called D4S (D = 4 System), defined in (1a)-(1c), by changing its N.
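To make the definition of D concrete, the following sketch builds the ring model of (1a)-(1c) and derives its dependency graph. It assumes that $R_j$ consumes $S_j$ and $S_{j+1}$ and produces $S_{j+2}$ with indices wrapping around, and that $d_j$ counts $R_j$ itself plus the reactions listed in its DG; the function names are hypothetical and indices are 0-based.

```python
def d4s_model(n):
    """Ring model of (1a)-(1c), 0-indexed: R_j: S_j + S_{j+1} -> S_{j+2} (indices mod n)."""
    return [((j, (j + 1) % n), ((j + 2) % n,)) for j in range(n)]

def dependency_graph(model):
    """For each reaction, list the other reactions that use a species it changes."""
    users = {}                                   # species -> reactions using it as a reactant
    for j, (reactants, _) in enumerate(model):
        for s in reactants:
            users.setdefault(s, set()).add(j)
    dg = []
    for j, (reactants, products) in enumerate(model):
        affected = set()
        for s in set(reactants) | set(products):
            affected |= users.get(s, set())
        dg.append(sorted(affected - {j}))
    return dg

model = d4s_model(1000)
dg = dependency_graph(model)
d = [len(deps) + 1 for deps in dg]               # d_j: R_j itself plus its dependents
D = sum(d) / len(d)
```

For the first reaction, `dg[0]` is `[1, 2, 999]`, which corresponds to $\{R_2, R_3, R_N\}$ in (5), and `D` evaluates to 4 regardless of the model size.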
Table 1. Number of calculations and time complexity of FRM and NRM

        Eq. 3    Eq. 4      Updating IPQ    Order
FRM     N        0          0               O(N)
NRM     1        d_μ − 1    d_μ             O(log(N))
Fig. 1. Drop-off of the throughput versus model size
2.2. Stochastic Biochemical Simulators on FPGA
2.2.1. Related work
Several attempts have been made to design a stochastic biochemical simulator on an FPGA since 2004. Keane et al. and Salwinski et al. both achieved approximately 20 times speedup compared to general-purpose microprocessors [2][7]. However, both works are based on a more approximate version of Gillespie's algorithm, and the calculation steps were also simplified. For example, they convert floating-point values to integers to perform high-speed computation. Thus, simulations on these platforms may require many more computational cycles to obtain the same level of accuracy as software-based simulators.
2.2.2. The FRM implementation on an FPGA
We have been implementing and evaluating stochastic biochemical simulators since 2004 [8][3]. In 2006, a circuit was designed with two simulation threads that time-share one single-precision floating-point computational unit. The pipeline of the computational unit can receive consecutive input data to achieve high throughput [3]. Without using an approximated stochastic algorithm, this implementation achieved more than 80 times speedup compared to execution on a Xeon 2.80 GHz by running six threads in parallel to simulate a model with 1000 reactions (N = 1000).
Cao et al. have already compared the computational time of NRM and DM, the latter of which is known to be more computationally efficient than FRM [5]. To investigate in more detail, we wrote programs for FRM and NRM in C++, and evaluated throughput versus model size for both algorithms using D4S. Here, "throughput" is defined through the execution time of one reaction cycle. The results are shown in Fig. 1, together with the FPGA execution result of FRM. According to these results, the throughput of the FPGA implementation of FRM degrades more prominently than that of NRM on the Xeon as the model size increases. The advantage reverses at N = 425, and NRM is ahead by about three times at N = 1000. This implies that the calculation cost of FRM on an FPGA is purely disadvantageous compared to NRM on microprocessors, considering that NRM is proved to produce results equivalent to FRM.

[Figure 2: timing diagram of reaction cycles for the First Reaction Method and the Next Reaction Method. Labels: FU, DU1, DU2, UPDT, REAT, PROC, TCAL, DGTB, TMOD.]
Fig. 2. Usage of calculation units for FRM and NRM
The discussion above suggests that an FPGA implementation of NRM can achieve higher throughput for stochastic biochemical simulations. The operating frequency of FPGAs is generally more than ten times lower than that of microprocessors, so it is difficult to obtain high throughput with a single task execution without degrading numerical precision. Consequently, we followed the policy of running multiple threads as in our FRM implementation, by arranging multiple thread execution circuits with a small number of computational units. In addition to achieving high throughput, we aim to investigate a design that scales with the size of the target model or the number of threads running in parallel.
3. DESIGN CONCEPT
A schematic of the calculation steps per reaction cycle in FRM and NRM is illustrated in Fig. 2. Unlike FRM, which calculates $\vec{\tau}$ by repeating the same calculation N times, NRM performs several arithmetic operations and heap tree updates according to the occurrences of reactions. To carry out a seamless operation, the circuit was designed to perform a data-driven multi-threading simulation, unlike previous works with statically scheduled dataflow.
Calculation units for NRM generally occupy a large circuit area because they include logarithmic arithmetic units. However, the number of logarithmic calculations in one cycle is very small. Thus, sharing the unit among multiple simulation threads is effective in terms of area efficiency and throughput.
Consequently, the modules are divided into two groups. The modules of the first group need to be prepared for each simulation thread, while the other group is shared among the multiple threads. The two groups are connected via a communication network. A schematic of the calculation steps per simulation cycle of NRM is shown in Fig. 3. A simulation cycle starts from an "Update State" stage, followed by the next three stages. The steps proceed sequentially as shown in Fig. 3. The "Modify τs" stage is repeated $d_\mu - 1$ times per cycle, so the number of calculation steps differs depending on the reaction $R_\mu$ that occurred. The three blocks on the right-hand side of Fig. 3 are arrays that store variables and intermediate data of the simulation, so these blocks must be prepared for each simulation thread. A set of the three blocks is called a "Thread-Private Unit (TPU)" in the following sections. On the other hand, the six blocks on the left-hand side of the figure are units that perform calculations based on data from TPUs and return the results. Since all simulation threads can use the same calculation units, they are called "Thread-Share Units (TSUs)" in the following sections.

[Figure 3: flow diagram. Stages: Initialization (calculate all τ and store in IPQ), Update State, Calculate New τμ, Get DG, Modify τs (repeated according to DGList(μ)). Thread Share (thread independent) blocks: State Update Table (UPDT), Reactants Table (REAT), Propensity Calculator (PRPC), τ Calculator (TCAL), Dependency Graph (DGTB), Modificator (TMOD). Thread Private (thread dependent) blocks of the Data Unit: State Table, Propensity Table, Indexed Priority Queue (IPQ).]
Fig. 3. Computational flow of the Next Reaction Method

[Figure 4: four data units (DU1 to DU4), each with an IPQ and a State Table, connected to the shared units UPDT, REAT, PROC, TCAL, DGTB and TMOD.]
Fig. 4. Connection Diagram of the NRM circuit
Fig. 4 illustrates a configuration with four TPUs, each of which is connected to the TSUs with a multiplexer. By selecting an appropriate interconnection network and number of TPUs, it becomes feasible to configure a circuit design according to the FPGA area and model size. This paper evaluates performance and area with a prototype implementation of the NRM circuit, and then investigates scalability through FPGA implementations of the NRM circuit with a variable number of TPUs and a TSU set connected with multiplexers.
4. IMPLEMENTATION
4.1. TPU and TSU
A prototype of the TPU and TSUs was implemented based on the design described in Section 3. All modules were written in Verilog HDL, and synthesis, placement and routing were done with Xilinx ISE 8.2i.
The target device of the design is a Virtex-II Pro (XC2VP70-6) on a ReCSiP2 board [3], a platform oriented to biochemical simulation. The single-precision floating-point arithmetic units were taken from Xilinx's LogiCORE floating-point cores. BlockRAMs on the Virtex-II Pro, each with an entry size of 32 bits × 1024 words, were used as storage for the variables in each unit. The maximum number of biochemical reactions supported by this implementation is 1024, which is sufficient for existing stochastic models.
Table 2 gives a rough estimate of the area and operating frequency of each unit. The area of TCAL in the TSU is large because it contains a logarithmic arithmetic unit in order to calculate (3). The random numbers required in the same equation are generated with an M-sequence random number generator, and logarithmic values are obtained with second-order interpolation. PRPC and TMOD are the calculation units for the propensity and for the modification of $\tau_j$ with (4). REAT, UPDT and DGTB are tables that store constant values: species IDs of the reactants in each reaction, state update vectors, and the Dependency Graph. The components of a TSU have pipeline pitches whose sizes are shown as "flit" in Table 2.
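The two numerical ingredients of TCAL mentioned above, an M-sequence generator and second-order interpolation of the logarithm, can be illustrated in software. The sketch below is only an assumption-laden example: the paper does not state the register width, tap positions, table size or number format, so the 16-bit LFSR with taps 16, 14, 13, 11, the 64 segments and all function names are choices made for illustration. A circuit would read the node values from a table, whereas here they are computed on the fly with `math.log`.

```python
import math

def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11), a maximal-length (M-sequence) generator."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

SEGMENTS = 64                 # number of interpolation segments (assumption)
STEP = 0.5 / SEGMENTS         # the frexp mantissa lies in [0.5, 1)
LN2 = math.log(2.0)

def ln_quadratic(x):
    """ln(x) for x > 0 by second-order (three-point Lagrange) interpolation on the mantissa."""
    m, e = math.frexp(x)                       # x = m * 2**e
    i = min(int((m - 0.5) / STEP), SEGMENTS - 1)
    x0 = 0.5 + i * STEP
    # Node values at the start, middle and end of the segment (table entries in hardware).
    y0, y1, y2 = math.log(x0), math.log(x0 + STEP / 2), math.log(x0 + STEP)
    t = (m - x0) / STEP                        # position inside the segment, 0..1
    l0 = 2.0 * (t - 0.5) * (t - 1.0)           # Lagrange basis for nodes t = 0, 0.5, 1
    l1 = -4.0 * t * (t - 1.0)
    l2 = 2.0 * t * (t - 0.5)
    return y0 * l0 + y1 * l1 + y2 * l2 + e * LN2

def tau_from_lfsr(state, a):
    """Eq. (3) with r = state / 2**16, which lies in (0, 1) because the LFSR state is never zero."""
    r = state / 65536.0
    return -ln_quadratic(r) / a                # ln(1/r) = -ln(r)

# Usage: advance the generator once and turn the state into a predicted time for a = 2.0.
state = lfsr16_step(0xACE1)
tau = tau_from_lfsr(state, 2.0)
```

Splitting the argument into mantissa and exponent keeps the interpolation table small, since only the interval [0.5, 1) has to be covered.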
A TPU has three arrays in BlockRAMs, as shown in Fig. 3. It also has controllers to communicate with each TSU based on the NRM algorithm. There are two controllers in the TPU, one for external and one for internal use. The external one handles data transfer to and from the TSUs, and the internal one is in charge of updating the IPQ and reading the data required in each calculation. It should be noted that the current implementation does not perform continuous data transfer; that is, once a data sending request is accepted, the next request is not issued until the calculation result has been retrieved. This data transfer method makes the latency longer, but it does not require any output buffers, so the area is kept small.
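Table 2. Resource utilization and operating frequency of TPU and TSU
Columns: TPU (DU); TSU (REAT, TCAL, PRPC, UPDT & DGTB, TMOD)
Rows: Slices, BRAMs, Mult, Freq., In. width, flit, Latency
Cell values in extraction order (column alignment not recoverable): 6400, 2, 0, 38949610, 3, 0, 169, 8, 0, 62, 8, 4, 0 13, 132, 64, 139, 64, 122, 10, 132, 74, 115, 10, 126, 64; flit: 1, 1, 1, 111, 1, 2; Latency: 211127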
[Figure 5: number of slices and maximum operating frequency in MHz versus number of TPUs (legend: TLU), on XC2VP70-6 (33080 slices); values read from the data labels]

Number of TPUs     1      2      4      8      16     24     30     32
TPU                640    1280   2560   5120   10240  15360  19200  20480
Interconnection    1664   1801   2356   2651   5101   6652   6774   6863
TSU+SR             5058   5092   5159   5296   5566   5839   6042   6109
Total slices       7362   8173   10075  13067  20907  27851  32016  33452

Operating frequency labels [MHz]: 112.50, 111.32, 113.41, 109.11, 88.18, 77.32, 45.81
Fig. 5. Area and operating frequency
4.2. Interconnect
An interconnection network between TPUs and TSUs can be divided into two components: an input network to the TSUs and an output network from the TSUs. The input network in particular requires a mechanism to transfer sending requests from multiple TPUs to the corresponding TSUs appropriately. For this work, T-input multiplexers (MUXs) are adopted as the interconnect, where T is the number of TPUs. Each TSU has its own MUX, whose data bandwidth and functions correspond to the TSU.
Each MUX receives data transfer request (REQ) signals and data from multiple TPUs, returns an acknowledge (ACK) signal to one TPU, and outputs the data to the output line. Each MUX has a buffer that stores one flit of input data. Thus, when a MUX is connected to a TSU whose transfer data size is two or more flits, the acknowledged TPU occupies the MUX for as many clock cycles as the number of flits. Other TPUs that need the same TSU wait until the end of the current data transfer. The current implementation has six TSUs, whose input networks are connected through six different MUXs. Table 2 shows the data size and number of flits for each MUX. This restriction in the input networks also serializes the output data stream, so there is no case in which multiple TSUs try to send results to one TPU. Thus, no arbitration mechanism is required in the output networks.
The area and latency of the MUXs increase as T becomes larger. Their interrelationship is discussed in Section 5.
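The REQ/ACK behaviour of one input MUX can be mimicked with a small cycle-level model. This is a sketch under stated assumptions: the paper does not describe how a MUX chooses among simultaneous requests, so the fixed priority used here (lowest TPU index wins) and the function name `simulate_mux` are hypothetical.

```python
def simulate_mux(requests, flits):
    """Cycle-level model of one T-input MUX.

    requests: dict mapping TPU id -> clock cycle at which that TPU raises REQ.
    flits:    number of flits per transfer; the acknowledged TPU holds the MUX that many cycles.
    Returns a dict mapping TPU id -> (cycle of ACK, cycle at which the MUX is free again).
    """
    pending = dict(requests)
    result = {}
    cycle = 0
    busy_until = 0
    while pending:
        if cycle >= busy_until:
            ready = [tpu for tpu, req_cycle in pending.items() if req_cycle <= cycle]
            if ready:
                tpu = min(ready)               # fixed-priority arbitration (assumption)
                result[tpu] = (cycle, cycle + flits)
                busy_until = cycle + flits     # other TPUs wait until this transfer ends
                del pending[tpu]
        cycle += 1
    return result

# Usage: TPU 0 and TPU 1 request in cycle 0, TPU 2 in cycle 1, all sending 2-flit data.
grants = simulate_mux({0: 0, 1: 0, 2: 1}, flits=2)
```

With these inputs the transfers are served back to back, so the last TPU waits for the two earlier 2-flit transfers before it is acknowledged.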
5. EVALUATION
5.1. Area evaluation
Fig. 5 shows the area and maximum operating frequency for different numbers of TPUs (T). The current implementation can accommodate up to 30 TPUs on Xilinx's XC2VP70-6. From the figure, the number of used BlockRAMs increases by 8 each time T is incremented. The rate of increase of BlockRAMs (2.43%) is larger than that of the area (1.87%), but this is not the main factor limiting the maximum T, since the TSUs occupy many slices in the target FPGA.

[Figure 6: throughput in Mcycles/sec and gain (FPGA/Xeon) versus model size N; values read from the data labels]

N                       16     32     64     128    256    512    1000
Xeon 2.80GHz            1.15   1.03   0.93   0.86   0.79   0.73   0.70
T16 FPGA (77.32MHz)     6.29   6.22   6.15   6.07   5.98   5.89   5.88
T30 FPGA (45.81MHz)     6.05   5.98   5.91   5.84   5.76   5.68   5.65
Gain, T16               5.45   6.05   6.64   7.09   7.55   8.05   8.42
Gain, T30               5.24   5.82   6.39   6.83   7.27   7.76   8.10

Fig. 6. Throughput and its gain
The number of slices of TPUs increases linearly as T becomes large. This also increases the slices used for the MUXs, but their growth is gradual compared to that of the TPUs, since each MUX owns an output buffer for only one flit of data. However, the fan-out to the output buffers of the MUXs also increases, which degrades the maximum operating frequency.
Fig. 5 also indicates that the critical path lies in a 64-bit integer-float converting module when T ≤ 8, and in the TPU-TMOD MUX when T ≥ 16.
5.2. Performance evaluation
Fig. 6 shows the throughput measured through RTL simulations for model sizes between N = 16 and N = 1000 of the D4S model (as defined in (1a)-(1c)). The gain in throughput relative to execution on a Xeon 2.80 GHz is also evaluated.
Execution with 16 and 30 threads achieved approximately 5.2 to 8.4 times the throughput of the Xeon. The result also indicates an advantage of the design for simulating models with larger N. When D is common and N differs, the difference in calculation time is caused only by the update of the heap tree. Because branch penalties on the Xeon microprocessor are relatively large, updating the heap tree takes much longer there, while the TPU completes data reading, comparison and exchange in only 3 clock cycles. The critical path of the simulation circuit when T ≥ 16 lies in a MUX between a TPU and a TSU. Consequently, the throughput of T = 30 is below that of T = 16.
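The quoted gain range can be cross-checked against the throughput values printed as data labels in Fig. 6. The short calculation below assumes that the labels are assigned to the series as Xeon, T = 30 and T = 16 for each model size; the variable names are illustrative, and small deviations from the plotted gain labels are expected because the throughput labels are rounded to two decimals.

```python
MODEL_SIZES = [16, 32, 64, 128, 256, 512, 1000]
# Throughput in Mcycles/sec, read from the data labels of Fig. 6.
XEON = [1.15, 1.03, 0.93, 0.86, 0.79, 0.73, 0.70]
T16 = [6.29, 6.22, 6.15, 6.07, 5.98, 5.89, 5.88]
T30 = [6.05, 5.98, 5.91, 5.84, 5.76, 5.68, 5.65]

def gains(fpga, cpu):
    """Gain = FPGA throughput divided by Xeon throughput, per model size."""
    return [f / c for f, c in zip(fpga, cpu)]

gain_t16 = gains(T16, XEON)
gain_t30 = gains(T30, XEON)
for n, g16, g30 in zip(MODEL_SIZES, gain_t16, gain_t30):
    print(f"N={n:5d}  gain T16={g16:.2f}  gain T30={g30:.2f}")
```

The FPGA throughput falls only slightly with N while the Xeon throughput falls faster, which is why the gain grows with the model size.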
[Figure 7: clock cycles per reaction (vertical axis, 220 to 245) versus model size N (16 to 1000) for the series D16 and D30]
Fig. 7. Average clock cycles per reaction cycle versus number of TPU
Based on the analysis above, an improved interconnection is expected to suppress the degradation of the operating frequency. One idea is to replace the multiplexers with a hierarchical bus structure.
The data flow rate within the interconnect would significantly affect the throughput in such a structure, so we also evaluated the average number of clock cycles per reaction cycle for different model sizes N with T = 16 and T = 30. The result is shown in Fig. 7. The figure shows that the increase in clock cycles is small and follows O(log(N)). In the D4S model, the number of calculations and heap tree updates is the same for all model sizes. Thus, Fig. 7 reflects the difference in the time required to update the heap tree for the same T and different N, and the difference in waiting time at the multiplexers for the same N and different T. The clock cycles are larger for T = 30 than for T = 16 in every case, but the difference is as small as 5 clock cycles. This is because data is not frequently sent over the interconnection, and its length is 2 flits at most. Thus, replacing the multiplexers with a hierarchical bus structure could possibly minimize the throughput degradation caused by the increase of waiting time in data transfer.
These evaluation results indicate that the current NRM circuit design scales with increasing model size, especially in contrast with execution on the Xeon processor. They also suggest that higher throughput is feasible by modifying the structure of the interconnection network.
6. CONCLUSION AND FUTURE WORK
This paper described an FPGA-based design of a biochemical simulation circuit for performing stochastic simulation based on the Next Reaction Method, and evaluated its prototype implementation. The circuit was designed to achieve high throughput by running multiple simulation threads in parallel. Every module in the circuit is categorized either into a group that must be prepared for each simulation thread or into a group that is shared among multiple threads. Currently, their interconnection is designed with multiplexers, and approximately 5.2 to 8.4 times higher throughput was obtained compared to execution on a Xeon 2.80 GHz. Some investigation results suggest that a higher-throughput design is feasible by selecting an appropriate interconnection network.
As future work, we are planning to improve throughput based on the current structure. The data transfer method will also be changed from the current ping-pong transmission to a mechanism that tolerates continuous requests. Furthermore, we will analyze the utilization of each arithmetic unit and the data transfer rate with several biochemical models, and investigate further a suitable design for the interconnection network with higher throughput and scalability.
ACKNOWLEDGMENT
This work is supported by VLSI Design and Education Center (VDEC), The University of Tokyo, in collaboration with Cadence Corporation.
7. REFERENCES
[1] D. T. Gillespie, “A general method for numerically simulating the stochastic time evolution of coupled chemical reactions,” Journal of Computational Physics, vol. 22, pp. 403–434, 1976.
[2] J. F. Keane, C. Bradley, and C. Ebeling, “A compiled accelerator for biological cell signaling simulations,” in The 12th International Symposium on Field-Programmable Gate Arrays (FPGA), Feb. 2004, pp. 233–241.
[3] M. Yoshimi, Y. Osana, Y. Iwaoka, Y. Nishikawa, T. Kojima, A. Funahashi, N. Hiroi, Y. Shibata, N. Iwanaga, H. Kitano, and H. Amano, “An FPGA Implementation of High Throughput Stochastic Simulator for Large-Scale Biochemical Systems,” in The 16th International Conference on Field Programmable Logic and Applications, Aug. 2006, pp. 227–232.
[4] M. A. Gibson and J. Bruck, “Efficient exact stochastic simulation of chemical systems with many species and many channels,” Journal of Physical Chemistry A, vol. 104, no. 9, pp. 1876–1889, 2000.
[5] Y. Cao, H. Li, and L. Petzold, “Efficient formulation of the stochastic simulation algorithm for chemically reacting systems,” Journal of Chemical Physics, vol. 121, no. 9, pp. 4059–4067, 2004.
[6] K. Takahashi, K. Yugi, K. Hashimoto, Y. Yamada, C. J. F. Pickett, and M. Tomita, “A multi-algorithm, multi-timescale method for cell simulation,” Bioinformatics, vol. 20, no. 4, pp. 538–546, Mar. 2004.
[7] L. Salwinski and D. Eisenberg, “In silico simulation of biological network dynamics,” Nature Biotechnology, vol. 22, no. 8, pp. 1017–1019, Aug. 2004.
[8] M. Yoshimi, Y. Osana, T. Fukushima, and H. Amano, “Stochastic simulation for biochemical reactions on FPGA,” in The 14th International Conference on Field Programmable Logic and Applications, ser. Lecture Notes in Computer Science, vol. 3203. Springer, Aug. 2004, pp. 105–114.