DEEP LEARNING APPROACHES TO SURROGATES FOR SOLVING
THE DIFFUSION EQUATION FOR MECHANISTIC REAL-WORLD
SIMULATIONS
A PREPRINT
J. Quetzalcóatl Toledo-Marín
Biocomplexity Institute, Indiana University,
Bloomington, IN 47408, USA
j.toledo.mx@gmail.com
Geoffrey Fox
Digital Science Center, Luddy School of Informatics,
Computing and Engineering,
Bloomington, IN 47408, USA
gcf@iu.edu
James Sluka
Biocomplexity Institute, Indiana University,
Bloomington, IN 47408, USA
jsluka@iu.edu
James A. Glazier
Biocomplexity Institute, Indiana University,
Bloomington, IN 47408, USA
jaglazier@gmail.com
February 11, 2021
ABSTRACT
In many mechanistic medical, biological, physical and engineered spatiotemporal dynamic models the numerical solution of partial differential equations (PDEs), especially for diffusion, fluid flow and mechanical relaxation, can make simulations impractically slow. Biological models of tissues and organs often require the simultaneous calculation of the spatial variation of concentration of dozens of diffusing chemical species. One clinical example where rapid calculation of a diffusing field is of use is the estimation of oxygen gradients in the retina, based on imaging of the retinal vasculature, to guide surgical interventions in diabetic retinopathy. Since the quasi-steady-state solutions required for fast-diffusing chemical species like oxygen are particularly computationally costly, we consider the use of a neural network to provide an approximate solution to the steady-state diffusion equation. Machine learning surrogates, neural networks trained to provide approximate solutions to such complicated numerical problems, can often provide speed-ups of several orders of magnitude compared to direct calculation. Surrogates of PDEs could enable use of larger and more detailed models than are possible with direct calculation and can make including such simulations in real-time or near-real-time workflows practical. Creating a surrogate requires running the direct calculation tens of thousands of times to generate training data and then training the neural network, both of which are computationally expensive. Often the practical applications of such models require thousands to millions of replica simulations, for example for parameter identification and uncertainty quantification, each of which gains speed from surrogate use and rapidly recovers the up-front costs of surrogate generation. We use a Convolutional Neural Network to approximate the stationary solution to the diffusion equation in the case of two equal-diameter, circular, constant-value sources located at random positions in a two-dimensional square domain with absorbing boundary conditions. Such a configuration caricatures the chemical concentration field of a fast-diffusing species like oxygen in a tissue with two parallel blood vessels in a cross section perpendicular to the two blood vessels. To improve convergence during training, we apply a training approach that uses roll-back to reject stochastic changes to the network that increase the loss function. The trained neural network approximation is about 1000 times faster than the direct calculation for individual replicas. Because different applications will have different criteria for acceptable approximation accuracy, we discuss a variety of loss functions and accuracy estimators that can help select the best network for a particular application. We briefly discuss some of the issues we encountered with overfitting, mismapping of the field values and the geometrical conditions that lead to large absolute and relative errors in the approximate solution.

arXiv:2102.05527v1 [cond-mat.soft] 10 Feb 2021
Keywords: Diffusion surrogate · Machine Learning · Virtual tissue
1 Introduction
Diffusion is ubiquitous in physical, biological and engineered systems. In mechanistic computer simulations of the
dynamics of such systems, solving the steady state and time-varying diffusion equations with multiple sources and
sinks is often the most computationally expensive part of the calculation, especially in cases with multiple diffusing
species with diffusion constants differing by multiple orders of magnitude. Examples in biology include cells secreting
and responding to diffusible chemical signals during embryonic development, blood vessels secreting oxygen which
cells in tissues absorb during normal tissue function, tumors secreting growth factors promoting neoangiogenesis in
cancer progression, or viruses spreading from their host cells to infect other cells in tissues. In these situations the
natural diffusion constants can range from 10³ µm²/s for oxygen to 0.1–10² µm²/s for a typical protein [1].
Dynamic simulations of biological tissues and organs may require the independent calculation of the time-varying
concentrations of dozens of chemical species in three dimensions, and in the presence of a complex field of cells and
extracellular matrix. As the number of species increases, solving these diffusion equations dominates the computational
cost of the simulation. Numerous approaches attempt to reduce the cost of solving the diffusion equation including
implicit, particle-based, frequency-domain and finite-element methods, multithreaded and MPI-based parallelization
and GPUs, but all have significant limitations. In real-world problems, the number of sources and sinks, their shape,
boundary fluxes and positions differ from instance to instance and may change in time. Boundary conditions may also
be complicated and diffusion constants may be anisotropic or vary in space. The resulting lack of symmetry means that
many high-speed implicit and frequency-domain diffusion-solver approaches do not work effectively, requiring the
use of simpler but slower forward solvers [2]. Deep learning¹ surrogates to solve either the steady-state field or the
time-dependent field for a given set of sources and sinks subject to diffusion could potentially increase the speed of
such simulations by several orders of magnitude compared to the use of direct numerical solvers.
One challenge in developing effective deep neural network (NN) diffusion-solver surrogates is that the dimensionality of
the problem specification is potentially very high, with an arbitrary pattern of sources and sinks, with different boundary
conditions for each source and sink, and spatially variable or anisotropic diffusivities. As a proof-of-principle we will
start with a NN surrogate for a simple version of the problem that we can gradually generalize to a full surrogate in
future work. In a two-dimensional square domain represented as N × N pixels and with absorbing boundary conditions, we place two circular sources of equal diameter at random positions, with the constraints that the sources do not overlap and are fully contained within the domain. Each source imposes a constant value on the diffusing field within the source and at its boundary. We set the value of one of the sources equal to 1, while the value of the other source is randomly selected from a uniform distribution on (0, 1] (see Fig. 1(a)). Outside the sources the field diffuses with a constant diffusion constant D and decays linearly with a constant decay rate γ. This simple geometry could represent the diffusion and uptake of oxygen in a volume of tissue between two parallel blood vessels of different diameters. Although reflecting or periodic boundary conditions might better represent a portion of a larger tissue, we use the simpler absorbing boundary conditions here. In this case, the steady-state field depends critically on the distance between the sources, and between the sources and the boundary, both relative to the diffusion length l_D = (D/γ)^(1/2), and on the sources' field strengths.
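For concreteness, the boundary-value problem described above can be written compactly as follows (a restatement in standard notation, not an equation taken from the original text; c denotes the diffusing field and c_1 = 1, c_2 ∈ (0, 1] are the two source values):

    \frac{\partial c}{\partial t} = D\,\nabla^{2} c - \gamma\, c
    \;\;\Longrightarrow\;\;
    D\,\nabla^{2} c - \gamma\, c = 0 \quad \text{(steady state, outside the sources)},
    \qquad
    c\big|_{\text{source } k} = c_{k},
    \qquad
    c\big|_{\partial \Omega} = 0,

with the diffusion length l_D = \sqrt{D/\gamma} setting the scale over which the field decays away from each source.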
In practice, then, the solution of the steady-state diffusion equation maps an image consisting of N × N pixels, with value 0 outside the sources and constant values between 0 and 1 inside the sources, to a second image of the same size, which has the same values inside the sources but values between 0 and 1 elsewhere (see Fig. 1(b)). We evaluate the ability of a NN trained on the explicit numerical solutions of the steady-state diffusion field for 20,000 two-source examples to approximate the steady-state field for configurations of sources that it had not previously encountered.
Notice that the diffusion kernel convolution used in the direct solution of the time-dependent diffusion equation (e.g.,
finite-element methods) is a type of convolutional neural network [2]. Therefore we chose deep convolutional NN as
the architecture. However, there are multiple types of convolutional NN; here we considered two of them: a deep convolutional neural network and an autoencoder [3]. In addition, because it was possible that these two types would do
better at replicating specific aspects of the overall solution, we also evaluated a superposition of the two. Time series
surrogates often use recurrent NN [4, 5]. Similarly, deep generative models have been shown to be useful for sampling from high-dimensional spaces, as in the case of molecular dynamics and chemical reaction modeling [6–11]. Since our main interest is the stationary solution, we did not consider these approaches.

¹ We use the terms deep learning and machine learning interchangeably. We also use neural network and deep neural network interchangeably.

Figure 1: Snapshot of a) the initial condition and b) the stationary-state solution. a) We placed two random-value sources of radius 5 voxels at random positions fully within a 100 × 100 pixel lattice and used this configuration as the input to the NN. b) Stationary solution of the diffusion equation with absorbing boundary conditions for the initial condition in a). The stationary solution b) is the target for the NN. We fixed the diffusion constant to D = 1 voxels²/s and the decay rate to γ = 1/400 s⁻¹, which yields a diffusion length equal to √(D/γ) = 20 voxels.
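The observation above, that the kernel applied at each step of a direct forward solution of the time-dependent diffusion equation is itself a convolution, can be made concrete. One explicit finite-difference update step on a square lattice (a standard textbook form, stated here for illustration rather than quoted from Ref. [2]) reads

    c_{i,j}^{t+\Delta t} = c_{i,j}^{t}
      + \frac{D\,\Delta t}{\Delta x^{2}}
        \left(c_{i+1,j}^{t} + c_{i-1,j}^{t} + c_{i,j+1}^{t} + c_{i,j-1}^{t} - 4\,c_{i,j}^{t}\right)
      - \gamma\,\Delta t\, c_{i,j}^{t},

i.e., each time step applies a fixed 3 × 3 convolution kernel to the field, which is exactly the kind of operation a convolutional layer learns.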
2 Model
Fig. 2 shows our NN architecture. We denote by |x⟩ and |ŷ⟩ the input and output images, that is, the initial-condition layout of the source cells and the predicted stationary solution of the diffusion equation, respectively. The input |x⟩ passes to two different neural networks (NNs), denoted NN 1 (Fig. 3(a)) and NN 2 (Fig. 3(b)), which output |ŷ_1⟩ and |ŷ_2⟩, respectively. The output |ŷ⟩ is a weighted sum of the outputs of the two NNs, |ŷ⟩ = p_1|ŷ_1⟩ + p_2|ŷ_2⟩, where p_1 and p_2 are fixed hyperparameters, i.e., these hyperparameters are fixed during training. In our code [12] the p_i are real numbers; however, in this paper we only consider the Boolean case where each takes the value 0 or 1. NN 1 is a deep convolutional neural network that maintains the height and width of the input image through each of 6 convolutional layers. The first layer outputs a 4-channel image, the second layer an 8-channel image, the third a 16-channel image, the fourth an 8-channel image, the fifth a 4-channel image and the sixth a 1-channel image. NN 2 is an autoencoder [13] whose first 6 layers perform a meanpool operation that halves the height and width after each layer, following the sequence {100², 50², 25², 12², 6², 3², 1²}, while adding channels after each layer following the sequence {1, 64, 128, 256, 512, 1024, 2048}. The following 6 layers then reduce the number of channels following the sequence {1024, 512, 256, 128, 64, 1} while increasing the height and width following the sequence {1², 3², 7², 13², 25², 51², 100²}. Fig. 3 sketches the architectures of the two NNs, while Table 1 provides their parameters. We will find that NN 1 captures the sources whereas NN 2 captures the field. Table 1 specifies each neural network by listing, for each layer, the kind of layer, the activation function and the output shape.
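To make the layer list concrete, a minimal Flux sketch of NN 1 and of the weighted combination of the two sub-networks might look as follows. This is a simplified reconstruction from Table 1 and the text, not the code of Ref. [12]; the padding choice and the dropout rates (set to 0.4 as for model 9 in Table 2) are assumptions.

    using Flux

    # NN 1: six 3x3 convolutions that preserve the 100x100 spatial size (pad=1),
    # with channel counts 1 -> 4 -> 8 -> 16 -> 8 -> 4 -> 1, as in Table 1.
    nn1 = Chain(
        Conv((3, 3), 1 => 4, leakyrelu; pad=1), Dropout(0.4),   # Dropout 1 (D1)
        BatchNorm(4),
        Conv((3, 3), 4 => 8, leakyrelu; pad=1), BatchNorm(8),
        Conv((3, 3), 8 => 16, leakyrelu; pad=1), BatchNorm(16),
        Conv((3, 3), 16 => 8, leakyrelu; pad=1), BatchNorm(8),
        Conv((3, 3), 8 => 4, leakyrelu; pad=1), BatchNorm(4),
        Conv((3, 3), 4 => 1, relu; pad=1), Dropout(0.4),        # Dropout 2 (D2)
        BatchNorm(1),
    )

    # The combined surrogate is the weighted sum |y_hat> = p1|y_hat_1> + p2|y_hat_2>.
    surrogate(nn1, nn2, x; p1 = 1f0, p2 = 1f0) = p1 .* nn1(x) .+ p2 .* nn2(x)

    x  = rand(Float32, 100, 100, 1, 1)   # one input image in Flux's WHCN layout
    ŷ1 = nn1(x)                          # NN 1 output, also 100 x 100 x 1 x 1

NN 2, the autoencoder, follows the meanpool/ConvTranspose layer sequence of Table 1 and is omitted here for brevity.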
To generate representative two-source initial conditions and paired steady-state diffusion fields, we considered a two-dimensional lattice of size 100 × 100 units². We generated 20k configurations with two sources, each with a radius of 5 units. One source has a constant source value equal to 1, while the other source has a constant source value between 0 and 1, randomly assigned using a uniform distribution. Everywhere else the field value is 0. We placed the sources at uniformly random positions in the lattice. This image served as the input |x⟩ for the NN. We then calculated the stationary solution to the diffusion equation with absorbing boundary conditions for each initial condition using the DifferentialEquations.jl package in Julia [15]. The Julia-calculated stationary solution is the target, or ground-truth, image |y⟩ for the NN. In Figs. 1(a) and 1(b) we show an initial condition and the stationary solution, respectively. We set the diffusion constant to D = 1 units²/s and the decay rate to γ = 1/400 s⁻¹, which yield a diffusion length l_D = √(D/γ) = 20 units. Notice that this length is 4 times the radius of the sources and 1/5 the lattice linear dimension. As γ increases and as D decreases, this length decreases. As this length decreases, the field gradient also decreases [16]. The source code to generate the data and train the NN can be found in Ref. [12].

Figure 2: Network Architecture: The input image |x⟩ passes through NN 1 (see Fig. 3(a)) and NN 2 (see Fig. 3(b)), generating the two outputs |ŷ_1⟩ and |ŷ_2⟩. The final output |ŷ⟩ is the sum of the outputs of the two NNs weighted by the coefficients p_1 and p_2, i.e., |ŷ⟩ = p_1|ŷ_1⟩ + p_2|ŷ_2⟩. The p_i are fixed Boolean hyperparameters of the model, fixed for each model we trained. This means that when a given model has p_i = 0 (p_i = 1), NN i is turned off (on).
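A self-contained sketch of how one such training pair could be generated is shown below. This is not the code of Ref. [12]: it uses a plain explicit (Jacobi-style) relaxation in place of DifferentialEquations.jl and GPU arrays, and all helper names are hypothetical.

    using Random

    # Build a 100x100 input image with two non-overlapping disk sources of radius 5.
    function make_input(; L=100, R=5, rng=Random.default_rng())
        x = zeros(Float32, L, L)
        centers = Tuple{Int,Int}[]
        vals = (1.0f0, rand(rng, Float32))   # one source fixed at 1, the other drawn uniformly
        while length(centers) < 2
            c = (rand(rng, R+1:L-R), rand(rng, R+1:L-R))   # fully inside the domain
            if all(sum(abs2, c .- c0) > (2R)^2 for c0 in centers)   # reject overlaps
                push!(centers, c)
            end
        end
        for (k, c) in enumerate(centers), i in 1:L, j in 1:L
            if (i - c[1])^2 + (j - c[2])^2 <= R^2
                x[i, j] = vals[k]
            end
        end
        return x
    end

    # Steady state of dc/dt = D*laplacian(c) - gamma*c with absorbing boundaries and
    # clamped sources, found by explicit relaxation (slow but transparent).
    function steady_state(x; D=1.0f0, γ=0.0025f0, iters=50_000)
        c = copy(x); src = x .> 0
        L = size(x, 1)
        for _ in 1:iters
            cnew = copy(c)
            for i in 2:L-1, j in 2:L-1
                src[i, j] && continue        # field is clamped inside the sources
                lap = c[i+1, j] + c[i-1, j] + c[i, j+1] + c[i, j-1] - 4 * c[i, j]
                cnew[i, j] = c[i, j] + 0.2f0 * (D * lap - γ * c[i, j])   # pseudo-time step
            end
            c = cnew                         # boundary rows/columns stay at 0 (absorbing)
        end
        return c
    end

    x = make_input(); y = steady_state(x)    # one (input, target) training pair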
We trained the CNN, setting the number of epochs to 800, using the Flux deep-learning library in Julia [17]. We varied the dropout values between 0.0 and 0.6 in steps of 0.1 (see Table 2). We used ADAM as the optimizer [18].
Deciding on a loss function is a critical choice in the creation of the surrogate. The loss function determines the types of errors the surrogate's approximation will make compared to the direct calculation, and the acceptability of these errors will depend on the specific application. The mean squared error (MSE) is a standard choice. However, it is more sensitive to larger absolute errors and therefore tolerates large relative errors at pixels with small values. A loss function calculated on the log of the values would be equally sensitive to relative error no matter what the absolute value. In most biological contexts we want to have a small absolute error for small values and a small relative error for large values. We explored the use of both functions, MAE and MSE, as described in Table 2. We used 80% and 20% of the dataset for the training and test sets, respectively. We trained each model once. The highest and lowest values in the input and output images are 1 and 0, respectively. The former only occurs in sources and their vicinity. Given the configurations of the sources, the fraction of pixels in the image with values near 1 is 2πR²/L² ≈ 2%. Thus, pixels with small values are much more common than pixels with large values, and because the loss function is an average over the field, high field values tend to get washed out. To account for this imbalance between the frequency of occurrence of low and high values, we introduced an exponential weight on the pixels in the loss function. We modulate this exponential weight through a scalar hyperparameter w; for the field at the i-th lattice position the loss function is

L_i^(α) = exp( (⟨i|1⟩ − ⟨i|y_β⟩) / w ) · |⟨i|ŷ_β⟩ − ⟨i|y_β⟩|^α ,    (1)
where α is 1 or 2 for MAE or MSE, respectively, and β tags the tuple (input and target) in the data set. Here ⟨·|·⟩ denotes the inner product and |i⟩ is a unit vector of the same size as |y_β⟩, with all components equal to zero except the element at position i, which is equal to one. |1⟩ is a vector with all components equal to 1 and with size equal to that of |y_β⟩. Then ⟨i|y_β⟩ is a scalar corresponding to the pixel value at the i-th position of |y_β⟩, whereas ⟨i|1⟩ = 1 for all i.
Notice that high pixel values will then have an exponential weight of 1, while low pixel values will have an exponential weight of exp(1/w). This implies that the error associated with high-value pixels will have a larger value than the error associated with low-value pixels. The loss function L^(α) is the mean of L_i^(α) over all pixels (i) and over the data set (β):

L^(α) = ⟨ L_i^(α) ⟩ ,    (2)

where ⟨·⟩ denotes the average.

Figure 3: Sketch of a) convolutional NN 1. The first layer takes as input a single-channel N × N image and applies four 3 × 3 convolutions to generate four N × N images, the second layer applies eight 3 × 3 convolutions to generate eight N × N images, the third layer applies sixteen 3 × 3 convolutions to generate sixteen N × N images, the fourth layer applies eight 3 × 3 convolutions to generate eight N × N images, the fifth layer applies four 3 × 3 convolutions to generate four N × N images and the sixth layer applies a 3 × 3 convolution to generate a single N × N image. Sketch of b) autoencoder NN 2. The first 6 layers perform a meanpool operation that reduces image height and width by half after each layer, with the image dimensions following the sequence {100², 50², 25², 12², 6², 3², 1²}, while adding channels after each layer following the sequence {1, 64, 128, 256, 512, 1024, 2048}. The following 6 layers reverse the process, reducing the number of channels following the sequence {1024, 512, 256, 128, 64, 1} while increasing the height and width following the sequence {1², 3², 7², 13², 25², 51², 100²}. This sketch only defines the kinds of layers used. For details about the activation functions used in each layer, see Table 1.

In our initial trial training runs, we noticed that the loss function always reached a plateau by 800 epochs, so we trained the NNs over 800 epochs for all runs reported in this paper. Because the training is stochastic, the loss function can increase as well as decrease between epochs, as seen in Fig. 4. At the end of 800 epochs we adopted the network configuration with the lowest loss function regardless of the epoch at which it was achieved.
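At the array level, the weighted loss of Eqs. (1)-(2) can be written as a short Julia function. This is a sketch, not the training code of Ref. [12]; ŷ and y are assumed to be arrays of predicted and target field values in [0, 1].

    using Statistics

    # Weighted loss of Eqs. (1)-(2): alpha = 1 gives weighted MAE, alpha = 2 gives weighted MSE.
    function weighted_loss(ŷ, y; w = 100.0f0, α = 1)
        weights = exp.((1 .- y) ./ w)               # exponential pixel weight from Eq. (1)
        return mean(weights .* abs.(ŷ .- y) .^ α)   # average over pixels and over the batch
    end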
While the trendline (averaged over 5 or 10 epochs) of the loss function value tends to decrease during training, the stochasticity of the training means that the value of the loss function often increases significantly between successive epochs, even by one or two orders of magnitude (see Fig. 4). In some cases, the loss function decreases back to its trend after one or two epochs; in other cases (which we call jumps), it stays at the higher value, resetting the trend line to the higher value, and only gradually begins to decrease afterwards. In this case all of the epochs after the jump have larger loss functions than the epoch immediately before the jump, as shown for the evolution of the loss function for a typical training run in Fig. 4(a). This behavior indicates that the stochastic optimization algorithm has pursued an unfavorable branch. To avoid this problem, we added a roll-back algorithm to the training, as proposed in Ref. [19].
(a) NN 1
Operation Act Output shape
Conv 3 x 3 LReLU 4 x 100 x 100
Dropout 1 (D1) - -
BatchNorm Identity -
Conv 3 x 3 LReLU 8 x 100 x 100
BatchNorm Identity -
Conv 3 x 3 LReLU 16 x 100 x 100
BatchNorm Identity -
Conv 3 x 3 LReLU 8 x 100 x 100
BatchNorm Identity -
Conv 3 x 3 LReLU 4 x 100 x 100
BatchNorm Identity -
Conv 3 x 3 ReLU 1 x 100 x 100
Dropout 2 (D2) - -
BatchNorm Identity -
(b) NN 2
Operation Act Output shape
Conv 3 x 3 LReLU 64 x 100 x 100
BatchNorm Identity -
Dropout 3 (D3) - -
Meanpool Identity 64 x 50 x 50
Conv 3 x 3 LReLU 128 x 50 x 50
Meanpool Identity 128 x 25 x 25
Conv 3 x 3 LReLU 256 x 25 x 25
Meanpool Identity 256 x 12 x 12
Conv 3 x 3 LReLU 512 x 12 x 12
Meanpool Identity 512 x 6 x 6
Conv 3 x 3 LReLU 1024 x 6 x 6
Meanpool Identity 1024 x 3 x 3
Conv 3 x 3 LReLU 2048 x 1 x 1
ConvT 3 x 3 LReLU 1024 x 3 x 3
ConvT 3 x 3 LReLU 512 x 7 x 7
ConvT 3 x 3 LReLU 256 x 13 x 13
ConvT 3 x 3 LReLU 128 x 25 x 25
ConvT 3 x 3 LReLU 64 x 51 x 51
Dropout 4 (D4) - -
ConvT 4 x 4 ReLU 1 x 100 x 100
BatchNorm Identity -
Table 1: Convolutional Neural Network architectures. The left panel corresponds to the successive operations of NN 1, while the right panel corresponds to the successive operations of NN 2. Act stands for activation function. Conv, ConvT and (L)ReLU stand for convolution, convolution transpose, and (leaky) rectified linear unit, while Identity means the activation function is the identity function (see Ref. [14]). Both NNs take as input the initial condition, which has dimensions Channels × Width × Height = 1 × 100 × 100.
We set a loss threshold value, L_thrs, such that if the loss value at epoch n + 1 is larger than L_thrs, the training algorithm reverts (rolls back) to the NN state corresponding to epoch n − s and tries again. The stochasticity of training means that roll-back has an effect similar to training an ensemble of models with the same hyperparameters and selecting the model with the lowest loss function value; however, the roll-back optimization takes much less computer time than a large ensemble. We set s = 5 and set the threshold value L_thrs to

L_thrs = C (1/m) Σ_{ep = n−m+1}^{n} L^(α)(ep).    (3)

Here we chose C = 5 and m = 20, where ep stands for epoch; i.e., we set the threshold to 5 times the average loss function value over the previous m = 20 epochs. We chose these values empirically. In Fig. 4(b) we plot a typical example of the evolution of the loss function during training when we train using roll-back. A typical number of roll-backs is 40, i.e., 40 epochs during the training where the jump was higher than the threshold.
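A minimal sketch of this roll-back logic, using the classic Flux training API, is shown below. The bookkeeping is simplified relative to the authors' implementation in Ref. [12], and `model`, `loss_fn`, `data` and `opt` (e.g., ADAM()) are assumed to be defined elsewhere.

    using Flux, Statistics

    function train_with_rollback!(model, loss_fn, data, opt; epochs = 800, C = 5.0, m = 20, s = 5)
        history   = Float64[]                          # loss recorded at each epoch
        snapshots = Any[]                              # saved parameter states, one per epoch
        for epoch in 1:epochs
            push!(snapshots, deepcopy(Flux.params(model)))
            Flux.train!(loss_fn, Flux.params(model), data, opt)
            L = mean(loss_fn(x, y) for (x, y) in data)
            if length(history) >= m
                Lthrs = C * mean(history[end-m+1:end])          # Eq. (3): C times trailing-mean loss
                if L > Lthrs                                     # a jump: revert s epochs and retry
                    Flux.loadparams!(model, snapshots[max(1, epoch - s)])
                    L = history[end]                             # keep the pre-jump trend value
                end
            end
            push!(history, L)
        end
        return history
    end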
3 Results
Quite commonly, the mean residual is the estimator used to judge the goodness of a given model. However, there are cases where the worst predictions are highly informative and can be used to make basic decisions about which features of the NN do not add value. In Figs. 5(a), 5(b) and 5(c) we show 20 different inputs, targets and predictions, respectively. The predictions in Fig. 5(c) were obtained using model 12 (see Table 2) and qualitatively show very good results. For each model we computed the residual, i.e., the absolute value of the difference between the ground truth and the NN prediction pixel-by-pixel, as shown in Fig. 6(b). We also analyzed the relative residual, i.e., the residual divided by the ground truth pixel-by-pixel, as shown in Fig. 6(c). Models 6 and 7, which only use NN 1 (p_1 = 1 and p_2 = 0), yield mean residuals an order of magnitude larger than models that use both networks or only NN 2. Therefore, we reject the NN 1-only models and do not analyze them further.

Figure 4: Training loss function vs. epoch for model 9 (the hyperparameters are specified in Table 2 and the NN details are described in the main text), a) without roll-back and b) with roll-back, using the same seed. We have circled in green where a jump occurred during this training run (see main text for discussion).
Table 2 summarizes the hyperparameter values for each model we trained. The choice of these parameters was empirically driven. Since the field values are bounded between 0 and 1, similar to black-and-white images, we tested different L-norms, namely the mean absolute error (MAE), the mean squared error (MSE) and the mean fourth-power error, which are often used in neural networks applied to images. In this paper we show the results for MAE and MSE. We also tested different hyperparameter values for the dropout. We found that low dropout values for NN 2 yield the best results.
In Fig. 6(d) we plot the mean residual, the 99th-percentile residual and the maximum residual computed over the test set. Notice that the 99th-percentile residual is about ten times the mean residual, and the maximum residual is about ten times the 99th-percentile residual. This suggests that the residual distribution contains outliers, i.e., roughly 1% of the residuals deviate from the mean residual by a factor of 10 to 100. Furthermore, these outliers correspond to regions between a source and the border, near the source, when the source is close to the border, as suggested by Fig. 6(b). While the largest absolute residuals come from pixels near the sources, as shown in Fig. 6(b), the relative error near the sources is small, whereas the relative error near the boundaries is large, as shown in Fig. 6(c). Since we are considering absorbing boundary conditions, the field at the boundary is always equal to zero; thus, strictly speaking, the relative residual has a singularity at the boundary. Thus, at the boundaries there is a larger relative error due to the boundary conditions.
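The estimators discussed here can be computed directly from the predicted and ground-truth fields; a sketch (assuming ŷ and y hold all test-set pixels, with hypothetical variable names):

    using Statistics

    residual = abs.(vec(ŷ) .- vec(y))                  # absolute error per pixel
    mean_res = mean(residual)                          # mean residual
    p99_res  = quantile(residual, 0.99)                # 99th-percentile residual
    max_res  = maximum(residual)                       # maximum residual
    rel_res  = residual ./ max.(vec(y), eps(Float32))  # relative residual, guarding the zero-valued boundary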
Models 5, 11 and 12 have low mean residuals with model 5 being the smallest. Focusing instead on the mean residual
and the 99-Percentile, we notice that models 3, 4, 5, 11 and 12 yield the best results. Finally, considering the maximum
residual together with the previous estimators, we notice that model 9 has low mean residual, low 99-percentile residual
and the lowest max residual. Depending on the user’s needs, one estimator will be more relevant than others. In this
sense, defining a best model is relative. Nevertheless, having more metrics (e.g. relative error for large values and
absolute error for small values) helps to characterize each model’s performance. In future work we’ll consider more
adaptable metrics, as well as mixed error functions that incorporate multiple estimators.
Fig. 8 plots the prediction versus the target for each pixel in each image in the training and test sets for models 9 and 11. Notice that while for the test sets the results are qualitatively similar between the two models, for the training set the dispersion is larger in model 11 than in model 9. This suggests that model 11 is overfitting the training data. Models 9 and 11 have the same hyperparameters except for the weight w: in the former w = 100 while in the latter w = 1. This suggests that the exponential weight helps reduce overfitting.
In Fig. 7 we show the predictions from NN 1 (Fig. 7(a)) and NN 2 (Fig. 7(b)). Notice that NN 1 is able to detect the sources whereas NN 2 is able to predict the field. Using both neural networks improves the results, as can be seen in Fig. 6(d).

Figure 5: Results for 20 randomly selected test data sets: a) input, b) ground truth (target output) and c) NN-surrogate prediction of the steady-state diffusion field for the input.
As previously mentioned, pixels with low (near 0) field values are much more common than pixels with high (near 1) field values. While the exponential factor in the loss function compensates for this bias, the residual in Fig. 6(d) does not. To address this issue we compute the mean residual over small field intervals. This tells us how well the model predicts each range of absolute values. Furthermore, this method can be used to emphasize accuracy or relative accuracy in different value ranges. We do this as follows: in Fig. 8 we take 10 slices of width 0.1 along the direction y = x. We then compute the mean residual and standard deviation per slice. In Supplement A we plot the PDF (probability density function) per slice (blue bins) and a Gaussian distribution (red curve) with mean and standard deviation set to the mean residual and standard deviation of that slice, respectively. We did this for all models in Table 2. In Fig. 9 we plot the mean residual for each model for each slice, for the test and training sets. The error envelope shows the residual standard deviation per slice. Notice that models trained with MSE have a smaller residual standard deviation than models trained with MAE in the case of the training set, which suggests that MSE contributes to overfitting more than MAE. Recall that the difference between the MSE gradient and the MAE gradient is that the former is linear in the residual whereas the latter is constant. Therefore, training with MAE generalizes better than MSE. Additionally, notice that the dispersion increases with the slice number.
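A sketch of the per-slice statistics described above (assuming flat arrays ŷ and y of predicted and ground-truth pixel values in [0, 1]; the slice edges follow Fig. 9, and values exactly on an edge are counted in both neighboring slices in this simplified version):

    using Statistics

    function slice_stats(ŷ, y; nslices = 10)
        stats = NamedTuple[]
        for i in 1:nslices
            lo, hi = (i - 1) / nslices, i / nslices
            idx = findall(v -> lo <= v <= hi, y)        # pixels whose true value lies in slice i
            res = abs.(ŷ[idx] .- y[idx])
            push!(stats, (slice = i, mean = mean(res), std = std(res)))
        end
        return stats
    end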
In Fig. 10 we plot the average and the maximum over slices of the residual mean value (Fig. 10(a)) and of the residual standard deviation (Fig. 10(b)) for each model's test and training sets. Notice that in this approach, by slicing the residual values and computing the average residual over the set of slices, we give equal weight to each slice's mean residual and, therefore, compensate for the imbalance in the frequency of low- and high-value pixels. A further difference between using MSE and MAE appears in the PDF of the field values. Training using MAE makes the PDF prediction quite accurate, as the prediction completely overlaps with the ground truth (see Fig. 11). In comparison, when training with MSE, the PDF is not as good and the overlap between ground truth and prediction is not complete. There is a mismatch for low field values, in the sense that the NN does not predict low non-zero field values correctly. Thus we recommend using MAE to avoid this issue.
Figure 6: a) The stationary solution for the same batch in the test set. b) Residual (absolute error, i.e., | |y_β⟩ − |ŷ_β⟩ |) for twenty sample source images in the test set, trained using model 12 in Table 2. c) Residual divided by the true value (relative error) for the corresponding images. d) Mean, 99th-percentile and maximum residual for all of the models in Table 2. Left scale for the mean value, right scale for the 99th-percentile residual and right scale in parentheses for the maximum residual.
Figure 7: Results for 20 randomly selected test data sets. a) Prediction using model 7, which only uses NN 1. b) Prediction using model 5, which only uses NN 2. See Table 2. Note the different scales on the color bars.
Figure 8: Ground truth vs. prediction for a) the test set and b) the training set in the case of model 9, and c) the test set and d) the training set in the case of model 11 (see Table 2). The number of points plotted in each panel is 3.75 × 10⁷.
Figure 9: Mean (data points) ± standard deviation (envelope) per slice vs. model (see Table 2) for the test set (blue) and training set (red). Slice i corresponds to field values in the interval [0.1·(i−1), 0.1·i], where i = 1, ..., 10.
(a) Average of the mean residual over slices (b) Average of the residual standard deviation over slices
Figure 10: a) For each model, we show the average and maximum over the residual mean value per slice. b) For each
model, we show the average and maximum over the residual standard deviation per slice (see Fig. 9). This was done for
the test and training set.
Figure 11: PDF of the field obtained via the NN (blue) and the ground truth (red) when training with MSE, for a) model 2 and b) model 3, and when training with MAE, for c) model 11 and d) model 12. When using MSE (a and b) the NN predicts zero field values instead of low non-zero field values: the predicted PDF has a larger peak at zero than the ground-truth PDF, and a smaller PDF for small non-zero field values compared with the ground truth. When training using MAE (c and d) the prediction and ground-truth PDFs overlap completely.
Model  w     p1  p2  D1   D2   D3   D4   Loss  ⟨res⟩ (10⁻³)  99-P res (10⁻²)  max res
1      1000  1   1   0.3  0.3  0.3  0.3  MSE   2.77          2.26             0.35
2      1     1   1   0.3  0.3  0.3  0.3  MSE   2.91          2.25             0.37
3      1     1   1   0.4  0.4  0.1  0.1  MSE   3.49          2.03             0.34
4      1     0   1   -    -    0.3  0.3  MSE   2.49          1.97             0.38
5      1     0   1   -    -    0.1  0.1  MSE   2.04          1.89             0.35
6      1     1   0   0.3  0.3  -    -    MSE   75.8          16.5             0.47
7      1     1   0   0.4  0.4  -    -    MSE   79.9          21.6             0.65
8      100   1   1   0.3  0.3  0.3  0.3  MAE   2.62          2.59             0.33
9      100   1   1   0.4  0.4  0.1  0.1  MAE   2.08          2.02             0.30
10     1     1   1   0.3  0.3  0.3  0.3  MAE   3.19          3.53             0.40
11     1     1   1   0.4  0.4  0.1  0.1  MAE   2.36          2.66             0.25
12     1     0   1   -    -    0.1  0.1  MAE   2.12          2.17             0.34
13     10    0   1   -    -    0.3  0.3  MAE   3.15          3.39             0.36
14     10    0   1   -    -    0.1  0.1  MAE   2.30          2.46             0.33

Table 2: Trained models with their corresponding hyperparameters. Each model is numbered for reference. The weight w is defined in Eq. (1). The D_i for i = 1, ..., 4 are the dropout values (see Table 1); D1 and D2 apply to NN 1 whereas D3 and D4 apply to NN 2. p_1 and p_2 are Boolean variables; p_i = 0 (p_i = 1) implies NN i is turned off (on). If p_1 = 0 then the values of D1 and D2 are irrelevant, while p_2 = 0 makes the values of D3 and D4 irrelevant. The Loss column specifies the loss function, either mean squared error (MSE, α = 2) or mean absolute error (MAE, α = 1) (see Eq. (1)). The ⟨res⟩, 99-P res and max res columns show the mean, 99th-percentile and maximum residual for each model, computed over the test set.
4 Discussion
In large-scale mechanistic simulations of biological tissues, calculations of the diffusion of molecular species can be a significant fraction of the total computational cost. Because biological responses to concentrations often have a stochastic overlay, high precision may not be essential in these calculations. Because NN surrogate estimates are significantly faster than the explicit calculation of the steady-state diffusion field for a given configuration of sources and sinks, an effective NN surrogate could greatly increase the practical size of simulated tissues, e.g., in cardiac simulations [20, 21], cancer simulations [22] and orthopedic simulations [23]. In our case, using an NVIDIA Quadro RTX 6000, each diffusion solution is about 1000 times faster using the trained NN solver than the Julia code.
In order to decide if this acceleration is useful, we have to consider how long it takes to run the direct simulation, how long the NN takes to train, and how long it takes to execute the NN once it has been trained [24]. If each direct diffusion calculation takes δ seconds to run, conducting N calculations directly takes t_direct = N δ. If each neural-network surrogate evaluation takes ε seconds to run, the number of replicas in the training set is M and the training time is E, the total time for the neural network simulation is the time to generate the training set plus the training time plus the simulation time, t_neuro = M δ + E + N ε. To estimate these times, we ran 20,000 explicit simulations in Julia, which took approximately 6 hours and 30 minutes, yielding roughly δ ≈ 1.16 s each. The NN training time was E ≈ 12 hours on average. While the speed-up for an individual simulation is δ/ε ≈ 1000, the ratio t_neuro/t_direct must be smaller than 1 in order to have a useful acceleration. Equating this ratio to 1 and solving for N yields

N_min = (M + E/δ) / (1 − ε/δ) ≈ M + E/δ .    (4)
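Plugging in the timings reported above gives a quick numerical check of Eq. (4) (a sketch; the 12 h training time is converted to seconds):

    δ = 1.16                 # seconds per direct steady-state solution (Julia)
    ε = δ / 1000             # seconds per NN evaluation (about 1000x per-replica speed-up)
    M = 20_000               # number of training examples
    E = 12 * 3600.0          # training time in seconds (about 12 h)
    Nmin = (M + E / δ) / (1 - ε / δ)   # Eq. (4); evaluates to roughly 5.73e4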
N_min gives the number of replicas necessary for the total time using the NN to equal the time of the direct calculation. Of course, the exact times will depend on the specific hardware used for the direct and NN calculations. In our case, from Eq. (4) we obtain N_min ≈ 57,300; we would need to use the neural network more than 57,300 times for the total time using the NN to be faster than the direct calculation. Thus the NN acceleration is primarily useful in simulations
that will be run many, many times for the specific situation for which the NN is appropriate. Consider, for example, the case where one wishes to include a variable number of sources, different lattice sizes, different dimensionalities (e.g., 3D) and boundary conditions. The more general the NN, the more training data it will require, the longer training will take, and
the slower the individual NN calculations will be. Currently virtual-tissue simulation studies often run thousands to
tens of thousands of replicas and each replica often takes tens of minutes to tens of hours to run. This computational
cost makes detailed parameter identification and uncertainty quantification impractical, since simulations often have
dozens of parameters to explore. If using a NN-based diffusion solver accelerated these simulations by 100
×
it would
permit practical studies with hundreds of thousands to millions of replicas, greatly expanding the feasible exploration of
parameter space for parameter identification and uncertainty quantification.
5 Conclusions
Neural networks provide many possible approaches to generating surrogate diffusion solvers. Given the type of
problem setting, we were interested in a neural network that could predict the stationary field. We considered a deep
convolutional neural network, an autoencoder and their combination. We considered two loss functions, viz. mean
squared error and mean absolute error. We considered different hyperparameters for dropout and an exponential weight
to compensate for the under-sampling of high field values. The exponential weight also helped reduce overfitting, as shown
in Fig. 8.
The range of scientific and engineering applications for diffusion solvers is very broad. Depending on the specific
application, the predictions by the neural network will have to meet a specific set of criteria quantified in the form
of statistical estimators (e.g. mean error, max error, percentiles, mean relative error, etc.). In this paper we studied
several reasonable error metrics, namely, mean residual, maximum residual, 99-Percentile residual, mean relative
residual, mean weighted residual and the weighted standard deviation residual. The last two metrics compensate for
the low frequency of high field values, which usually occur in small regions around sources. Autoencoders are commonly used in generative models, an approach which, as we have shown here, is also applicable to a diffusion surrogate. The field predictions are accurate on all the metrics we considered. This appears to be due to collapsing the input into a one-dimensional vector and then decoding back to the initial size, which forces the network to learn the relevant features [25]. While some models had high errors across all metrics, no single model had the smallest error for all error metrics. Different networks and hyperparameters were optimal for different metrics; e.g., model 5 had the lowest mean residual, whereas model 9 yielded relatively good results on all metrics. Model 9 uses both neural networks, with the dropout values for the deep convolutional network set to D_1,2 = 0.4 and for the autoencoder to D_3,4 = 0.1. The weight hyperparameter was set to 100. Recall that large weight hyperparameter values make the loss function weight high field values over low field values. This is important since the largest absolute errors happen close to sources and close to boundaries, because of the under-representation of these kinds of configurations. We also noticed that this choice reduced overfitting, as shown in Fig. 8.
Additionally, we tested several loss functions. Here we report the results using mean squared error and mean absolute error. We noticed two key differences. With MSE the weighted standard deviation (see Fig. 9) is smaller than for MAE for the training set. However, for the test set, the results for both loss functions are comparable. This difference between training and test sets suggests that MSE is more prone to overfitting the data than MAE. The other key difference is that for MAE, the predicted field probability function consistently overlapped the ground truth completely, whereas for MSE there is a mismatch, in that the NN does not predict low non-zero field values correctly (see Fig. 11). Therefore, we recommend using MAE as the loss function for surrogate calculations where the field values are well bounded, as we have shown it produces better predictions than MSE. The autoencoder (NN 2) is capable of approximating the diffusion field on its own; the convolutional network (NN 1) is not. However, if we use the two networks together, we find that the prediction is more accurate than with NN 2 alone.
These encouraging results suggest that we should pursue NN surrogates for acceleration of simulations in which the
solution to the diffusion equation contributes a considerable fraction of the total computational cost. An effective NN
diffusion solver surrogate would need to be able to solve diffusion fields for arbitrary sources and sinks in two or three
dimensions with variable diffusivity, a much higher dimensional set of conditions than the two circular sources in a
uniform two-dimensional square domain that we investigated in this paper. A key question will be the degree to which
NNs are able to generalize, e.g., from n sources to n + 1 sources, or from circular sources to more complex shapes. In addition, here we only considered absorbing boundary conditions; ultimately, mixed boundary conditions are desirable. It is unclear whether the best approach would be a single NN capable of handling multiple boundary conditions, or whether it would be better to develop a separate NN for each boundary-condition scenario. We will consider these extensions in future work.
6 Acknowledgements
This research was supported in part by Lilly Endowment, Inc., through its support for the Indiana University Pervasive
Technology Institute.
This work is partially supported by the National Science Foundation (NSF) through awards nanoBIO 1720625, CINES
1835598 and Global Pervasive Computational Epidemiology 1918626 and Cisco University Research Program Fund
grant 2020-220491.
This work is partially supported by the Biocomplexity Institute at Indiana University and by National Institutes of Health grant NIGMS R01 GM122424.
References
[1] Rob Phillips. Membranes by the numbers. In Physics of Biological Membranes, pages 73–105. Springer, 2018.
[2] William E Schiesser. The numerical method of lines: integration of partial differential equations. Elsevier, 2012.
[3] Christoph Baur, Stefan Denner, Benedikt Wiestler, Nassir Navab, and Shadi Albarqouni. Autoencoders for unsupervised anomaly segmentation in brain MR images: A comparative study. Medical Image Analysis, page 101952, 2020.
[4] Jia-Shu Zhang and Xian-Ci Xiao. Predicting chaotic time series using recurrent neural network. Chinese Physics Letters, 17(2):88, 2000.
[5] Pierre Dubois, Thomas Gomez, Laurent Planckaert, and Laurent Perret. Data-driven predictions of the Lorenz system. Physica D: Nonlinear Phenomena, page 132495, 2020.
[6] Wei Chen and Andrew L Ferguson. Molecular enhanced sampling with autoencoders: On-the-fly collective variable discovery and accelerated free energy landscape exploration. Journal of Computational Chemistry, 39(25):2079–2102, 2018.
[7] Hang Zhang, Kedar Hippalgaonkar, Tonio Buonassisi, Ole M Løvvik, Espen Sagvolden, and Ding Ding. Machine learning for novel thermal-materials discovery: early successes, opportunities, and challenges. arXiv preprint arXiv:1901.05801, 2019.
[8] Frank Noé, Simon Olsson, Jonas Köhler, and Hao Wu. Boltzmann generators: Sampling equilibrium states of many-body systems with deep learning. Science, 365(6457):eaaw1147, 2019.
[9] Paraskevi Gkeka, Gabriel Stoltz, Amir Barati Farimani, Zineb Belkacemi, Michele Ceriotti, John Chodera, Aaron R Dinner, Andrew Ferguson, Jean-Bernard Maillet, Hervé Minoux, et al. Machine learning force fields and coarse-grained variables in molecular dynamics: application to materials and biological systems. arXiv preprint arXiv:2004.06950, 2020.
[10] Frank Noé, Alexandre Tkatchenko, Klaus-Robert Müller, and Cecilia Clementi. Machine learning for molecular simulation. Annual Review of Physical Chemistry, 71:361–390, 2020.
[11] MF Kasim, Duncan Watson-Parris, Lucia Deaconu, Sophy Oliver, Peter Hatfield, Dustin H Froula, Gianluca Gregori, Matt Jarvis, Samar Khatiwala, Jun Korenaga, et al. Up to two billion times acceleration of scientific simulations with deep neural architecture search. arXiv preprint arXiv:2001.08055, 2020.
[12] J. Quetzalcoatl Toledo-Marin. Stationary diffusion state ML surrogate using Flux and CuArrays. https://github.com/jquetzalcoatl/DiffusionSurrogate, 2020.
[13] Min Chen, Xiaobo Shi, Yin Zhang, Di Wu, and Mohsen Guizani. Deep features learning for medical image analysis with convolutional autoencoder neural network. IEEE Transactions on Big Data, 2017.
[14] Flux documentation. https://fluxml.ai/Flux.jl/stable/. Accessed: 2021-01-18.
[15] Christopher Rackauckas and Qing Nie. DifferentialEquations.jl – a performant and feature-rich ecosystem for solving differential equations in Julia. Journal of Open Research Software, 5(1), 2017.
[16] Andreĭ Nikolaevich Tikhonov and Aleksandr Andreevich Samarskii. Equations of Mathematical Physics. Courier Corporation, 2013.
[17] Mike Innes. Flux: Elegant machine learning with Julia. Journal of Open Source Software, 3(25):602, 2018.
[18] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[19] Geoffrey Fox. Draft deep learning for spatial time series. Technical Report, 2020.
[20] Roy CP Kerckhoffs, Maxwell L Neal, Quan Gu, James B Bassingthwaighte, Jeff H Omens, and Andrew D McCulloch. Coupling of a 3D finite element model of cardiac ventricular mechanics to lumped systems models of the systemic and pulmonic circulation. Annals of Biomedical Engineering, 35(1):1–18, 2007.
[21] Joakim Sundnes, S Wall, Harald Osnes, Tom Thorvaldsen, and Andrew D McCulloch. Improved discretisation and linearisation of active tension in strongly coupled cardiac electro-mechanics simulations. Computer Methods in Biomechanics and Biomedical Engineering, 17(6):604–615, 2014.
[22] René Bruno, Dean Bottino, Dinesh P de Alwis, Antonio T Fojo, Jérémie Guedj, Chao Liu, Kristin R Swanson, Jenny Zheng, Yanan Zheng, and Jin Y Jin. Progress and opportunities to advance clinical cancer therapeutics using tumor dynamic models. Clinical Cancer Research, 26(8):1787–1795, 2020.
[23] Ahmet Erdemir, Scott McLean, Walter Herzog, and Antonie J van den Bogert. Model-based estimation of muscle forces exerted during movements. Clinical Biomechanics, 22(2):131–154, 2007.
[24] Geoffrey Fox, James Glazier, JCS Kadupitiya, Vikram Jadhao, Minje Kim, Judy Qiu, James P Sluka, Endre Somogy, Madhav Marathe, Abhijin Adiga, et al. Learning everywhere: Pervasive machine learning for effective high-performance computation. In 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 422–429. IEEE, 2019.
[25] Diederik P Kingma and Max Welling. An introduction to variational autoencoders. arXiv preprint arXiv:1906.02691, 2019.
A Supplementary material: Probability density function per slice
In this section we show the PDFs per slice as described in the main text. We took 20 slices of size 0.05 along the direction y = x for the plots shown in Fig. 8. We then computed the mean residual and standard deviation per slice. For each slice, we have also plotted a Gaussian distribution (red curve) for guidance, with mean and standard deviation set to the mean residual and standard deviation of that slice, respectively.
Figure S1: Model 1. PDF over all field values and PDF over slices. The red curve corresponds to a Gaussian distribution
centered at the PDF mean value and with standard deviation equal to that of the PDF.
Figure S2: Model 2. PDF over all field values and PDF over slices. The red curve corresponds to a Gaussian distribution
centered at the PDF mean value and with standard deviation equal to that of the PDF.
Figure S3: Model 3. PDF over all field values and PDF over slices. The red curve corresponds to a Gaussian distribution
centered at the PDF mean value and with standard deviation equal to that of the PDF.
Figure S4: Model 5. PDF over all field values and PDF over slices. The red curve corresponds to a Gaussian distribution
centered at the PDF mean value and with standard deviation equal to that of the PDF.
Figure S5: Model 6. PDF over all field values and PDF over slices. The red curve corresponds to a Gaussian distribution
centered at the PDF mean value and with standard deviation equal to that of the PDF.
Figure S6: Model 8. PDF over all field values and PDF over slices. The red curve corresponds to a Gaussian distribution
centered at the PDF mean value and with standard deviation equal to that of the PDF.
Figure S7: Model 9. PDF over all field values and PDF over slices. The red curve corresponds to a Gaussian distribution
centered at the PDF mean value and with standard deviation equal to that of the PDF.
Figure S8: Model 10. PDF over all field values and PDF over slices. The red curve corresponds to a Gaussian
distribution centered at the PDF mean value and with standard deviation equal to that of the PDF.
Figure S9: Model 11. PDF over all field values and PDF over slices. The red curve corresponds to a Gaussian
distribution centered at the PDF mean value and with standard deviation equal to that of the PDF.
Figure S10: Model 12. PDF over all field values and PDF over slices. The red curve corresponds to a Gaussian
distribution centered at the PDF mean value and with standard deviation equal to that of the PDF.
Figure S11: Model 13. PDF over all field values and PDF over slices. The red curve corresponds to a Gaussian
distribution centered at the PDF mean value and with standard deviation equal to that of the PDF.
Figure S12: Model 14. PDF over all field values and PDF over slices. The red curve corresponds to a Gaussian
distribution centered at the PDF mean value and with standard deviation equal to that of the PDF.