Abstract

Combustion is a complex chemical system which involves thousands of chemical reactions and generates hundreds of molecular species and radicals during the process. In this work, a neural network-based molecular dynamics (MD) simulation is carried out to simulate the benchmark combustion of methane. During MD simulation, detailed reaction processes leading to the creation of specific molecular species including various intermediate radicals and the products are intimately revealed and characterized. Overall, a total of 798 different chemical reactions were recorded and some new chemical reaction pathways were discovered. We believe that the present work heralds the dawn of a new era in which neural network-based reactive MD simulation can be practically applied to simulating important complex reaction systems at ab initio level, which provides atomic-level understanding of chemical reaction processes as well as discovery of new reaction pathways at an unprecedented level of detail beyond what laboratory experiments could accomplish.
Complex reaction processes in combustion
unraveled by neural network-based molecular
dynamics simulation
Jinzhe Zeng 1, Liqun Cao 1, Mingyuan Xu 1, Tong Zhu 1,2 & John Z. H. Zhang 1,2,3,4
1 Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China.
2 NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China.
3 Department of Chemistry, New York University, New York, NY 10003, USA.
4 Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan, Shanxi 030006, China.
NATURE COMMUNICATIONS | (2020) 11:5713 | /s41467-020-19497-z | 1
Ever since learning to use fire, human beings have never stopped studying combustion. With increasingly serious concern about environmental pollution from combustion, understanding and mastering combustion mechanisms is of great importance. Gaining fundamental insights into combustion processes can help us design more efficient engines and minimize the production of pollutants. A typical combustion process may involve hundreds of chemical species and thousands of elementary chemical reactions. In particular, combustion occurs under extreme physical conditions, with high pressures and temperatures of up to several thousand degrees. Also, many elementary reactions in combustion typically occur on a sub-picosecond time scale. These extreme physical conditions make it very difficult, if not impossible, to carry out real-time experimental studies of combustion. Thus, most experimental investigations of chemical reaction mechanisms focus on individual reactions instead of the complex reaction processes occurring in combustion. In the past decades, in silico experiments such as reactive molecular dynamics (MD) simulations have shown their value in providing molecular (atomic)-level insights into the mechanisms of combustion. In a reactive MD simulation, the reaction conditions can be easily controlled, and some supercritical conditions that are difficult to achieve experimentally can also be handled. Compared with traditional theoretical approaches such as transition state theory and quantum collision theory, which focus on studying a single reaction, reactive MD simulation can construct the entire interwoven reaction network of a combustion system [1].
The heart of a reactive MD simulation is the potential energy surface (PES), which describes the inter- and intramolecular interactions of molecules. Currently, there are mainly two classes of methods that can be used to construct the PES of a given molecular system: quantum mechanics (QM)-based methods and empirical force fields. Quantum mechanics is undoubtedly more rigorous and accurate, and MD simulations based on it are known as ab initio MD (AIMD) simulations [2,3]. Although the AIMD method can in principle simulate complex chemical reactions in real time, it is limited to relatively small systems and short simulation times (typically, dozens of picoseconds) because of the exorbitant computational cost of on-the-fly ab initio calculations. With the rapid development of computer hardware and algorithms, especially the use of graphics processing units (GPUs), some AIMD methods have recently begun to handle larger chemical systems [4]. But so far, it is still impractical to use AIMD to simulate large-scale complex reaction systems such as combustion. Over the past decades, many reactive force fields (or PESs) have been developed and successfully used for various reactive molecular systems [5–12]. A comprehensive discussion of these reactive force fields can be found in refs. 13,14. Among these force fields, the empirical ReaxFF has been widely used in MD simulations of combustion systems because of its computational efficiency [15], but its accuracy and reliability are of significant concern [16–18]. The key steps in developing a reactive force field are the choice of the functional form and the parameterization process, both of which are complicated and depend on human intervention.
Recently, more researchers have been turning to machine-learning (ML) methods. ML methods, especially artificial neural networks (NNs), make it possible to construct PESs with the accuracy of QM methods but with an efficiency comparable to that of force fields. Neural networks constitute a very flexible and unbiased class of mathematical functions, which in principle are able to approximate any real-valued function to arbitrary accuracy. Since Behler and Parrinello proposed the high-dimensional neural network approach [19,20], several methods have been developed to implement this approach, and many different kinds of NN PESs have been proposed for water, small organic molecules, and metalloid materials [21–25], for example, the sGDML [26–28], SchNet [29], PhysNet [30], and FCHL [31] methods. NN potentials have also been employed to study the reaction mechanisms of chemical systems. By combining high-precision NN PESs and quantum collision theory, Zhang and Jiang's group have studied a series of elementary reactions in the gas phase and on surfaces [32–35]. Liu and co-workers developed the LASP program to study heterogeneous catalysis with NN PESs [36] and built the stochastic surface walking (SSW)-NN method to explore reaction pathways from glucose to 5-hydroxymethylfurfural [37]. Brickel
et al. also studied the nucleophilic substitution reaction [Cl–CH3–Br]− in water with an NN potential [38].
In this report, we present an in silico simulation of methane combustion based on an NN potential derived by training a high-dimensional NN model on ab initio computed energies. To achieve high efficiency and accuracy, the DeePMD model was used [39–41]. This NN PES can accurately predict the energies and atomic forces of reactants, products, and reaction intermediates. Based on this model, a 1-ns reactive MD simulation was performed with sub-femtosecond time resolution for a combustion system initially containing 100 methane and 200 oxygen molecules (Fig. 1). A complete reaction network of the methane combustion can be constructed from the MD trajectory. The simulation not only reproduced the main reaction pathways, consistent with experiment, but also provided much more detailed insights into the combustion processes, as described in the following.
Accuracy of the NN PES. The performance of the NN potential depends strongly on the quality of the reference datasets. Although several databases, such as QM7 [42], QM9 [43], ANI-1 [44], and ANI-1x [45], are accessible, they mainly include organic molecules and are therefore not suitable for this work. Combustion of methane generates many molecular fragments, a lot of which are free radicals [46]. Therefore, we followed a workflow (details are given in the Methods section) to construct the reference datasets for the combustion. Then the DeepPot-SE model [47] was used to train the NN PES on the reference data. The predictive power of the NN model is shown in Supplementary Table 1 and Supplementary Fig. 1. It is clear that the DFT energies are accurately reproduced by the NN model. The mean absolute errors (MAEs) are only 0.04 and 0.14 eV/atom in the training set and the test set, respectively. As for the atomic forces, the values predicted by the NN model are also highly consistent with the DFT results (Supplementary Fig. 1). The correlation coefficient is 0.999 and the MAE is 0.12 eV/Å. Considering that there are a large number of atomic and molecular collisions during the combustion process, and that some atomic forces can be as high as dozens of eV/Å, the accuracy of the NN model is encouraging. To verify the energy conservation of the NN PES, we performed a reactive MD simulation in the NVE ensemble. The system is a periodic box containing 100 CH4 molecules and 200 O2 molecules (a total of 900 atoms) with a density of 0.25 g/cm3. As shown in Supplementary Fig. 2, the total energy is conserved in the MD simulation.
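The two force metrics quoted above, the mean absolute error and the correlation coefficient, can be evaluated as in the following minimal Python sketch. The force values are made-up illustrative numbers, not data from this work, and the function names are hypothetical.

```python
import math

def mean_absolute_error(reference, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(r - p) for r, p in zip(reference, predicted)) / len(reference)

def pearson_correlation(reference, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(reference)
    mean_r = sum(reference) / n
    mean_p = sum(predicted) / n
    cov = sum((r - mean_r) * (p - mean_p) for r, p in zip(reference, predicted))
    var_r = sum((r - mean_r) ** 2 for r in reference)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_r * var_p)

# Made-up force components in eV/Å (illustration only, not from the paper).
dft_forces = [-2.0, -0.5, 0.0, 1.0, 3.5]
nn_forces = [-1.9, -0.6, 0.1, 0.9, 3.6]

mae = mean_absolute_error(dft_forces, nn_forces)
corr = pearson_correlation(dft_forces, nn_forces)
print(f"MAE = {mae:.3f} eV/Å, correlation = {corr:.4f}")
```

In practice both metrics would be accumulated over every force component of every structure in the test set.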
The initial stage of combustion. A 1 ns reactive MD simulation was performed for methane combustion with the NN PES in the NVT ensemble. The system is again a periodic box containing 100 CH4 molecules and 200 O2 molecules (a total of 900 atoms) with a density of 0.25 g/cm3. The MD simulations were run with a time-step of 0.1 fs, and the temperature was kept at 3000 K by using the Berendsen thermostat. We chose a relatively high density (and thus high pressure) and high temperature to
enhance the collision probability and sampling efficiency, which is a widely used strategy in reactive MD simulations because the time scale of the simulation is much shorter than that of experiments. In fact, experiments usually do not use pure fuel for combustion, but rather mix the fuel into a relatively inert gas for safety. In future work, we will try to combine the NN potential with enhanced sampling algorithms to bring the simulated conditions closer to realistic ones.
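As a rough consistency check on this setup, the sketch below estimates the edge length of a box holding 100 CH4 and 200 O2 molecules at 0.25 g/cm3 and evaluates the standard Berendsen velocity-scaling factor. The cubic box shape, the molar masses, and the coupling time tau_fs are assumptions made for illustration; the text specifies only the composition, density, time-step, and target temperature.

```python
import math

AMU_TO_GRAM = 1.66053906660e-24  # grams per atomic mass unit
CM3_TO_A3 = 1.0e24               # cubic angstroms per cubic centimetre

def cubic_box_edge(total_mass_amu, density_g_cm3):
    """Edge length (Å) of a cubic box with the given mass and density."""
    volume_cm3 = total_mass_amu * AMU_TO_GRAM / density_g_cm3
    return (volume_cm3 * CM3_TO_A3) ** (1.0 / 3.0)

def berendsen_scale(current_temp, target_temp, dt_fs, tau_fs):
    """Berendsen velocity-scaling factor: sqrt(1 + (dt/tau) * (T0/T - 1))."""
    return math.sqrt(1.0 + (dt_fs / tau_fs) * (target_temp / current_temp - 1.0))

# Assumed molar masses: CH4 = 16.043 g/mol, O2 = 31.998 g/mol.
mass = 100 * 16.043 + 200 * 31.998
edge = cubic_box_edge(mass, 0.25)
print(f"box edge ≈ {edge:.1f} Å")

# tau_fs = 100 fs is a hypothetical coupling time, not a value from the text.
scale_hot = berendsen_scale(current_temp=3300.0, target_temp=3000.0, dt_fs=0.1, tau_fs=100.0)
print(f"scaling factor for a system 10% too hot: {scale_hot:.6f}")
```

A scaling factor slightly below one gently cools a system that is hotter than the target, which is why the thermostat holds the temperature near 3000 K without abrupt velocity changes.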
Figure 1b and Supplementary Fig. 3 show the time-dependent progression of the main molecular species during the MD simulation. After 1 ns, about 90 CH4 and 150 O2 molecules have been consumed and about 160 H2O, 30 CO, and 50 CO2 molecules have been produced. The potential energy of the system during the simulation is shown in Supplementary Fig. 4. Although the system has not reached equilibrium, the important ignition process, which contains much richer reaction information, is already complete. In order to describe the complicated reaction network in more detail, we divided the combustion process into three stages, namely the initial stage of the combustion, the production of the intermediate species formaldehyde and formyl radical, and the production of CO and CO2.
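The approximate species counts quoted above can be checked for element balance with a short Python sketch. Because the counts in the text are rounded ("about"), the remainder computed here is only indicative of the atoms held in intermediates and minor species at 1 ns; the helper names are hypothetical.

```python
from collections import Counter

# Elemental composition of the main species named in the text.
COMPOSITION = {
    "CH4": {"C": 1, "H": 4},
    "O2": {"O": 2},
    "H2O": {"H": 2, "O": 1},
    "CO": {"C": 1, "O": 1},
    "CO2": {"C": 1, "O": 2},
}

def atom_totals(molecule_counts):
    """Total number of atoms of each element in a set of molecules."""
    totals = Counter()
    for species, count in molecule_counts.items():
        for element, n in COMPOSITION[species].items():
            totals[element] += n * count
    return totals

consumed = atom_totals({"CH4": 90, "O2": 150})
produced = atom_totals({"H2O": 160, "CO": 30, "CO2": 50})

# Atoms released by the reactants but not yet in H2O, CO, or CO2.
unaccounted = {el: consumed[el] - produced[el] for el in ("C", "H", "O")}
print(unaccounted)
```

With these rounded numbers, roughly 10 carbon, 40 hydrogen, and 10 oxygen atoms remain outside the three main products, which is consistent with the statement that the system has not yet reached equilibrium.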
The reaction network in the initial stage of the combustion is shown in Fig. 2a. The combustion of methane started with the abstraction of a hydrogen atom by O2 to generate two radicals, ·CH3 and HOO· (R3). As seen in Fig. 2b, this process started at about 32 ps and took about 0.2 ps to finish. During the simulation, other radicals such as ·OH, ·H, and HOO· also abstracted hydrogen atoms from CH4 to generate ·CH3. Among them, the ·OH radical is the main species that carries out this abstraction, generating water molecules (R1). The atomization of methane into ·H and ·CH3 was also observed.
Many ·CH3 radicals interacted with ·OH radicals to form methanol molecules (R6). According to Fig. 2c, this process was also very quick. Some ·CH3 radicals interacted with O2 and HOO· to form methyldioxidanyl (CH3OO·, R4) and methyl hydroperoxide (CH3OOH, R5). Radicals such as ·OH can also abstract H atoms from ·CH3 and produce :CH2. Methanol can further react with ·OH and ·H to generate methoxy radicals (CH3O·; R10, R11) together with H2O and H2. It can also react with ·H to generate ·CH2OH and H2 (R12). CH3O· can also be produced by the reaction of CH3OO· or CH3OOH with ·H (R8 and R9).
Production of formaldehyde and formyl radicals. Most methoxy radicals generated in the previous step were converted to formaldehyde, mainly through two reaction pathways (Fig. 3a). In the first, a methoxy radical interacts with ·OH to form formaldehyde and H2O (R16). As shown in Fig. 3b, this process took about 0.3 ps. In the other pathway, a methoxy radical interacts with ·H to generate formaldehyde and H2 (R17). The ·CH2OH radical can also convert to formaldehyde by losing the hydrogen atom on its hydroxyl group (R14 and R15). If it instead loses one H atom from the methylene group, it generates the :CHOH radical (R13). In addition, :CH2 radicals can interact with ·OH to form formaldehyde and the methylidyne radical (R18 and R19).
The formaldehyde molecules were further converted into formyl (·CHO) radicals. The main reaction pathways are hydrogen abstraction by ·O and ·OH. Figure 3c shows the trajectory of the reaction CH2O + ·OH → ·CHO + H2O. An ·OH radical approaches the rotating formaldehyde molecule and snatches an H atom to form a water molecule; the whole process takes about 0.4 ps. In addition, other species such as ·H, O2, HOO·, and ·CH3 also abstracted a hydrogen atom from formaldehyde to form formyl radicals. R20 and R23 are two reactions that form formyl radicals without the participation of formaldehyde.
Production of CO and CO2. Formyl radicals can convert to CO by losing hydrogen in two ways (Fig. 4a). First, the radical can lose an H atom directly (R25). Figure 4b shows a real-time trajectory of this process. A formyl radical lost its H atom at about 405.79 ps, but this reaction was quickly reversed and the formyl radical was re-formed. After another 0.4 ps the reaction took place again to form CO. Second, ·OH can abstract the H atom from the formyl radical and generate H2O and CO (R26).
The formyl radical can combine with the ·OH radical to form formic acid (R24), which can further lose an H atom to form ·COOH (R27) or HCOO· (R30). These two species can convert to CO2 through reaction with ·OH or ·H (R29 and R31). The ·COOH radical can also interact with ·H to generate CO and H2O (R28). Figure 4c shows the trajectory of the reaction CO + ·OH → CO2 + ·H (R32). At 815.32 ps, an ·OH radical started to approach a CO molecule, and at 815.38 ps, an intermediate ·COOH was formed. The ·COOH should be relatively inactive: it existed stably for about 0.1 ps before finally losing an H atom to become CO2.
Fig. 1 Real-time dynamics of methane combustion. a Snapshots of part of the combustion system extracted from the reactive MD simulation of methane combustion at 0.0, 0.2, 0.4, 0.6, 0.8, and 1.0 ns (the time interval is 0.2 ns). The main molecular species CH4, O2, H2O, and CO2 are colored in cyan, red, blue, and black, respectively. Other molecular species are colored in white. One can see that the number of product molecules increased continuously while the reactants were consumed. b Time dependence of the numbers of the main molecular species in the real-time MD simulation (number of molecules versus time in ns). These curves are smoothed for clarity.
Fig. 2 The initial stage of combustion. a Main reaction pathways in the initial stage of the combustion. b A real-time trajectory (32.21–32.25 ps) showing hydrogen abstraction from methane by O2. Atoms in cyan, red, and gray are carbon, oxygen, and hydrogen, respectively. c A real-time trajectory (104.90–104.98 ps) showing the formation of methanol. The atom colors are the same as in (b).
Fig. 3 Production of formaldehyde and formyl radicals. a The main reaction pathways for the formation of formaldehyde and formyl radicals. b The real-time trajectory (229.88–229.93 ps) of the reaction CH3O· + ·OH → CH2O + H2O. Atoms in cyan, red, and gray are carbon, oxygen, and hydrogen, respectively. c The real-time trajectory (342.14–342.20 ps) of the reaction CH2O + ·OH → ·CHO + H2O. The atom colors are the same as in (b).
Further analysis showed that the 32 reactions mentioned above have all been found experimentally, and the reaction networks constructed from them are also highly consistent with the main reaction networks found experimentally [48,49]. In total, we detected 505 molecular species and 798 reactions in the trajectory. Species such as ethane, ethylene, and acetylene can also be found in the experimental database. In all, 130 of the 798 reactions extracted from the MD trajectory are included in the widely accepted GRI_Mech experimental mechanism library [48]. Some experimentally observed reactions were not observed in our simulation, most likely because the present simulation was performed at a relatively high temperature.
In fact, discovering new reactions is an important advantage of the present approach. Even for methane oxidation, a system that has been extensively studied by experiments, NN-based reactive MD can still discover hundreds of chemical reactions that have not been experimentally reported. This demonstrates that reactive MD can be a powerful tool for studying combustion reactions. Interestingly, we found a cyclopropene molecule in the trajectory, which has not been reported to our knowledge. As shown in Supplementary Fig. 5, at 634.09 ps, a CO molecule collided with a ·CH3 radical and the two joined together. Then a CH2CO molecule was formed through hydrogen loss. The CH2CO was stable for about 200 ps and then combined with another ·CH3 radical. Subsequent hydrogen loss led to the formation of a cycloprop-2-en-1-one molecule at 828.65 ps. After another 60 ps, a third ·CH3 attacked the cycloprop-2-en-1-one molecule and kicked out the CO group to form a C3H5 molecule at 889.50 ps. Through further internal reaction and hydrogen loss, it finally formed a cyclopropene molecule at 891.16 ps, which remained stable throughout the rest of the simulation. The entire process took about 260 ps to complete. While it is possible that finding cyclopropene in our simulation is a coincidence or is driven by the relatively high temperature, it still illustrates the ability of reactive MD simulation to discover new molecules and new reactions.
Accurate in silico MD simulation of combustion and other complex chemical reactions is one of the ultimate goals of computational chemistry. In this work, an artificial neural network potential model trained on ab initio data describes the complex chemical reactions in methane combustion. This NN potential model is orders of magnitude faster than conventional DFT calculations. Benefiting from the high efficiency of the NN model and GPU acceleration, a nanosecond-scale MD simulation of a chemical system containing 900 atoms was completed in about 4 days on an NVIDIA Tesla P100 card. Detailed reaction mechanisms were extracted from the MD trajectory, and the detected molecular species and reaction networks are in excellent agreement with experimental observations. In addition, many new reactions were found that are not included in the experimental database. Compared to laboratory experiments, in silico simulations can be performed under more extreme conditions, and any specific reaction of interest can be easily detected and tracked. In addition, MD simulation can achieve ultra-high time resolution. The time-step used in this work is 0.1 fs. With improvements in algorithms and hardware, even finer time resolution can be achieved.
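The cost figures above imply a per-step throughput that is easy to work out. The following Python sketch does the arithmetic, taking the "about 4 days" wall time at face value as exactly 4 days, which is an approximation.

```python
SIM_TIME_FS = 1_000_000   # 1 ns expressed in femtoseconds
TIME_STEP_FS = 0.1        # time-step stated in the text
WALL_TIME_DAYS = 4.0      # "about 4 days" in the text; treated as exact here

# Number of MD steps needed to cover 1 ns at 0.1 fs per step.
n_steps = round(SIM_TIME_FS / TIME_STEP_FS)

wall_seconds = WALL_TIME_DAYS * 24 * 3600
steps_per_second = n_steps / wall_seconds
ms_per_step = 1000.0 / steps_per_second

print(f"{n_steps:,} steps, {steps_per_second:.1f} steps/s, {ms_per_step:.2f} ms per step")
```

Under this assumption the 900-atom system advances at roughly 29 steps per second, or about 35 ms per force evaluation and integration step.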
Fig. 4 Production of CO and CO2. a Main reaction pathways for the formation of CO and CO2. b The real-time trajectory (405.78–405.85 ps) of the reaction ·CHO → CO + ·H. Atoms in cyan, red, and gray are carbon, oxygen, and hydrogen, respectively. c The real-time trajectory (815.32–815.49 ps) of the reaction CO + ·OH → CO2 + ·H. The atom colors are the same as in (b).
Compared with traditional theoretical approaches based on prior knowledge, reactive MD simulation can explore complex reaction networks and discover new reactions and species without any prior knowledge of the reactions. Indeed, a complex reaction cannot be well understood without considering the kinetics of the reaction network it belongs to. Since reactive MD simulation tracks all chemical reactions in real time, one can even deduce the rate constants of individual reactions from a single MD trajectory by statistical analysis. We extracted the ten most statistically significant reactions from the trajectory and calculated their rate constants based on the algorithms developed in previous studies [50,51]. As shown in Supplementary Table 2, most of the rate constants agree well with the GRI_Mech data [48]. The main sources of error might be the uncertainties of the parameters in the Arrhenius formula and the completeness of sampling. Ideally, one should run many trajectories with different initial conditions to obtain truly statistically accurate results. However, although these rates may not be accurate enough to be used directly in kinetic modeling, they can contribute effectively to a comprehensive understanding of the combustion reaction.
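One simple way to estimate a bimolecular rate constant from a single trajectory is to divide the number of observed reaction events by the time-integrated product of the reactant populations. The Python sketch below illustrates this estimator; it is an assumption made for illustration and is not necessarily the algorithm of refs. 50,51. The event count, populations, sampling interval, and volume are made-up numbers.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def bimolecular_rate_constant(n_events, counts_a, counts_b, dt_s, volume_cm3):
    """Estimate k (cm^3 mol^-1 s^-1) for A + B -> products.

    From d(events)/dt = k * N_A * N_B / V it follows that
    k = n_events * V / integral(N_A * N_B dt), here with populations
    counted in molecules and sampled every dt_s seconds.
    """
    exposure = sum(a * b for a, b in zip(counts_a, counts_b)) * dt_s
    if exposure == 0:
        raise ValueError("reactants never coexist; rate constant is undefined")
    return n_events * volume_cm3 * AVOGADRO / exposure

# Made-up example: constant populations sampled every 1 ps for 1 ns.
counts_a = [10] * 1000
counts_b = [5] * 1000
k = bimolecular_rate_constant(
    n_events=3, counts_a=counts_a, counts_b=counts_b,
    dt_s=1.0e-12, volume_cm3=5.0e-20,
)
print(f"k ≈ {k:.3e} cm^3 mol^-1 s^-1")
```

With only a handful of events per reaction, the statistical uncertainty of such an estimate is large, which is one reason the text recommends running many trajectories with different initial conditions.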
A practical issue to be pointed out is that although some algorithms were used in this study to minimize the size of the reference dataset, there are still 578,731 structures in the reference set. Although the DFT calculation is very efficient, it is difficult to perform high-level post-Hartree–Fock calculations on such a large reference set. In order to further reduce the size of the reference set while ensuring its completeness, new algorithms need to be developed to further enhance the efficiency of this approach. Recently, Zhang et al. developed the DP-GEN [52] (Deep Potential Generator) software platform, which can automatically construct the reference dataset and train the NN model. The concurrent learning algorithm employed by this platform can make the redundancy of the reference set as small as possible. We are trying to integrate the algorithms developed in this work into the DP-GEN platform.
In addition, it is worth pointing out that while combustion is usually thought to be dominated by free radical reactions, recent studies have begun to examine the role of electronically excited species in combustion. For example, the additional introduction of plasma was found to be effective in promoting combustion in experiments [53]. However, MD simulations involving excited states are highly nontrivial, and there are large uncertainties in ab initio quantum chemistry computations of the excited states of large systems. Based on sophisticated empirical or machine-learning PESs, several recent works have achieved excited-state MD simulations for model systems [54–62], for example, the O + O recombination reaction to form the ground and excited-state singlet O2 molecules on amorphous solid water [60]. Such strategies will be considered in our future studies.
Although further improvement is needed, the current report heralds the dawn of a new era in which neural network-based reactive MD simulation can be practically applied to simulating complex reaction systems at the ab initio level, providing atomic-level understanding of every reaction process at an unprecedented level of detail beyond what laboratory experiments can accomplish.
Reference dataset. In this study, a workflow was developed for constructing reference datasets (Fig. 5). The details of each module in the workflow are given below.
To increase the efficiency of dataset construction, reactive MD simulation with ReaxFF was used to sample an initial dataset. A model combustion system
containing a lot of CH4 and O2 molecules was built by using the Amorphous Cell
module in the Materials Studio [63] software package. Then the LAMMPS [64] program was used to perform the MD simulation. The NVT ensemble was used and the temperature was set to 3000 K with the Berendsen thermostat. The ReaxFF parameters of Chenoweth et al. (CHO-2008 parameter set) [65] were employed. The Open Babel software [66] and the depth-first search algorithm [67] were used to detect species in every snapshot of the trajectory. Then, for each atom in each snapshot, we built a molecular cluster that contains this atom and the species lying within a specified cutoff centered on it. In this work, the cutoff was set to 5 Å.
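The species-detection step can be pictured as finding connected components of the bond graph with a depth-first search. The Python sketch below illustrates the idea under simplifying assumptions: bonds are assigned from hypothetical distance cutoffs rather than by Open Babel, periodic boundaries are ignored, and all function names are illustrative.

```python
import math

# Hypothetical bond-distance cutoffs in Å, for illustration only.
BOND_CUTOFF = {
    frozenset(("C", "H")): 1.3,
    frozenset(("O", "H")): 1.2,
    frozenset(("C", "O")): 1.6,
    frozenset(("C",)): 1.7,   # C-C
    frozenset(("O",)): 1.5,   # O-O
    frozenset(("H",)): 0.9,   # H-H
}

def build_neighbors(elements, coords):
    """Adjacency list of bonded atoms from pairwise distances."""
    n = len(elements)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            cutoff = BOND_CUTOFF[frozenset((elements[i], elements[j]))]
            if math.dist(coords[i], coords[j]) <= cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors

def find_species(neighbors):
    """Group atoms into species with an iterative depth-first search."""
    seen = [False] * len(neighbors)
    species = []
    for start in range(len(neighbors)):
        if seen[start]:
            continue
        seen[start] = True
        stack, members = [start], []
        while stack:
            atom = stack.pop()
            members.append(atom)
            for nb in neighbors[atom]:
                if not seen[nb]:
                    seen[nb] = True
                    stack.append(nb)
        species.append(sorted(members))
    return species

def formula(elements, members):
    """Formula string with C first, H second, then other elements alphabetically."""
    counts = {}
    for idx in members:
        counts[elements[idx]] = counts.get(elements[idx], 0) + 1
    order = [el for el in ("C", "H") if el in counts]
    order += sorted(el for el in counts if el not in ("C", "H"))
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order)

# One methane and one O2 molecule, far apart (coordinates in Å).
elements = ["C", "H", "H", "H", "H", "O", "O"]
coords = [
    (0.0, 0.0, 0.0),
    (0.629, 0.629, 0.629), (-0.629, -0.629, 0.629),
    (-0.629, 0.629, -0.629), (0.629, -0.629, -0.629),
    (5.0, 0.0, 0.0), (6.21, 0.0, 0.0),
]
groups = find_species(build_neighbors(elements, coords))
formulas = [formula(elements, members) for members in groups]
print(formulas)
```

Running the same search on every snapshot and comparing the species lists between consecutive frames is one way to see which atoms changed partners, and hence which reactions occurred.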
The initial dataset contains about 22.5 million structures, which is too many to perform QM calculations for every molecular cluster it contains. Therefore, it is necessary to resample it to remove redundant structures while ensuring its completeness. To this end, we first classified the initial dataset into sub-datasets based on the chemical bond information of the central atom. For example, a central H atom can be classified into two different types: a single H atom (H0) and an H atom that forms a single chemical bond with another atom (H1).
Further treatment is still needed for large sub-datasets. For a given large sub-dataset, we first expressed each molecular cluster it contains as a Coulomb matrix:
$$C_{ij} = \begin{cases} 0.5\,Z_i^{2.4}, & i = j \\ \dfrac{Z_i Z_j}{\lvert R_i - R_j \rvert}, & i \neq j \end{cases}$$
where $Z_i$ and $Z_j$ are the nuclear charges of atoms $i$ and $j$, and $R_i$ and $R_j$ are their Cartesian coordinates. The minimum image convention [69] was used to account for the periodic boundary condition. "Invisible atoms" were introduced to fix the dimension of the Coulomb matrix. These invisible atoms do not influence the physics of the molecule of interest and make the total number of atoms in the molecule sum to a constant. To lower the dimension of the dataset while keeping as much structural information as possible, the Coulomb matrix was further represented by its eigen-spectrum, which is obtained by solving the eigenvalue problem $Cv = \lambda v$ under the constraint $\lambda_i \ge \lambda_{i+1}$. The Mini Batch KMeans clustering algorithm [70] was then used to divide the given sub-dataset into smaller clusters according to the eigen-spectrum. Then we randomly selected 10,000 structures from each cluster (if a cluster contains no more than 10,000 structures, all of them were selected).
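The Coulomb-matrix construction and the capped random selection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the matrix uses the standard Coulomb-matrix definition, the box is orthorhombic, "invisible atoms" are modeled as zero-charge padding, and the function names are hypothetical. The eigen-spectrum and the Mini Batch KMeans clustering would be computed with a linear-algebra and a clustering library, which are not shown here.

```python
import math
import random

def minimum_image(delta, box):
    """Wrap a displacement vector to its nearest periodic image (orthorhombic box)."""
    return [d - b * round(d / b) for d, b in zip(delta, box)]

def coulomb_matrix(charges, coords, box, size):
    """Coulomb matrix padded with zero rows/columns up to a fixed size."""
    n = len(charges)
    if n > size:
        raise ValueError("cluster has more atoms than the fixed matrix size")
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(n):
        matrix[i][i] = 0.5 * charges[i] ** 2.4
        for j in range(i + 1, n):
            delta = [a - b for a, b in zip(coords[i], coords[j])]
            dist = math.hypot(*minimum_image(delta, box))
            matrix[i][j] = matrix[j][i] = charges[i] * charges[j] / dist
    return matrix

def subsample(cluster, cap=10_000, seed=0):
    """Keep every structure of a small cluster, or a random sample of size cap."""
    if len(cluster) <= cap:
        return list(cluster)
    return random.Random(seed).sample(cluster, cap)

# An OH fragment that straddles the periodic boundary of a 10 Å box.
charges = [8, 1]
coords = [(0.2, 0.0, 0.0), (9.23, 0.0, 0.0)]
cm = coulomb_matrix(charges, coords, box=(10.0, 10.0, 10.0), size=3)
print([round(x, 3) for x in cm[0]])
```

The third row and column stay zero because they belong to the padding atom, so clusters with different numbers of atoms all map to matrices of the same dimension.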
Fig. 5 The workflow of reference dataset construction. The process and steps used in this study to generate the reference dataset needed for training the neural network that provides the potential energy for MD simulation.
Large-amplitude collisions and reactions in combustion can produce a lot of unpredictable species and intermediates. To ensure the completeness of the reference dataset, an active learning approach [71] was used. Four different NN PES models were trained on the dataset from the previous step. Then several short MD simulations were performed with these NN models. During the simulation, the atomic forces were evaluated by the four NN PES models simultaneously. For a specific atom, if the forces predicted by the four models are consistent with each other, then the molecular cluster centered on this atom is considered to be already covered by the dataset. On the contrary, if the results of the four models are inconsistent with each other and the error between them lies in a specific range (0.5 eV/Å < error < 1.0 eV/Å in this work), the corresponding molecular cluster is added to the dataset. The dataset is updated until the predictions of the four models are always consistent.
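The selection rule of the active learning loop can be sketched as below. The text does not define the "error" between the four models precisely, so this sketch assumes it is the standard deviation of each atom's force vector across the models; that choice, the function names, and the force values are all illustrative assumptions.

```python
import math

def force_deviation(forces_by_model, atom):
    """Standard deviation of one atom's force vector across models (eV/Å)."""
    vectors = [model[atom] for model in forces_by_model]
    n = len(vectors)
    mean = [sum(v[k] for v in vectors) / n for k in range(3)]
    variance = sum(
        sum((v[k] - mean[k]) ** 2 for k in range(3)) for v in vectors
    ) / n
    return math.sqrt(variance)

def select_atoms(forces_by_model, lower=0.5, upper=1.0):
    """Indices of atoms whose model disagreement lies strictly inside (lower, upper)."""
    n_atoms = len(forces_by_model[0])
    return [
        atom for atom in range(n_atoms)
        if lower < force_deviation(forces_by_model, atom) < upper
    ]

# Made-up forces (eV/Å) on three atoms from four models.
model_a = [(1.0, 0.0, 0.0), (-0.7, 0.0, 0.0), (-3.0, 0.0, 0.0)]
model_b = [(1.0, 0.0, 0.0), (0.7, 0.0, 0.0), (3.0, 0.0, 0.0)]
forces_by_model = [model_a, model_b, model_a, model_b]

selected = select_atoms(forces_by_model)
print(selected)
```

In this example only the second atom is selected: the first is predicted consistently, while the disagreement on the third exceeds the upper bound of the window stated in the text.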
QM calculation. The potential energy and atomic forces of every structure in the final dataset were calculated with the Gaussian 16 [72] software at the MN15/6-31G** level. The MN15 functional was employed because it has broad accuracy for both multi-reference and single-reference systems [73]. To account for spin polarization, the initial wave function of a given structure was obtained by combining the wave functions of the individual molecular species forming the structure, while the wave function of each molecular species was calculated based on its own charge and spin.
Training of the NN PES. The scheme of the NN model is shown in Fig. 6. The total energy $E$ of a given structure is decomposed into a sum of atomic energy contributions [19,74], i.e., $E = \sum_i E_i$, where $i$ is the index of the atom. Each atomic energy is fully determined by the position of the $i$th atom and its near neighbors. To guarantee the translational, rotational, and permutational symmetries of the PES, the Cartesian coordinates of the atoms are mapped to specific mathematical formulas called "descriptors" of the atomic chemical environment.
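The role of the descriptors can be illustrated with a toy model in Python. The sketch below is not the DeepPot-SE descriptor: it uses a single scalar per atom, the sum of a smooth radial weight over its neighbors, and ignores element types. The weight follows the cosine-tapered switching function published for DeepPot-SE with the 1.0 Å and 6.0 Å values quoted in the training settings; whether the software version used here implements exactly this form is an assumption, and the atomic energy function is hypothetical.

```python
import math

def smooth_weight(r, r_smooth=1.0, r_cut=6.0):
    """s(r): 1/r below r_smooth, cosine-tapered to zero at r_cut."""
    if r >= r_cut:
        return 0.0
    if r < r_smooth:
        return 1.0 / r
    taper = 0.5 * math.cos(math.pi * (r - r_smooth) / (r_cut - r_smooth)) + 0.5
    return taper / r

def toy_descriptor(coords, i):
    """Sum of s(r_ij) over all other atoms; depends on distances only."""
    return sum(
        smooth_weight(math.dist(coords[i], coords[j]))
        for j in range(len(coords)) if j != i
    )

def total_energy(coords, atomic_energy):
    """E = sum_i E_i, each E_i a function of the local descriptor only."""
    return sum(atomic_energy(toy_descriptor(coords, i)) for i in range(len(coords)))

def toy_atomic_energy(descriptor):
    """Hypothetical stand-in for the fitting network."""
    return -0.1 * descriptor

coords = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.0, 2.5, 0.0), (0.0, 0.0, 7.0)]
shifted = [(x + 3.0, y - 1.0, z + 0.5) for x, y, z in coords]
permuted = [coords[2], coords[0], coords[3], coords[1]]

e_ref = total_energy(coords, toy_atomic_energy)
e_shifted = total_energy(shifted, toy_atomic_energy)
e_permuted = total_energy(permuted, toy_atomic_energy)
print(e_ref, e_shifted, e_permuted)
```

Because the descriptor depends only on interatomic distances, translating, rotating, or relabeling the atoms leaves the total energy unchanged, and the atom placed beyond the cutoff contributes nothing to its neighbors' environments.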
The DeepPot-SE (Deep Potential-Smooth Edition) model [47] was used to train the NN potential with the DeePMD-kit program [74]. Details of this method can be found in ref. 67. The model includes two networks: the embedding network and the fitting network. Both networks use the ResNet architecture [75]. The size of the embedding network was set to (25, 50, 100) and the size of the embedding matrix was set to 12. The size of the fitting network was set to (240, 240, 240). The cutoff radius was set to 6.0 Å and the descriptors decay smoothly from 1.0 to 6.0 Å. The initial learning rate was set to 0.0005 and decays every 20,000 steps. The loss is defined by
$$L(p_e, p_f) = p_e\,\Delta E^2 + \frac{p_f}{3N} \sum_i \lvert \Delta F_i \rvert^2$$
where $\Delta E$ and $\Delta F_i$ are the root mean square errors in energy and force, and $N$ is the number of atoms. The prefactor $p_e$ is set to 0.2 eV$^{-2}$ and $p_f$ decays from 1000 Å$^2$ eV$^{-2}$ to 1 Å$^2$ eV$^{-2}$.
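The interplay between the loss and its decaying force prefactor can be sketched in Python as follows. Two assumptions are made: the force prefactor is interpolated with the learning-rate ratio, as described for DeePMD-kit, and the per-stage decay factor of 0.96 is a hypothetical value, since the text states only the initial learning rate and the 20,000-step decay interval. The error values are made-up numbers.

```python
def lr_ratio(step, decay_rate=0.96, decay_steps=20_000):
    """r(t)/r(0) for a staircase exponential learning-rate decay (decay_rate assumed)."""
    return decay_rate ** (step // decay_steps)

def prefactor(step, p_start, p_limit):
    """Prefactor moving from p_start to p_limit as the learning rate decays."""
    ratio = lr_ratio(step)
    return p_limit * (1.0 - ratio) + p_start * ratio

def loss(delta_e, delta_forces, p_e, p_f):
    """L = p_e * dE^2 + p_f / (3N) * sum_i |dF_i|^2."""
    n_atoms = len(delta_forces)
    force_term = sum(
        fx * fx + fy * fy + fz * fz for fx, fy, fz in delta_forces
    ) / (3 * n_atoms)
    return p_e * delta_e ** 2 + p_f * force_term

# Made-up errors: 0.1 eV in energy, two atoms with small force errors (eV/Å).
delta_forces = [(0.3, 0.0, 0.0), (0.0, 0.4, 0.0)]

p_f_start = prefactor(0, p_start=1000.0, p_limit=1.0)
p_f_late = prefactor(10_000_000, p_start=1000.0, p_limit=1.0)
early_loss = loss(0.1, delta_forces, p_e=0.2, p_f=p_f_start)
print(p_f_start, p_f_late, early_loss)
```

Early in training the large force prefactor makes the force error dominate the loss; as the learning rate decays, the force weight relaxes toward its final value of 1 Å$^2$ eV$^{-2}$ while the energy prefactor stays at 0.2 eV$^{-2}$.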
Data availability
The datasets (structures, potential energies and atomic forces of molecular species)
generated during the current study are available at
NNREAX, Source data are provided with
this paper.
Code availability
The codes used to generate the datasets in the current study are available at https://,
Received: 10 March 2020; Accepted: 6 October 2020;
References
1. Martinez, T. J. Ab initio reactive computer aided molecular design. Acc. Chem.
Res. 50, 652–656 (2017).
2. Car, R. & Parrinello, M. Unified approach for molecular-dynamics and
density-functional theory. Phys. Rev. Lett. 55, 2471–2474 (1985).
3. Tuckerman, M. E. Ab initio molecular dynamics: basic concepts, current trends
and novel applications. J. Phys. Condens. Matter 14, R1297–R1355 (2002).
4. Wang, L.-P. et al. Discovering chemistry with an ab initio nanoreactor. Nat.
Chem. 6, 1044 (2014).
5. Van Duin, A. C., Dasgupta, S., Lorant, F. & Goddard, W. A. ReaxFF: a reactive
force field for hydrocarbons. J. Phys. Chem. A 105, 9396–9409 (2001).
6. Brenner, D. W. et al. A second-generation reactive empirical bond order
(REBO) potential energy expression for hydrocarbons. J. Phys. Condens.
Matter 14, 783 (2002).
7. Nouranian, S., Tschopp, M. A., Gwaltney, S. R., Baskes, M. I. & Horstemeyer,
M. F. An interatomic potential for saturated hydrocarbons based on the
modified embedded-atom method. Phys. Chem. Chem. Phys. 16, 6233–6249
8. Qu, C., Yu, Q. & Bowman, J. M. Permutationally invariant potential energy
surfaces. Annu. Rev. Phys. Chem. 69, 151–175 (2018).
9. Li, J. & Guo, H. Permutationally invariant fitting of intermolecular potential
energy surfaces: a case study of the Ne-C2H2 system. J. Chem. Phys. 143,
214304 (2015).
10. Braams, B. J. & Bowman, J. M. Permutationally invariant potential energy
surfaces in high dimensionality. Int. Rev. Phys. Chem. 28, 577–606 (2009).
11. Nagy, T., Yosa Reyes, J. & Meuwly, M. Multisurface adiabatic reactive
molecular dynamics. J. Chem. Theory Comput. 10, 1366–1375 (2014).
12. Warshel, A. & Florián, J. in Encyclopedia of Computational Chemistry (John
Wiley and Sons, 2002).
13. Meuwly, M. Reactive molecular dynamics: from small molecules to proteins.
Wires Comput. Mol. Sci. 9, e1386 (2019).
14. Koner, D., Salehi, S. M., Mondal, P. & Meuwly, M. Non-conventional force
fields for applications in spectroscopy and chemical reaction dynamics. J.
Chem. Phys. 153, 010901 (2020).
15. Zheng, M. et al. Pyrolysis of liulin coal simulated by GPU-based ReaxFF MD
with cheminformatics analysis. Energy Fuels 28, 522–534 (2014).
16. Wang, E., Ding, J., Qu, Z. & Han, K. Development of a reactive force field for
hydrocarbons and application to iso-octane thermal decomposition. Energy
Fuels 32, 901–907 (2017).
17. Cheng, T., Jaramillo-Botero, A., Goddard, W. A. & Sun, H. Adaptive
accelerated ReaxFF reactive dynamics with validation from simulating
hydrogen combustion. J. Am. Chem. Soc. 136, 9434–9442 (2014).
18. Bertels, L. W., Newcomb, L. B., Alaghemandi, M., Green, J. R. &
Head-Gordon, M. Benchmarking the performance of the ReaxFF reactive force field
on hydrogen combustion systems. J. Phys. Chem. A 124, 5631–5645 (2020).
19. Behler, J. & Parrinello, M. Generalized neural-network representation of
high-dimensional potential-energy surfaces. Phys. Rev. Lett. 98, 146401 (2007).
20. Behler, J. First principles neural network potentials for reactive simulations of
large molecular and condensed systems. Angew. Chem. Int. 56, 12828–12840
21. Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1: an extensible neural network
potential with DFT accuracy at force field computational cost. Chem. Sci. 8,
3192–3203 (2017).
Fig. 6 The neural network model. The neural network model that generates the potential energy surface for MD simulation.
NATURE COMMUNICATIONS | (2020) 11:5713 | /s41467-020-19497-z | 7
Content courtesy of Springer Nature, terms of use apply. Rights reserved
22. Yao, K., Herr, J. E., Toth, D. W., Mckintyre, R. & Parkhill, J. The
TensorMol-0.1 model chemistry: a neural network augmented with long-range physics.
Chem. Sci. 9, 2261–2269 (2018).
23. Lee, K., Yoo, D., Jeong, W. & Han, S. SIMPLE-NN: an efficient package for
training and executing neural-network interatomic potentials. Comput. Phys.
Commun. 242, 95–103 (2019).
24. Chen, X., Jørgensen, M. S., Li, J. & Hammer, B. Atomic energies from a
convolutional neural network. J. Chem. Theory Comput. 14, 3933–3942
25. Zhang, Y., Hu, C. & Jiang, B. Embedded atom neural network potentials:
efficient and accurate machine learning with a physically inspired
representation. J. Phys. Chem. Lett. 10, 4962–4967 (2019).
26. Chmiela, S. et al. Machine learning of accurate energy-conserving molecular
force fields. Sci. Adv. 3, e1603015 (2017).
27. Schütt, K. T., Arbabzadah, F., Chmiela, S., Müller, K. R. & Tkatchenko, A.
Quantum-chemical insights from deep tensor neural networks. Nat. Commun.
8, 13890 (2017).
28. Sauceda, H. E., Chmiela, S., Poltavsky, I., Müller, K. R. & Tkatchenko, A.
Molecular force fields with gradient-domain machine learning: construction
and application to dynamics of small molecules with coupled cluster forces. J.
Chem. Phys. 150, 114102 (2019).
29. Schütt, K. T., Sauceda, H. E., Kindermans, P.-J., Tkatchenko, A. & Müller,
K.-R. SchNet – a deep learning architecture for molecules and materials. J. Chem.
Phys. 148, 241722 (2018).
30. Unke, O. T. & Meuwly, M. PhysNet: a neural network for predicting energies,
forces, dipole moments, and partial charges. J. Chem. Theory Comput. 15,
3678–3693 (2019).
31. Christensen, A. S., Bratholm, L. A., Faber, F. A. & Anatole von Lilienfeld, O.
FCHL revisited: faster and more accurate quantum machine learning. J. Chem.
Phys. 152, 044107 (2020).
32. Lu, X., Meng, Q., Wang, X., Fu, B. & Zhang, D. H. Rate coefficients of the H+
reaction on an accurate fundamental invariant-neural
network potential energy surface. J. Chem. Phys. 149, 174303 (2018).
33. Yin, Z., Guan, Y., Fu, B. & Zhang, D. H. Two-state diabatic potential energy
surfaces of ClH2 based on nonadiabatic couplings with neural networks. Phys.
Chem. Chem. Phys. 21, 20372–20383 (2019).
34. Zhang, Y., Zhou, X. & Jiang, B. Bridging the gap between direct dynamics and
globally accurate reactive potential energy surfaces using neural networks. J.
Phys. Chem. Lett. 10, 1185–1191 (2019).
35. Chen, J., Xu, X., Xu, X. & Zhang, D. H. Communication: An accurate global
potential energy surface for the OH plus CO -> H + CO2 reaction using
neural networks. J. Chem. Phys. 138, 221104 (2013).
36. Huang, S. D., Shang, C., Kang, P. L., Zhang, X. J. & Liu, Z. P. LASP: fast global
potential energy surface exploration. Wiley Interdiscip. Rev. Comput. Mol. Sci. 9,
e1415 (2019).
37. Kang, P. L., Shang, C. & Liu, Z. P. Glucose to 5-hydroxymethylfurfural:
origin of site-selectivity resolved by machine learning based reaction sampling.
J. Am. Chem. Soc. 141, 20525–20536 (2019).
38. Brickel, S., Das, A. K., Unke, O. T., Turan, H. T. & Meuwly, M. Reactive
molecular dynamics for the [Cl–CH3–Br]− reaction in the gas phase and in
solution: a comparative study using empirical and neural network force fields.
Electron. Struct. 1, 024002 (2019).
39. Zhang, L., Han, J., Wang, H., Car, R. & Weinan, E. Deep potential molecular
dynamics: a scalable model with the accuracy of quantum mechanics. Phys.
Rev. Lett. 120, 143001 (2018).
40. Han, J. Q., Zhang, L. F., Car, R. & Weinan, E. Deep potential: a general
representation of a many-body potential energy surface. Commun. Comput.
Phys. 23, 629–639 (2018).
41. Jia, W. et al. Pushing the limit of molecular dynamics with ab initio accuracy
to 100 million atoms with machine learning. Preprint at
2005.00223 (2020).
42. Blum, L. C. & Reymond, J.-L. 970 million druglike small molecules for virtual
screening in the chemical universe database GDB-13. J. Am. Chem. Soc. 131,
8732–8733 (2009).
43. Ruddigkeit, L., Van Deursen, R., Blum, L. C. & Reymond, J.-L. Enumeration of
166 billion organic small molecules in the chemical universe database
GDB-17. J. Chem. Inf. Model. 52, 2864–2875 (2012).
44. Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1, a data set of 20 million
calculated off-equilibrium conformations for organic molecules. Sci. Data 4,
170193 (2017).
45. Smith, J. S. et al. The ANI-1ccx and ANI-1x data sets, coupled-cluster and
density functional theory properties for molecules. Sci. Data 7, 134 (2020).
46. He, Z., Li, X.-B., Liu, L.-M. & Zhu, W. The intrinsic mechanism of methane
oxidation under explosion condition: a combined ReaxFF and DFT study. Fuel
124, 85–90 (2014).
47. Zhang, L. et al. End-to-end symmetry preserving inter-atomic potential energy
model for finite and extended systems. In: Bengio, S. et al. (eds) Advances in
Neural Information Processing Systems 31, 4436–4446 (Curran Associates Inc,
48. Smith, G. P. et al. GRI-Mech 3.0.
49. Reid, I. A. B., Robinson, C. & Smith, D. B. Spontaneous ignition of methane:
Measurement and chemical model. Symp. Int. Combust. Proc. 20, 1833–1843
50. Wu, Y. Z., Sun, H., Wu, L. & Deetz, J. D. Extracting the mechanisms and
kinetic models of complex reactions from atomistic simulation data. J.
Comput. Chem. 40, 1586–1592 (2019).
51. Döntgen, M. et al. Automated discovery of reaction pathways, rate constants,
and transition states using reactive molecular dynamics simulations. J. Chem.
Theory Comput. 11, 2517–2524 (2015).
52. Zhang, Y. et al. DP-GEN: a concurrent learning platform for the generation of
reliable deep learning based potential energy models. Comput. Phys. Commun.
253, 107206 (2020).
53. Ju, Y. & Sun, W. Plasma assisted combustion: dynamics and chemistry. Prog.
Energy Combust. Sci. 48, 21–83 (2015).
54. Chen, W.-K., Liu, X.-Y., Fang, W.-H., Dral, P. O. & Cui, G. Deep learning
for nonadiabatic excited-state dynamics. J. Phys. Chem. Lett. 9, 6702–6708
55. Hu, D., Xie, Y., Li, X., Li, L. & Lan, Z. Inclusion of machine learning kernel
ridge regression potential energy surfaces in on-the-fly nonadiabatic
molecular dynamics simulation. J. Phys. Chem. Lett. 9, 2725–2732 (2018).
56. Westermayr, J. et al. Machine learning enables long time scale molecular
photodynamics simulations. Chem. Sci. 10, 8100–8107 (2019).
57. Westermayr, J., Faber, F. A., Christensen, A. S., von Lilienfeld, O. A. &
Marquetand, P. Neural networks and kernel ridge regression for excited states
dynamics of CH2NH2+: from single-state to multi-state representations and
multi-property machine learning models. Mach. Learn.: Sci. Technol. 1,
025009 (2020).
58. Borges, Y. G., Galvão, B. R. L., Mota, V. C. & Varandas, A. J. C. A trajectory
surface hopping study of N2(A3Σu+) quenching by H atoms. Chem. Phys. Lett.
729, 61–64 (2019).
59. Schinke, R., Grebenshchikov, S. Y., Ivanov, M. V. & Fleurat-Lessard, P.
Dynamical studies of the ozone isotope effect: a status report. Annu. Rev. Phys.
Chem. 57, 625–661 (2006).
60. Pezzella, M., Koner, D. & Meuwly, M. Formation and stabilization of ground
and excited-state singlet O2 upon recombination of (3)P oxygen on
amorphous solid water. J. Phys. Chem. Lett. 11, 2171–2176 (2020).
61. Koner, D., Bemish, R. J. & Meuwly, M. The C((3)P) + NO(X(2)Pi) -> O((3)P)
+ CN(X(2)Sigma(+)), N((2)D)/N((4)S) + CO(X(1)Sigma(+)) reaction: rates,
branching ratios, and final states from 15 K to 20 000 K. J. Chem. Phys. 149,
094305 (2018).
62. Koner, D., Unke, O. T., Boe, K., Bemish, R. J. & Meuwly, M. Exhaustive
state-to-state cross sections for reactive molecular collisions from importance
sampling simulation and a neural network representation. J. Chem. Phys. 150,
211101 (2019).
63. BIOVIA, Materials Studio 2017
resource-center/citations-and-references/ (Dassault Systèmes, San Diego,
64. Aktulga, H. M., Fogarty, J. C., Pandit, S. A. & Grama, A. Y. Parallel reactive
molecular dynamics: numerical methods and algorithmic techniques. Parallel
Comput. 38, 245–259 (2012).
65. Chenoweth, K., Van Duin, A. C. & Goddard, W. A. ReaxFF reactive force field
for molecular dynamics simulations of hydrocarbon oxidation. J. Phys. Chem.
A 112, 1040–1053 (2008).
66. O'Boyle, N. M. et al. Open Babel: an open chemical toolbox. J.
Cheminformatics 3, 33 (2011).
67. Tarjan, R. Depth-first search and linear graph algorithms. SIAM J. Comput. 1,
146–160 (1972).
68. Rupp, M., Tkatchenko, A., Müller, K.-R. & Von Lilienfeld, O. A. Fast and
accurate modeling of molecular atomization energies with machine learning.
Phys. Rev. Lett. 108, 058301 (2012).
69. Hloucha, M. & Deiters, U. Fast coding of the minimum image convention.
Mol. Simul. 20, 239–244 (1998).
70. Sculley, D. Web-scale k-means clustering. In: Rappa, M. et al. (eds) Proc. 19th
International Conference on World Wide Web (ACM, 2010).
71. Zhang, L., Lin, D.-Y., Wang, H., Car, R. & Weinan, E. Active learning of
uniformly accurate interatomic potentials for materials simulation. Phys. Rev.
Mat. 3, 023804 (2019).
72. Frisch, M. et al. Gaussian 16, revision A.03 (Gaussian Inc, Wallingford CT,
73. Yu, H. S., He, X., Li, S. L. & Truhlar, D. G. MN15: A Kohn–Sham
global-hybrid exchange–correlation density functional with broad accuracy for
multi-reference and single-reference systems and noncovalent interactions. Chem.
Sci. 7, 5032–5051 (2016).
74. Wang, H., Zhang, L., Han, J. & Weinan, E. DeePMD-kit: a deep learning
package for many-body potential energy representation and molecular
dynamics. Comput. Phys. Commun. 228, 178–184 (2018).
75. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image
recognition. In: Tuytelaars, T. et al. (eds) Proc. IEEE Conference on Computer
Vision and Pattern Recognition (IEEE, 2016).
Acknowledgements
The authors thank Dr. Linfeng Zhang and Dr. Han Wang for their discussion and help in
using DeepPot-SE and DeePMD-kit. T.Z. would also like to thank Prof. Donghui Zhang
for his valuable suggestions in this project. This work was supported by the National Key
R&D Program of China (grant no. 2016YFA0501700), the National Natural Science
Foundation of China (grant nos. 91641116, 91753103, and 21933010), and the Innovation
Program of Shanghai Municipal Education Commission (201701070005E00020).
J. Zeng was partially supported by the National Innovation and Entrepreneurship
Training Program for Undergraduate (201910269080). We also thank the ECNU
Multifunctional Platform for Innovation (No. 001) for providing supercomputer time.
Author contributions
J.Z. trained the neural network potential and performed most of the QM calculations.
L.C. and M.X. analyzed the trajectory and performed part of the QM calculation. T.Z.
and J.Z.H.Z. conceived the project and wrote the manuscript with input from all authors.
Competing interests
The authors declare no competing interests.
Additional information
Supplementary information is available for this paper at
Correspondence and requests for materials should be addressed to T.Z. or J.Z.H.Z.
Peer review information Nature Communications thanks the anonymous reviewers for
their contribution to the peer review of this work. Peer reviewer reports are available.
Reprints and permission information is available at
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative
Commons license, and indicate if changes were made. The images or other third party
material in this article are included in the article's Creative Commons license, unless
indicated otherwise in a credit line to the material. If material is not included in the
article's Creative Commons license and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder. To view a copy of this license, visit
© The Author(s) 2020
... Machine learning potentials offer a potential mechanism to improve the accuracy and efficiency of QM/MM simulations, and they have had considerable impact in the development of methods to study chemical reactions. [15][16][17][18][19][20][21] Herein we develop an approach whereby we employ a recently described deep-potential range correction (DPRc) model 22 to enhance the accuracy of a fast, approximate base QM/MM model to reproduce the energies and forces of a much more computationally costly target QM/MM model. The new model parametrizes the DPRc potential using a machine learning neural network training procedure 22 to correct the 2nd-order density-functional tight-binding (DFTB2) semiempirical method [23][24][25] to reproduce the PBE0/6-31G* energies and forces in explicit solvent QM/MM calculations. ...
... Recent work has attempted to address this deficiency through graph-based methods that generate reference data for reactive systems 13,14 , but they are also prone to produce large numbers of specious chemical states and unrealistic intermediates such as highly unstable radicals. Therefore fully ab initio sampling methods are a necessity for creation of the many molecular fragments involved in combustion chemistry, including the presence of stable and unstable intermediates, high energy transition states, and a variety of product molecules that can be formed during the reaction that is dependent on the reactive channel 8,9,[15][16][17][18] . ...
Full-text available
The generation of reference data for deep learning models is challenging for reactive systems, and more so for combustion reactions due to the extreme conditions that create radical species and alternative spin states during the combustion process. Here, we extend intrinsic reaction coordinate (IRC) calculations with ab initio MD simulations and normal mode displacement calculations to more extensively cover the potential energy surface for 19 reaction channels for hydrogen combustion. A total of ∼290,000 potential energies and ∼1,270,000 nuclear force vectors are evaluated with a high quality range-separated hybrid density functional, ωB97X-V, to construct the reference data set, including transition state ensembles, for the deep learning models to study hydrogen combustion reaction. Measurement(s)ab initio energies and forces of hydrogen combustionTechnology Type(s)density functional theory • ab initio molecular dynamics • normal modesFactor Type(s)cartesian coordinates Measurement(s) ab initio energies and forces of hydrogen combustion Technology Type(s) density functional theory • ab initio molecular dynamics • normal modes Factor Type(s) cartesian coordinates
... Researchers nowadays widely use vN computers to run MD, largely because they have no other choice. Though some special-purpose MD computers have been developed 22,23,40,41 , they are all based on CMD and FF, whose accuracy is questionable in many important applications [42][43][44][45][46] . Therefore, considering the scientific and technological significance of MD , it deserves serious efforts to develop a special-purpose MD computer beyond the vN paradigm, to enable efficient and accurate MD calculations in various fields. ...
Full-text available
Force field-based classical molecular dynamics (CMD) is efficient but its potential energy surface (PES) prediction error can be very large. Density functional theory (DFT)-based ab-initio molecular dynamics (AIMD) is accurate but computational cost limits its applications to small systems. Here, we propose a molecular dynamics (MD) methodology which can simultaneously achieve both AIMD-level high accuracy and CMD-level high efficiency. The high accuracy is achieved by exploiting deep neural network (DNN)’s arbitrarily-high precision to fit PES. The high efficiency is achieved by deploying multiplication-less DNN on a carefully-optimized special-purpose non von Neumann (NvN) computer to mitigate the performance-limiting data shuttling (i.e., ‘memory wall bottleneck’). By testing on different molecules and bulk systems, we show that the proposed MD methodology is generally-applicable to various MD tasks. The proposed MD methodology has been deployed on an in-house computing server based on reconfigurable field programmable gate array (FPGA), which is freely available at .
... 19 MD sampling is the most straightforward way to construct the training data set, and the evolution of energies and forces is recorded as training data sets. 20 Various bond dissociations and recombinations exist in a reactive system, e.g., the decomposition of HE materials. Such processes cannot be sampled appropriately in classical MD sampling due to the high energy barriers. ...
Full-text available
Ab initio molecular dynamics (AIMD) is an established method for revealing the reactive dynamics of complex systems. However, the high computational cost of AIMD restricts the explorable length and time scales. Here, we develop a fundamentally different approach using molecular dynamics simulations powered by a neural network potential to investigate complex reaction networks. This potential is trained via a workflow combining AIMD and interactive molecular dynamics in virtual reality to accelerate the sampling of rare reactive processes. A panoramic visualization of the complex reaction networks for decomposition of a novel high explosive (ICM-102) is achieved without any predefined reaction coordinates. The study leads to the discovery of new pathways that would be difficult to uncover if established methods were employed. These results highlight the power of neural network-based molecular dynamics simulations in exploring complex reaction mechanisms under extreme conditions at the ab initio level, pushing the limit of theoretical and computational chemistry toward the realism and fidelity of experiments.
... Many different NNPs have been proposed for water, small organic molecules, and metal materials since Behler and Parrinello proposed the high-dimensional neural network (HDNN) approach. [25][26][27][28][29][30][31][32][33][34][35][36] Recently, Yoo et al. used ReaxFF combined with NNPs to explore the decomposition process of RDX. 37 In this work, accurate NNPs for the pure CL-20 and CL-20/ TNT co-crystal systems will be constructed. ...
CL-20 (2,4,6,8,10,12-hexanitro-2,4,6,8,10,12-hexaazaisowurtzitane, also known as HNIW) is one of the most powerful energetic materials. However, its high sensitivity to environmental stimuli greatly reduces its safety and severely limits its application. In this work, ab initio based neural network potential (NNP) energy surfaces for both β-CL-20 and CL-20/TNT co-crystals were constructed. To accurately simulate the thermal decomposition processes of these two crystal systems, reactive molecular dynamics simulations based on the NNPs were performed. Many important intermediate species and their associated reaction paths during the decomposition had been identified in the simulations and the direct results on detonation temperatures of both systems were provided. The simulations also showed clearly that 2,4,6-trinitrotoluene (TNT) molecules in the co-crystal act as a buffer to slow down the chain reactions triggered by nitrogen dioxide and this effect is more significant at lower temperatures. Specifically, the addition of TNT molecules in the CL-20/TNT co-crystal introduces intermolecular hydrogen bonds between CL-20 and TNT molecules in the system, thereby increasing the thermal stability of the co-crystal. The current reactive molecular dynamics simulation is performed based on the NNP which helps in accelerating the speed of ab initio molecular dynamics (AIMD) simulation by more than 3 orders of magnitude while preserving the accuracy of density functional theory (DFT) calculations. This enabled us to perform longer-time simulations at more realistic temperatures that traditional AIMD methods cannot achieve. With the advantage of the NNP in its powerful fitting ability and transferability, the NNP-based MD simulation can be widely applied to energetic material systems.
Whether the air-water interface decreases or increases the acidity of simple organic and inorganic acids compared to the bulk is critically important in a broad range of environmental and biochemical processes. However, a consensus has not yet been achieved on this key question. Here we use machine learning-based reactive molecular dynamics simulations to study the dissociation of paradigmatic nitric and formic acids at the air-water interface. We show that the local acidity profile across the interface is determined by changes in acid and conjugate base solvation and that the acidity decreases abruptly over a transition region of a few molecular layers. At the interface, both acids are weaker than in the bulk due to desolvation. In contrast, acidities below the interface reach a plateau and are all the stronger compared to those in the bulk as the surface to volume ratio of the aqueous phase is large, due to the growing impact of the stabilization of the released proton at the surface of the water. These results imply that the measured degree of dissociation sensitively depends on the experimental probing length and system size and suggest a molecular explanation for the contrasting experimental results. The aerosol size dependence of acidity has important consequences for atmospheric chemistry.
Molecular simulations based on reactive force-fields (ReaxFF) have been applied as a powerful tool for exploring the dynamics evolution of complex carbonaceous materials. A comprehensive review of the thermal degradation reactions of renewable and nonrenewable carbon precursors at different conditions is presented, aiming at gaining molecular insights on mitigating heavy carbonaceous deposition in undesirable scenarios using molecular simulation tools, while providing some perspectives and future directions on the subject. The review is divided in three main interconnected areas: (i) overview on the implementation of ReaxFF simulations, structural extraction techniques and microstructural characteristic properties of carbon-based materials, followed by (ii) proposed biomass, bio-oil, and bio-fuel reaction mechanisms from which, the tendency of coke and char formation is tackled. Finally, (iii) understanding nonrenewable coal, soot, coal char, and petroleum derivatives (petcoke) carbonaceous materials reactivity under high-temperature thermochemical reactions. A critical discussion is presented on the effects of temperature and functional groups on the structural evolution of large-scale atomistic structures, initial ring cleavage reactions, along with the generated products yields and characteristics. Suggested improvements in the ReaxFF implementation methodology and parametrization approach are made, followed by future directions on incorporating catalytic surfaces for tackling bio-oil upgrading in regards to coke formation and deposition.
Knowledge of the detailed mechanism behind the atomic layer deposition (ALD) can greatly facilitate the optimization of the manufacturing process. Computational modeling can potentially foster the understanding; however, the presently available capabilities of the accurate ab initio computational techniques preclude their application to modeling surface processes occurring on a long time scale, such as ALD. Although the situation can be greatly improved using machine learning (ML), this technique requires an enormous amount of data for training datasets. Here, we propose an iterative protocol for optimizing ML training datasets and apply ML-assisted ab initio calculations to model surface reactions occurring during the Al(Me)3/H2O ALD process on the OH-terminated Si (111) surface. The protocol uses a recently developed low-dimensional projection technique (TDUS), greatly reducing the amount of information required to achieve high accuracy (ca. 1 kcal/mol or less) of the developed ML models. The resulting free energy landscapes reveal fine details of various aspects of the target ALD process, such as the surface proton transfer, zwitterionic surface configurations, elimination-addition/addition-elimination, and SN2 reactions as well as the role of the surface entropic and temperature effects. Simulations of adsorption dynamics predict that the maximum physisorption rate of ca. 70% is achieved at the incidence velocity urms of the reactants in the range of 15-20 Å/ps. Hence, the proposed protocol furnishes a very effective tool to study complex chemical reaction dynamics at a much reduced computational cost.
Machine learning models for the potential energy of multi-atomic systems, such as the deep potential (DP) model, make molecular simulations with the accuracy of quantum mechanical density functional theory possible at a cost only moderately higher than that of empirical force fields. However, the majority of these models lack explicit long-range interactions and fail to describe properties that derive from the Coulombic tail of the forces. To overcome this limitation, we extend the DP model by approximating the long-range electrostatic interaction between ions (nuclei + core electrons) and valence electrons with that of distributions of spherical Gaussian charges located at ionic and electronic sites. The latter are rigorously defined in terms of the centers of the maximally localized Wannier distributions, whose dependence on the local atomic environment is modeled accurately by a deep neural network. In the DP long-range (DPLR) model, the electrostatic energy of the Gaussian charge system is added to short-range interactions that are represented as in the standard DP model. The resulting potential energy surface is smooth and possesses analytical forces and virial. Missing effects in the standard DP scheme are recovered, improving on accuracy and predictive power. By including long-range electrostatics, DPLR correctly extrapolates to large systems the potential energy surface learned from quantum mechanical calculations on smaller systems. We illustrate the approach with three examples: the potential energy profile of the water dimer, the free energy of interaction of a water molecule with a liquid water slab, and the phonon dispersion curves of the NaCl crystal.
Maximum diversification of data is a central theme in building generalized and accurate machine learning (ML) models. In chemistry, ML has been used to develop models for predicting molecular properties, for example quantum mechanics (QM) calculated potential energy surfaces and atomic charge models. The ANI-1x and ANI-1ccx ML-based general-purpose potentials for organic molecules were developed through active learning; an automated data diversification process. Here, we describe the ANI-1x and ANI-1ccx data sets. To demonstrate data diversity, we visualize it with a dimensionality reduction scheme, and contrast against existing data sets. The ANI-1x data set contains multiple QM properties from 5 M density functional theory calculations, while the ANI-1ccx data set contains 500 k data points obtained with an accurate CCSD(T)/CBS extrapolation. Approximately 14 million CPU core-hours were expended to generate this data. Multiple QM calculated properties for the chemical elements C, H, N, and O are provided: energies, atomic forces, multipole moments, atomic charges, etc. We provide this data to the community to aid research and development of ML models for chemistry.
Excited-state dynamics simulations are a powerful tool to investigate photo-induced reactions of molecules and materials and provide complementary information to experiments. Since the applicability of these simulation techniques is limited by the costs of the underlying electronic structure calculations, we develop and assess different machine learning models for this task. The machine learning models are trained on ab initio calculations for excited electronic states, using the methylenimmonium cation (CH2NH2+) as a model system. Two distinct strategies for modeling excited state properties are tested in this work. The first strategy is to treat each state separately in a kernel ridge regression model and all states together in a multiclass neural network. The second strategy is to instead encode the state as input into the model, which is tested with both models. Numerical evidence suggests that using the state as input yields the best performance. An important goal for excited-state machine learning models is their use in dynamics simulations, which requires not only state-specific information but also couplings, i.e. properties involving pairs of states. Accordingly, we investigate how well machine learning models can predict the couplings. Furthermore, we explore how combining all properties in a single neural network affects the accuracy. Finally, machine learning predicted energies, forces, and couplings are used to carry out excited-state dynamics simulations. Results demonstrate the scopes and possibilities of machine learning to model excited-state properties.
Glucose pyrolysis, a model system in biomass utilization, is renowned for its great complexity, deep in reaction network hierarchy and rich in reaction patterns. The selectivity in glucose pyrolysis, e.g., the high yield of 5-hydroxymethylfurfural (HMF), a value-added platform product, remains an intriguing puzzle even after 60 years of experimental study. Here we resolve the whole reaction network of glucose pyrolysis using a global-to-global technique for reaction pathway sampling. This is achieved by establishing the first organic chemistry reaction database via stochastic surface walking (SSW) global optimization, building the global neural network (G-NN) potential via machine learning and extensively exploring the reaction network of glucose pyrolysis. In total, 6407 elementary reactions, screened out from more than 150 000 reaction pairs in glucose pyrolysis, are collected in our reaction database. The established reaction network from SSW-NN, further validated by first-principles calculations, reveals that for glucose to HMF, the lowest energy reaction pathway involves fructose and 3-deoxyglucos-2-ene (3-DGE) as key intermediates and a site-selective reaction type, retro-Michael-addition, for three consecutive dehydration steps. The overall barrier is determined to be 1.91 eV, being at least 0.19 eV lower than all previously proposed mechanisms, which assume direct β-H elimination dehydration. The lowest pathways to the other two major products, furfural (FF) and hydroxyacetaldehyde (HAA), are also discovered with a similar barrier of 1.95 eV, and exhibit a competing nature by sharing the same key intermediate, 3-ketohexose. Since chemical reactions occurring in fast glucose pyrolysis are generally present in biomass chemistry, containing essentially all reaction patterns of C-H-O elements, the methodology designed and the results presented would help to advance reaction design and mechanistic modeling in renewable fuels from biomass.
A general neural network (NN) fitting procedure based on nonadiabatic couplings is proposed to generate coupled two-state diabatic potential energy surfaces (PESs) with conical intersections. In principle, the elements of the diabatic potential energy matrix (DPEM) can be obtained directly from a combination of the NN outputs. Instead, to achieve higher accuracy, the adiabatic-to-diabatic transformation (ADT) angle (mixing angle) for each geometry is first solved from the NN outputs, followed by individual NN fittings of the three terms of the DPEM, which are calculated from the ab initio adiabatic energies and the solved mixing angles. The procedure is applied to construct a new set of two-state diabatic potential energy surfaces of ClH2. The ab initio data, including adiabatic energies and derivative couplings, are well reproduced. Furthermore, the current diabatization procedure describes well the vicinity of conical intersections in high potential energy regions, which are located at the T-shaped (C2v) structure of Cl-H2. The quantum dynamical results on the diabatic PESs show large differences from the adiabatic results in high collision energy regions, suggesting the significance of nonadiabatic processes in conical intersection regions at high energies.
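As a rough illustration of the two-state relation this abstract relies on, the sketch below builds the three independent DPEM terms from two adiabatic energies and a mixing angle. The rotation sign convention and the function name `diabatic_matrix` are assumptions made for the example, not details taken from the paper, and no neural network fitting is shown.

```python
import numpy as np

def diabatic_matrix(e1, e2, theta):
    """Two-state diabatic potential energy matrix from adiabatic energies.

    Assumed convention: V = U^T diag(e1, e2) U with
    U = [[cos(theta), sin(theta)], [-sin(theta), cos(theta)]].
    """
    c, s = np.cos(theta), np.sin(theta)
    v11 = e1 * c**2 + e2 * s**2      # first diagonal term
    v22 = e1 * s**2 + e2 * c**2      # second diagonal term
    v12 = (e1 - e2) * s * c          # off-diagonal coupling term
    return np.array([[v11, v12], [v12, v22]])

# Diagonalizing the diabatic matrix should give back the adiabatic energies.
V = diabatic_matrix(-1.0, 0.5, 0.3)
adiabatic = np.linalg.eigvalsh(V)
```

Because the transformation is orthogonal, the eigenvalues of the resulting matrix are the original adiabatic energies for any mixing angle, which is a convenient consistency check on fitted DPEM terms.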
Extensions and improvements of empirical force fields are discussed in view of applications to computational vibrational spectroscopy and reactive molecular dynamics simulations. Particular focus is on quantitative studies, which make contact with experiments and provide complementary information for a molecular-level understanding of processes in the gas phase and in solution. Methods range from including multipolar charge distributions to reproducing kernel Hilbert space approaches and machine learned energy functions based on neural networks.
A thorough understanding of the kinetics and dynamics of combusting mixtures is of considerable interest, especially in regimes beyond the reach of current experimental validation. The ReaxFF reactive force field method has provided a way to simulate large-scale systems of hydrogen combustion via a parameterized potential that can simulate bond breaking. This modeling approach has been applied to hydrogen combustion, as well as myriad other reactive chemical systems. In this work, we benchmark the performance of several common parameterizations of this potential against higher-level quantum mechanical (QM) approaches. We demonstrate instances where these parameterizations of the ReaxFF potential fail both quantitatively and qualitatively to describe reactive events relevant for hydrogen combustion systems.
The recombination dynamics of ³P oxygen atoms on cold amorphous solid water to form triplet and singlet molecular oxygen (O2) is investigated under conditions representative of cold clouds. Reactive molecular dynamics simulations including Landau-Zener-based hopping to account for nonadiabatic transitions find that both ground-state (X ³Σg⁻) O2 and molecular oxygen in the two lowest singlet states (a ¹Δg and b ¹Σg⁺) can be formed and the molecular species stabilize through vibrational relaxation. The relative populations of the species are approximately 1:1:1. These results also agree qualitatively with a kinetic model based on simplified wavepacket simulations. The presence and stabilization of higher electronic states of O2 are expected to modify the chemical evolution of cold interstellar (T ∼ 10-50 K) and warmer noctilucent (T ∼ 100 K) clouds.
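For readers unfamiliar with Landau-Zener hopping, the following minimal sketch evaluates the textbook Landau-Zener probability of a diabatic passage through a crossing, P = exp(-2π H12² / (ħ |v ΔF|)). The function name, the unit choices, and the handling of the zero-velocity limit are assumptions for illustration; the abstract does not say how the authors implemented their hopping scheme.

```python
import math

HBAR_EV_FS = 0.6582119569  # reduced Planck constant in eV*fs

def landau_zener_probability(coupling_ev, velocity_ang_per_fs, slope_diff_ev_per_ang):
    """Textbook Landau-Zener probability of staying on the same diabatic state.

    coupling_ev: diabatic coupling H12 at the crossing (eV)
    velocity_ang_per_fs: nuclear velocity along the crossing coordinate (Å/fs)
    slope_diff_ev_per_ang: difference of the diabatic slopes, F1 - F2 (eV/Å)
    """
    if coupling_ev == 0.0:
        return 1.0  # uncoupled states: the crossing is always passed diabatically
    denom = HBAR_EV_FS * abs(velocity_ang_per_fs * slope_diff_ev_per_ang)
    if denom == 0.0:
        return 0.0  # infinitely slow passage: fully adiabatic limit (assumed handling)
    return math.exp(-2.0 * math.pi * coupling_ev**2 / denom)

p_weak = landau_zener_probability(0.01, 0.01, 1.0)
p_strong = landau_zener_probability(0.05, 0.01, 1.0)
```

In a trajectory simulation, a hop would typically be accepted when a uniform random number falls below this probability; a stronger coupling or a slower passage lowers it.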
In recent years, promising deep learning based interatomic potential energy surface (PES) models have been proposed that can potentially allow us to perform molecular dynamics simulations for large scale systems with quantum accuracy. However, making these models truly reliable and practically useful is still a very non-trivial task. A key component in this task is the generation of datasets used in model training. In this paper, we introduce the Deep Potential GENerator (DP-GEN), an open-source software platform that implements the recently proposed "on-the-fly" learning procedure (Zhang et al. 2019) and is capable of generating uniformly accurate deep learning based PES models in a way that minimizes human intervention and the computational cost for data generation and model training. DP-GEN automatically and iteratively performs three steps: exploration, labeling, and training. It supports various popular packages for these three steps: LAMMPS for exploration, Quantum Espresso, VASP, CP2K, etc. for labeling, and DeePMD-kit for training. It also allows automatic job submission and result collection on different types of machines, such as high performance clusters and cloud machines, and is adaptive to different job management tools, including Slurm, PBS, and LSF. As a concrete example, we illustrate the details of the process for generating a general-purpose PES model for Cu using DP-GEN.
Program summary
Program Title: DP-GEN
Program Files doi:
Licensing provisions: LGPL
Programming language: Python
Nature of problem: Generating reliable deep learning based potential energy models with minimal human intervention and computational cost.
Solution method: The concurrent learning scheme is implemented. Supports for sampling configuration space with LAMMPS, generating ab initio data with Quantum Espresso, VASP, CP2K and training potential models with DeePMD-kit are provided. Supports for different machines including workstations, high performance clusters and cloud machines are provided. Supports for job management tools including Slurm, PBS, LSF are provided.
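The exploration, labeling, and training cycle described in this abstract can be sketched with a toy one-dimensional problem. This is not DP-GEN code and uses none of its API: the `reference` function stands in for an ab initio calculation, small random-feature regressors stand in for deep potential models, and selecting candidates by the spread of an ensemble is an assumption chosen to keep the example short.

```python
import numpy as np

def reference(x):
    """Stand-in for the expensive labeling step (e.g. an ab initio calculation)."""
    return np.sin(3.0 * x) + 0.5 * x**2

def fit_model(x, y, seed, n_features=30, ridge=1e-6):
    """Train one toy model: ridge regression on random cosine features."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=3.0, size=n_features)
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    phi = np.cos(np.outer(x, w) + b)
    coef = np.linalg.solve(phi.T @ phi + ridge * np.eye(n_features), phi.T @ y)
    return w, b, coef

def predict(model, x):
    w, b, coef = model
    return np.cos(np.outer(x, w) + b) @ coef

def concurrent_learning(n_iter=5, threshold=0.05, n_models=4, batch=5):
    x_train = np.linspace(-1.0, 1.0, 6)      # small initial dataset
    y_train = reference(x_train)
    pool = np.linspace(-2.0, 2.0, 201)       # configurations visited in exploration
    n_selected = []
    for _ in range(n_iter):
        # Training: an ensemble that differs only in its random seed.
        models = [fit_model(x_train, y_train, seed) for seed in range(n_models)]
        # Exploration: flag pool points where the ensemble disagrees.
        preds = np.stack([predict(m, pool) for m in models])
        deviation = preds.std(axis=0)
        order = np.argsort(deviation)[::-1][:batch]
        picked = pool[order][deviation[order] > threshold]
        n_selected.append(len(picked))
        if len(picked) == 0:
            break                             # nothing uncertain left in the pool
        # Labeling: call the reference method only on the selected points.
        x_train = np.concatenate([x_train, picked])
        y_train = np.concatenate([y_train, reference(picked)])
    return x_train, y_train, n_selected

x_final, y_final, n_selected = concurrent_learning()
```

The point of the loop is that the expensive reference calculation is only run on configurations the current models are unsure about, which keeps the training set small.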
We introduce the FCHL19 representation for atomic environments in molecules or condensed-phase systems. Machine learning models based on FCHL19 are able to yield predictions of atomic forces and energies of query compounds with chemical accuracy on the scale of milliseconds. FCHL19 is a revision of our previous work [F. A. Faber et al., J. Chem. Phys. 148, 241717 (2018)] where the representation is discretized and the individual features are rigorously optimized using Monte Carlo optimization. Combined with a Gaussian kernel function that incorporates elemental screening, chemical accuracy is reached for energy learning on the QM7b and QM9 datasets after training for minutes and hours, respectively. The model also shows good performance for non-bonded interactions in the condensed phase for a set of water clusters, with a binding-energy mean absolute error (MAE) of less than 0.1 kcal/mol/molecule after training on 3200 samples. For force learning on the MD17 dataset, our optimized model similarly displays state-of-the-art accuracy with a regressor based on Gaussian process regression. When the revised FCHL19 representation is combined with the operator quantum machine learning regressor, forces and energies can be predicted in only a few milliseconds per atom. The model presented herein is fast and lightweight enough for use in general chemistry problems as well as molecular dynamics simulations.
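To make the role of the Gaussian kernel concrete, here is a minimal kernel ridge regression sketch on plain feature vectors. It does not implement the FCHL19 representation, elemental screening, or force learning; the feature vectors, kernel width, and regularization value are placeholder assumptions.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between rows of a (n, d) and rows of b (m, d)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def krr_fit(x, y, sigma=1.0, reg=1e-8):
    """Solve (K + reg * I) alpha = y for the regression weights alpha."""
    k = gaussian_kernel(x, x, sigma)
    return np.linalg.solve(k + reg * np.eye(len(x)), y)

def krr_predict(x_train, alpha, x_query, sigma=1.0):
    """Predict as a kernel-weighted sum over the training points."""
    return gaussian_kernel(x_query, x_train, sigma) @ alpha

# Toy data: one-dimensional "representations" with a smooth target property.
x_train = np.linspace(0.0, 5.0, 6)[:, None]
y_train = np.sin(x_train[:, 0])
alpha = krr_fit(x_train, y_train)
y_query = krr_predict(x_train, alpha, np.array([[2.5]]))
```

Training reduces to one linear solve, which is why kernel models of this kind can be fitted quickly on small datasets; the cost grows with the cube of the number of training points.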