ARTICLE
Complex reaction processes in combustion unraveled by neural network-based molecular dynamics simulation
Jinzhe Zeng¹, Liqun Cao¹, Mingyuan Xu¹, Tong Zhu¹,² & John Z. H. Zhang¹,²,³,⁴
Combustion is a complex chemical system which involves thousands of chemical reactions and generates hundreds of molecular species and radicals during the process. In this work, a neural network-based molecular dynamics (MD) simulation is carried out to simulate the benchmark combustion of methane. During MD simulation, detailed reaction processes leading to the creation of specific molecular species including various intermediate radicals and the products are intimately revealed and characterized. Overall, a total of 798 different chemical reactions were recorded and some new chemical reaction pathways were discovered. We believe that the present work heralds the dawn of a new era in which neural network-based reactive MD simulation can be practically applied to simulating important complex reaction systems at ab initio level, which provides atomic-level understanding of chemical reaction processes as well as discovery of new reaction pathways at an unprecedented level of detail beyond what laboratory experiments could accomplish.
https://doi.org/10.1038/s41467-020-19497-z OPEN
1Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, School of Chemistry and Molecular Engineering, East China
Normal University, Shanghai 200062, China. 2NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China. 3Department of
Chemistry, New York University, New York, NY 10003, USA. 4Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan, Shanxi
030006, China. email: tzhu@lps.ecnu.edu.cn;john.zhang@nyu.edu
NATURE COMMUNICATIONS | (2020) 11:5713 | https://doi.org/10.1038 /s41467-020-19497-z | www.nature.com/naturecommunications 1
Content courtesy of Springer Nature, terms of use apply. Rights reserved
Ever since learning to use fire, human beings have never stopped studying combustion. With increasingly serious concern about environmental pollution from combustion, understanding and mastering combustion mechanisms is of great importance. Gaining fundamental insights into combustion processes can help us design more efficient engines and minimize the production of pollutants. A typical combustion process may involve hundreds of chemical species and thousands of elementary chemical reactions. In particular, combustion occurs under extreme physical conditions, with high pressures and temperatures of up to several thousand degrees. Also, many elementary reactions in combustion occur on sub-picosecond time scales. These extreme conditions make it very difficult, if not impossible, to carry out real-time experimental studies of combustion. Thus, most experimental investigations of reaction mechanisms focus on individual reactions instead of the complex reaction processes occurring in combustion. In the past decades, in silico experiments such as reactive molecular dynamics (MD) simulations have shown their value in providing molecular (atomic)-level insights into combustion mechanisms. In a reactive MD simulation, the reaction conditions can be easily controlled, and some supercritical conditions that are difficult to achieve in experiments can also be handled. Compared with traditional theoretical approaches such as transition state theory and quantum collision theory, which focus on a single reaction, reactive MD simulation can construct the entire interwoven reaction network of a combustion system¹.
The heart of a reactive MD simulation is the potential energy surface (PES), which describes the inter- and intramolecular interactions. Currently, there are two main classes of methods for constructing the PES of a given molecular system: quantum mechanics (QM)-based methods and empirical force fields. Quantum mechanics is undoubtedly more rigorous and accurate, and MD simulations based on it are known as ab initio MD (AIMD) simulations²,³. Although AIMD can in principle simulate complex chemical reactions in real time, it is limited to relatively small systems and short simulation times (typically dozens of picoseconds) because of the exorbitant cost of on-the-fly ab initio calculations. With the rapid development of computer hardware and algorithms, especially the use of graphics processing units (GPUs), some AIMD methods have recently begun to handle larger chemical systems⁴. But so far, it is still impractical to use AIMD to simulate large-scale complex reaction systems such as combustion. Over the past decades, many reactive force fields (or PESs) have been developed and successfully used for various reactive molecular systems⁵⁻¹². A comprehensive discussion of these reactive force fields can be found in refs. 13,14. Among them, the empirical ReaxFF has been widely used in MD simulations of combustion systems because of its computational efficiency¹⁵, but its accuracy and reliability are of significant concern¹⁶⁻¹⁸. The key steps in developing a reactive force field are the choice of the functional form and the parameterization, both of which are complicated and depend on human intervention.
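Because everything in a reactive MD simulation flows from the PES, it may help to see how little an integrator actually needs from it. The sketch below is a minimal illustration under stated assumptions: a hypothetical one-dimensional Morse bond stands in for the PES, and velocity Verlet (a common integrator; the text does not say which one was used) advances the system using only forces. A QM, ReaxFF or NN PES would slot into the same `force` function.

```python
import math

# Hypothetical stand-in for a PES: a 1D Morse bond (illustrative parameters).
D_E = 4.5   # well depth (eV)
A = 1.9     # width parameter (1/Å)
R_E = 0.74  # equilibrium bond length (Å)


def energy(r: float) -> float:
    """Potential energy of the toy PES."""
    return D_E * (1.0 - math.exp(-A * (r - R_E))) ** 2


def force(r: float) -> float:
    """F = -dE/dr; an NN PES obtains this by differentiating its energy output."""
    x = math.exp(-A * (r - R_E))
    return -2.0 * D_E * A * (1.0 - x) * x


def velocity_verlet(r, v, mass, dt, n_steps):
    """Integrate Newton's equations, calling the PES only through force()."""
    f = force(r)
    for _ in range(n_steps):
        v += 0.5 * dt * f / mass  # first half-kick
        r += dt * v               # drift
        f = force(r)              # one PES evaluation per step
        v += 0.5 * dt * f / mass  # second half-kick
    return r, v


# Reduced units (mass = 1) keep the example dimensionally simple.
mass, r0, v0 = 1.0, 0.9, 0.0
r1, v1 = velocity_verlet(r0, v0, mass, dt=0.001, n_steps=5000)
e0 = energy(r0) + 0.5 * mass * v0**2
e1 = energy(r1) + 0.5 * mass * v1**2
print(f"|E(end) - E(start)| = {abs(e1 - e0):.1e}")
```

The small total-energy change with a short time step is the same property that is checked for the NN PES in an NVE run; the cost of a real simulation is dominated by the `force` calls, which is where an NN PES saves orders of magnitude over on-the-fly QM.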
Recently, more researchers have turned to machine-learning (ML) methods. ML methods, especially artificial neural networks (NNs), make it possible to construct PESs with the accuracy of QM methods but with an efficiency comparable to that of force fields. Neural networks constitute a very flexible and unbiased class of mathematical functions, which in principle can approximate any real-valued function to arbitrary accuracy. Since Behler and Parrinello proposed the high-dimensional neural network approach¹⁹,²⁰, several methods have been developed to implement this approach, and many different kinds of NN PESs have been proposed for water, small organic molecules, and metalloid materials²¹⁻²⁵, for example, the sGDML²⁶⁻²⁸, SchNet²⁹, PhysNet³⁰, and FCHL³¹ methods. NN potentials have also been employed to study the reaction mechanisms of chemical systems. By combining high-precision NN PESs with quantum collision theory, the groups of Zhang and Jiang have studied a series of elementary reactions in the gas phase and on surfaces³²⁻³⁵. Liu and co-workers developed the LASP program to study heterogeneous catalysis with NN PESs³⁶ and built stochastic surface walking (SSW)-NN to explore reaction pathways from glucose to 5-hydroxymethylfurfural³⁷. Brickel et al. also studied the nucleophilic substitution reaction [Cl–CH₃–Br]⁻ in water with an NN potential³⁸.
In this report, we present an in silico simulation of methane combustion based on an NN potential obtained by training a high-dimensional NN model on ab initio energies. To achieve high efficiency and accuracy, the DeePMD model³⁹⁻⁴¹ was used. This NN PES can accurately predict the energies and atomic forces of reactants, products and reaction intermediates. Based on this model, a 1-ns reactive MD simulation was performed for a combustion system initially containing 100 methane and 200 oxygen molecules, with sub-femtosecond time resolution (Fig. 1). A complete reaction network of methane combustion can be constructed from the MD trajectory. As described below, the simulation not only reproduced the main reaction pathways consistent with experiment but also provided much more detailed insight into the combustion processes.
Results
Accuracy of the NN PES. The performance of an NN potential depends highly on the quality of the reference dataset. Although several databases, such as QM7⁴², QM9⁴³, ANI-1⁴⁴, and ANI-1x⁴⁵, are available, they mainly include organic molecules and are therefore not suitable for this work, because the combustion of methane generates many molecular fragments, a large fraction of which are free radicals⁴⁶. We therefore followed a workflow (details are given in the Methods section) to construct a reference dataset for combustion, and the DeepPot-SE model⁴⁷ was then used to train the NN PES on this reference. The predictive power of the NN model is shown in Supplementary Table 1 and Supplementary Fig. 1. The DFT energies are accurately reproduced by the NN model: the mean absolute errors (MAEs) are only 0.04 and 0.14 eV/atom for the training set and the test set, respectively. The atomic forces predicted by the NN model are also highly consistent with the DFT results (Supplementary Fig. 1), with a correlation coefficient of 0.999 and an MAE of 0.12 eV/Å. Considering that there are a large number of atomic and molecular collisions during combustion, and that some atomic forces can be as high as dozens of eV/Å, the accuracy of the NN model is encouraging. To verify energy conservation with the NN PES, we performed a reactive MD simulation in the NVE ensemble. The system is a periodic box containing 100 CH₄ molecules and 200 O₂ molecules (900 atoms in total) with a density of 0.25 g/cm³. As shown in Supplementary Fig. 2, the total energy is conserved in the MD simulation.
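The accuracy and energy-conservation checks above rest on a few standard metrics. The sketch below shows one way to compute them, assuming energies as per-structure arrays, forces as arrays of shape (frames, atoms, 3), and NVE total energies sampled along the trajectory; the function names are hypothetical and the demonstration uses synthetic numbers, not the paper's data.

```python
import numpy as np


def energy_mae_per_atom(e_pred, e_ref, n_atoms):
    """Mean absolute error of total energies, normalised per atom (eV/atom)."""
    e_pred, e_ref, n_atoms = (np.asarray(a, dtype=float) for a in (e_pred, e_ref, n_atoms))
    return float(np.mean(np.abs(e_pred - e_ref) / n_atoms))


def force_metrics(f_pred, f_ref):
    """Component-wise force MAE (eV/Å) and Pearson correlation coefficient."""
    a = np.asarray(f_pred, dtype=float).ravel()
    b = np.asarray(f_ref, dtype=float).ravel()
    return float(np.mean(np.abs(a - b))), float(np.corrcoef(a, b)[0, 1])


def energy_drift(times_ps, e_total):
    """Slope of a least-squares line through the NVE total energy (eV/ps)."""
    slope, _intercept = np.polyfit(np.asarray(times_ps, float), np.asarray(e_total, float), 1)
    return float(slope)


# Synthetic demonstration: reference forces plus Gaussian "model error" of 0.15 eV/Å.
rng = np.random.default_rng(0)
f_ref = rng.normal(0.0, 5.0, size=(10, 900, 3))
f_pred = f_ref + rng.normal(0.0, 0.15, size=f_ref.shape)
f_mae, f_r = force_metrics(f_pred, f_ref)
print(f"force MAE = {f_mae:.3f} eV/Å, r = {f_r:.4f}")
```

For Gaussian errors with standard deviation σ, the MAE is σ·√(2/π) ≈ 0.8σ, so the synthetic 0.15-eV/Å noise gives an MAE near 0.12 eV/Å. For the energy-conservation test, a drift slope close to zero is what a well-behaved NVE run should show.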
The initial stage of combustion. A 1-ns reactive MD simulation of methane combustion was performed with the NN PES in the NVT ensemble. The system is again a periodic box containing 100 CH₄ molecules and 200 O₂ molecules (900 atoms in total) with a density of 0.25 g/cm³. The MD simulations were run with a time step of 0.1 fs, and the temperature was kept at 3000 K using the Berendsen thermostat. We chose a relatively high density (and thus high pressure) and high temperature to
enhance the collision probability and sampling efficiency, which is a widely used strategy in reactive MD simulations because the time scale of the simulation is much shorter than that of experiments. In fact, experiments usually do not use pure fuel for combustion, but rather mix the fuel into a relatively inert gas for safety. In future work, we will try to combine the NN potential with enhanced sampling algorithms to make the simulated conditions more realistic.
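The simulation setup can be sanity-checked with two short calculations: the edge of the cubic box implied by 100 CH₄ + 200 O₂ at 0.25 g/cm³, and the velocity-scaling factor that a Berendsen thermostat applies each step. This is a minimal sketch; the thermostat relaxation time `tau_fs` is not given in the text, and the value used below is only an assumption.

```python
import math

AVOGADRO = 6.02214076e23      # 1/mol
M_CH4, M_O2 = 16.043, 31.998  # molar masses (g/mol)


def cubic_box_edge(n_ch4, n_o2, density_g_cm3):
    """Edge length (Å) of a cubic box holding the mixture at the given density."""
    mass_g = (n_ch4 * M_CH4 + n_o2 * M_O2) / AVOGADRO
    volume_A3 = mass_g / density_g_cm3 * 1e24  # 1 cm^3 = 1e24 Å^3
    return volume_A3 ** (1.0 / 3.0)


def berendsen_lambda(t_current, t_target, dt_fs, tau_fs):
    """Berendsen factor: velocities are multiplied by this once per MD step."""
    return math.sqrt(1.0 + (dt_fs / tau_fs) * (t_target / t_current - 1.0))


edge = cubic_box_edge(100, 200, 0.25)
# tau_fs = 100 fs is an assumed coupling time, not a value from the text.
lam = berendsen_lambda(t_current=3100.0, t_target=3000.0, dt_fs=0.1, tau_fs=100.0)
print(f"box edge = {edge:.1f} Å, velocity scale factor = {lam:.6f}")
```

With these inputs the box edge comes out at roughly 37.6 Å. A system 100 K above the target is scaled by a factor just below 1 at each 0.1-fs step, which is why the Berendsen scheme relaxes the temperature smoothly over many steps rather than resetting it instantly.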
Figure 1b and Supplementary Fig. 3 show the time-dependent evolution of the main molecular species during the MD simulation. After 1 ns, about 90 CH₄ and 150 O₂ molecules are consumed, and about 160 H₂O, 30 CO, and 50 CO₂ molecules are produced. The potential energy of the system during the simulation is shown in Supplementary Fig. 4. Although the system has not reached equilibrium, the important ignition process is already complete, and this process contains much richer reaction information. To describe the complicated reaction network in more detail, we divided the combustion process into three stages: the initial stage of combustion, the production of the intermediate species formaldehyde and formyl radical, and the production of CO and CO₂.
The reaction network in the initial stage of combustion is shown in Fig. 2a. The combustion of methane started with the abstraction of a hydrogen atom by O₂ to generate two radicals, ·CH₃ and HOO· (R3). As seen in Fig. 2b, this process started at about 32 ps and took about 0.2 ps to finish. During the simulation, other radicals such as ·OH, ·H (R2), and HOO· also abstracted hydrogen atoms from CH₄ to generate ·CH₃ radicals. Among them, ·OH is the main hydrogen abstractor, generating water molecules in the process (R1). Unimolecular dissociation of methane into ·H and ·CH₃ was also observed.
Many ·CH₃ radicals combined with ·OH radicals to form methanol (R6). According to Fig. 2c, this process was also very fast. Some ·CH₃ radicals reacted with O₂ and HOO· to form methyldioxidanyl (CH₃OO·, R4) and methyl hydroperoxide (CH₃OOH, R5), respectively. Radicals such as ·OH can also abstract H atoms from ·CH₃ to produce :CH₂ (R7). Methanol can further react with ·H and ·OH to generate methoxy radicals (CH₃O·) together with H₂ and H₂O, respectively (R10, R11). It can also react with ·H to generate ·CH₂OH and H₂ (R12). CH₃O· can also be produced by the reaction of CH₃OO· or CH₃OOH with ·H (R8 and R9).
Production of formaldehyde and formyl radicals. Most methoxy radicals generated in the previous stage were converted to formaldehyde, mainly through two reaction pathways (Fig. 3a). In the first, a methoxy radical reacts with ·OH to form formaldehyde and H₂O (R16); as shown in Fig. 3b, this process took about 0.3 ps. In the second, a methoxy radical reacts with ·H to generate formaldehyde and H₂ (R17). The ·CH₂OH radicals can also convert to formaldehyde by losing the hydrogen atom of the hydroxyl group (R14 and R15); if a ·CH₂OH radical instead loses an H atom from the methylene group, it generates the :CHOH radical (R13). In addition, :CH₂ radicals can react with ·OH to form formaldehyde or the methylidyne radical (R18 and R19).
The formaldehyde was further converted into formyl (·CHO) radicals, mainly through hydrogen abstraction by ·O and ·OH. Figure 3c shows the trajectory of the reaction CH₂O + ·OH → ·CHO + H₂O: an ·OH radical approaches the rotating formaldehyde molecule and snatches an H atom to form a water molecule; the whole process takes about 0.4 ps. In addition, other species such as ·H, O₂, HOO·, and ·CH₃ also abstracted hydrogen atoms from formaldehyde to form formyl radicals. R20 and R23 are two reactions that form formyl radicals without the participation of formaldehyde.
Production of CO and CO₂. Formyl radicals can convert to CO by losing hydrogen in two ways (Fig. 4a). First, a formyl radical can lose an H atom directly (R25). Figure 4b shows a real-time trajectory of this process: a formyl radical lost its H atom at about 405.79 ps, but this reaction was quickly reversed and the formyl radical re-formed; after another 0.4 ps, the reaction took place again to form CO. Second, ·OH can abstract the H atom from the formyl radical to generate H₂O and CO (R26).
Fig. 1 Real-time dynamics of methane combustion. a Snapshots of part of the combustion system extracted from the reactive MD simulation of methane combustion (time interval 0.2 ns, from 0.0 to 1.0 ns). The main molecular species CH₄, O₂, H₂O and CO₂ are colored cyan, red, blue and black, respectively; other molecular species are colored white. The number of reaction products increases continuously while the reactants are consumed. b Time dependence of the numbers of the main molecular species (O₂, CH₄, H₂O, CO and CO₂) in the real-time MD simulation (number of molecules versus time in ns). The curves are smoothed for clarity.

Fig. 2 The initial stage of combustion. a Main reaction pathways (R1–R12) in the initial stage of the combustion. b A real-time trajectory showing hydrogen abstraction from methane by O₂. Atoms in cyan, red and gray are carbon, oxygen and hydrogen, respectively. c A real-time trajectory showing the reaction process leading to the creation of methanol. Atom colors are the same as in b.

Fig. 3 Production of formaldehyde and formyl radicals. a The main reaction pathways (R13–R23) for the formation of formaldehyde and formyl radicals. b The real-time trajectory of the reaction CH₃O· + ·OH → CH₂O + H₂O. Atoms in cyan, red and gray are carbon, oxygen and hydrogen, respectively. c The real-time trajectory of the reaction CH₂O + ·OH → ·CHO + H₂O. Atom colors are the same as in b.

The formyl radical can combine with the ·OH radical to form formic acid (R24), which can further lose an H atom to form ·COOH (R27) or HCOO· (R30). These two species can convert to CO₂ through reaction with ·OH or ·H (R29 and R31). The ·COOH radical can also react with ·H to generate CO and H₂O (R28). Figure 4c shows the trajectory of the reaction CO + ·OH → CO₂ + ·H (R32). At 815.32 ps, an ·OH radical started to approach a CO molecule, and at 815.38 ps an intermediate ·COOH was formed. The ·COOH appears to be relatively inactive: it existed stably for about 0.1 ps and finally lost an H atom to become CO₂.
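The pathways described in this section form a directed reaction network, and tracing a route from CH₄ to CO₂ is then a graph search. The sketch below encodes the carbon-bearing reactions named in the text and figures (the formaldehyde + ·OH step of Fig. 3c is given a hypothetical label, since its R-number is not stated in the text) and runs a breadth-first search. It is an illustration of the network idea, not the authors' analysis code.

```python
from collections import deque

# Carbon-relevant reactions named in the text and Figs. 2-4 ('.' marks a radical site).
# "Fig3c" is a hypothetical label: the text does not give this step an R-number.
REACTIONS = {
    "R1": (("CH4", ".OH"), (".CH3", "H2O")),
    "R3": (("CH4", "O2"), (".CH3", "HOO.")),
    "R4": ((".CH3", "O2"), ("CH3OO.",)),
    "R5": ((".CH3", "HOO."), ("CH3OOH",)),
    "R6": ((".CH3", ".OH"), ("CH3OH",)),
    "R7": ((".CH3", ".OH"), (":CH2", "H2O")),
    "R8": (("CH3OO.", ".H"), ("CH3O.", ".OH")),
    "R9": (("CH3OOH", ".H"), ("CH3O.", "H2O")),
    "R11": (("CH3OH", ".OH"), ("CH3O.", "H2O")),
    "R12": (("CH3OH", ".H"), (".CH2OH", "H2")),
    "R16": (("CH3O.", ".OH"), ("CH2O", "H2O")),
    "R17": (("CH3O.", ".H"), ("CH2O", "H2")),
    "Fig3c": (("CH2O", ".OH"), (".CHO", "H2O")),
    "R24": ((".CHO", ".OH"), ("HCOOH",)),
    "R25": ((".CHO",), ("CO", ".H")),
    "R26": ((".CHO", ".OH"), ("CO", "H2O")),
    "R27": (("HCOOH", ".OH"), (".COOH", "H2O")),
    "R29": ((".COOH", ".OH"), ("CO2", "H2O")),
    "R30": (("HCOOH", ".OH"), ("HCOO.", "H2O")),
    "R31": (("HCOO.", ".H"), ("CO2", "H2")),
    "R32": (("CO", ".OH"), ("CO2", ".H")),
}


def carbon_graph(reactions):
    """Directed edges reactant -> product between carbon-containing species."""
    graph = {}
    for label, (reactants, products) in reactions.items():
        for a in reactants:
            for b in products:
                if "C" in a and "C" in b:
                    graph.setdefault(a, []).append((b, label))
    return graph


def shortest_pathway(reactions, start, goal):
    """Breadth-first search; returns [(species, reaction, next_species), ...]."""
    graph = carbon_graph(reactions)
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt, label in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = (node, label)
                queue.append(nxt)
    if goal not in parent:
        return []
    steps, node = [], goal
    while parent[node] is not None:
        prev, label = parent[node]
        steps.append((prev, label, node))
        node = prev
    return steps[::-1]


for prev, label, nxt in shortest_pathway(REACTIONS, "CH4", "CO2"):
    print(f"{prev:>7} --{label}--> {nxt}")
```

With this subset of reactions, the shortest route takes seven steps through ·CH₃, a C1 oxygenate, CH₃O·, CH₂O, ·CHO and CO; ties between equally short routes are broken by dictionary order, so here the first oxygenate is CH₃OO·. A fuller analysis would weight edges by observed event counts rather than treating every reaction equally.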
Further analysis found that all 32 reactions mentioned above have been observed experimentally, and the reaction network they form is highly consistent with the main reaction networks found experimentally⁴⁸,⁴⁹. In total, we detected 505 molecular species and 798 reactions in the trajectory. Species such as ethane, ethylene, and acetylene can also be found in the experimental database. Overall, 130 of the 798 reactions extracted from the MD trajectory are included in the widely accepted GRI-Mech experimental mechanism library⁴⁸. Some experimentally observed reactions were not observed in our simulation, most likely because the present simulation was performed at a relatively high temperature.
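The text does not spell out how reactions were extracted from the trajectory. One simple scheme, assumed here purely for illustration, compares the molecule assignments of two consecutive frames: molecules that disappear are reactants, molecules that appear are products, and changed molecules that share atoms are grouped into one event. Molecules are given as sets of atom indices (for example from a connectivity analysis like the one described in Methods); all names are hypothetical.

```python
from collections import Counter


def formula(atoms, elements):
    """Formula of a set of atom indices (alphabetical order, adequate for C/H/O)."""
    counts = Counter(elements[i] for i in atoms)
    return "".join(el + (str(n) if n > 1 else "") for el, n in sorted(counts.items()))


def reactions_between(frame_a, frame_b, elements):
    """frame_a, frame_b: sets of frozensets of atom indices, one per molecule.
    Returns (reactant formulas, product formulas) for each reaction event."""
    lost = [m for m in frame_a if m not in frame_b]
    gained = [m for m in frame_b if m not in frame_a]
    events = []
    while lost:
        reactants, products = [lost.pop()], []
        atoms = set(reactants[0])
        grew = True
        while grew:  # grow the event until no other changed molecule shares atoms
            grew = False
            for pool, side in ((gained, products), (lost, reactants)):
                for m in [m for m in pool if atoms & m]:
                    pool.remove(m)
                    side.append(m)
                    atoms |= m
                    grew = True
        events.append((sorted(formula(m, elements) for m in reactants),
                       sorted(formula(m, elements) for m in products)))
    return events


# CH4 + O2 -> CH3 + HO2 next to a spectator water molecule (atoms 7-9).
elements = ["C", "H", "H", "H", "H", "O", "O", "O", "H", "H"]
frame0 = {frozenset({0, 1, 2, 3, 4}), frozenset({5, 6}), frozenset({7, 8, 9})}
frame1 = {frozenset({0, 1, 2, 3}), frozenset({4, 5, 6}), frozenset({7, 8, 9})}
print(reactions_between(frame0, frame1, elements))
```

Counting such events over all frame pairs and de-duplicating them by reactant and product formulas gives a reaction list that can be compared against a mechanism library. A production analysis would also need to filter out short-lived bond fluctuations, which this sketch ignores.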
In fact, discovering new reactions is an important advantage of
the present approach. For methane oxidation, a system that has
been extensively studied by experiments, NN-based reactive MD
can still discover hundreds of chemical reactions that have not
been experimentally reported. This demonstrates that reactive
MD can be a powerful tool to study combustion reactions.
Interestingly, we found a cyclopropene molecule in the trajectory, whose formation in this system has not, to our knowledge, been reported. As shown in Supplementary Fig. 5, at 634.09 ps a CO molecule collided with a ·CH₃ radical and the two combined. A CH₂CO molecule was then formed through hydrogen loss. The CH₂CO was stable for about 200 ps and then combined with another ·CH₃ radical. Subsequent hydrogen loss led to the formation of a cycloprop-2-en-1-one molecule at 828.65 ps. About 60 ps later, a third ·CH₃ attacked the cycloprop-2-en-1-one molecule and kicked out the CO group to form a CH₃CCH₂ molecule at 889.50 ps. Through further internal reaction and hydrogen loss, it finally formed a cyclopropene molecule at 891.16 ps, which remained stable for the rest of the simulation. The entire process took about 260 ps. Although the appearance of cyclopropene in our simulation may be a coincidence or a consequence of the relatively high temperature, it still illustrates the ability of reactive MD simulation to discover new molecules and new reactions.
Discussion
Accurate in silico MD simulation of combustion and other complex chemical reactions is one of the ultimate goals of computational chemistry. In this work, an artificial neural network potential model trained on ab initio data describes the complex chemical reactions in methane combustion. This NN potential model is orders of magnitude faster than conventional DFT calculations. Benefiting from the high efficiency of the NN model and GPU acceleration, a nanosecond-scale MD simulation of a chemical system containing 900 atoms was completed in about 4 days on one NVIDIA Tesla P100 card. Detailed reaction mechanisms were extracted from the MD trajectory, and the detected molecular species and reaction networks are in excellent agreement with experimental observations. In addition, many new reactions were found that are not included in the experimental database. Compared with laboratory experiments, in silico simulations can be performed under more extreme conditions, and any specific reaction of interest can be easily detected and tracked. MD simulation can also achieve ultra-high time resolution: the time step used in this work is 0.1 fs. With improvements in algorithms and hardware, even finer time resolution can be achieved.
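The quoted cost can be turned into a throughput figure with simple arithmetic: 1 ns at a 0.1-fs time step is 10⁷ MD steps, and "about 4 days" is taken here as exactly 4 days for the purpose of the estimate.

```python
# Back-of-envelope throughput for the production run described above.
dt_fs = 0.1                 # MD time step (fs)
simulated_fs = 1.0e6        # 1 ns expressed in fs
n_atoms = 900
wall_s = 4 * 24 * 3600      # "about 4 days", taken as exactly 4 days

n_steps = round(simulated_fs / dt_fs)  # number of MD steps
steps_per_s = n_steps / wall_s
atom_steps_per_s = steps_per_s * n_atoms
print(f"{n_steps:,} steps, {steps_per_s:.0f} steps/s, {atom_steps_per_s:.2e} atom-steps/s")
```

That is roughly 29 steps per second for the 900-atom system on one GPU, or about 2.6 × 10⁴ atom-steps per second, each step including a full NN energy and force evaluation.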
Fig. 4 Production of CO and CO₂. a Main reaction pathways (R24–R32) for the formation of CO and CO₂. b The real-time trajectory of the reaction ·CHO → CO + ·H. Atoms in cyan, red and gray are carbon, oxygen and hydrogen, respectively. c The real-time trajectory of the reaction CO + ·OH → CO₂ + ·H. Atom colors are the same as in b.

Compared with traditional theoretical approaches that rely on prior knowledge, reactive MD simulation can explore complex reaction networks and discover new reactions and species without any prior knowledge of the reactions. Indeed, complex reactions cannot be well understood without considering the kinetics of the reaction network they belong to. Since reactive MD simulation tracks all chemical reactions in real time, one can even deduce rate constants for individual reactions from a single MD trajectory by statistical analysis. We extracted the ten most statistically significant reactions from the trajectory and calculated their rate constants using algorithms developed in previous studies⁵⁰,⁵¹. As shown in Supplementary Table 2, most of the rate constants agree well with the GRI-Mech data⁴⁸. The main sources of error are probably the uncertainties of the parameters in the Arrhenius formula and the incompleteness of sampling. Ideally, one should run many trajectories with different initial conditions to obtain truly statistically accurate results. However, although these rates may not be accurate enough to be used directly in kinetic modeling, they can still contribute to a comprehensive understanding of the combustion reaction.
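The rate constants in Supplementary Table 2 were obtained with the algorithms of refs. 50,51, which are not reproduced here. As a hedged illustration of the idea, the sketch below uses a common counting estimator for a bimolecular reaction A + B (A ≠ B): if N events are observed in a box of volume V, then k ≈ N·V / Σₜ N_A(t)·N_B(t)·Δt. It also converts the result to cm³ mol⁻¹ s⁻¹ and evaluates a modified Arrhenius expression k = A·Tᵇ·exp(−Eₐ/RT) of the kind used in GRI-Mech, so that the two can be compared at the simulation temperature. The counts below are synthetic.

```python
import math

N_AVOGADRO = 6.02214076e23   # 1/mol
R_CAL = 1.987204             # gas constant, cal mol^-1 K^-1


def bimolecular_rate_constant(n_events, n_a, n_b, dt_ps, volume_A3):
    """Counting estimator for A + B (A != B): events per unit 'pair exposure'.
    n_a, n_b: molecule counts per frame; returns k in Å^3 ps^-1."""
    exposure = sum(a * b for a, b in zip(n_a, n_b)) * dt_ps
    return n_events * volume_A3 / exposure


def to_cm3_per_mol_s(k_A3_per_ps):
    """Convert Å^3 ps^-1 (per molecule pair) to cm^3 mol^-1 s^-1."""
    return k_A3_per_ps * 1e-24 * 1e12 * N_AVOGADRO


def modified_arrhenius(a, b, ea_cal_mol, t_k):
    """k(T) = A T^b exp(-Ea / RT), with Ea in cal/mol as in GRI-Mech-style files."""
    return a * t_k**b * math.exp(-ea_cal_mol / (R_CAL * t_k))


# Synthetic example: 5 events, 10 A and 10 B molecules in every one of 1000 frames.
k = bimolecular_rate_constant(5, [10] * 1000, [10] * 1000, dt_ps=0.1, volume_A3=5.0e4)
print(f"k = {k:.1f} Å^3/ps = {to_cm3_per_mol_s(k):.3e} cm^3 mol^-1 s^-1")
```

Identical reactants (A + A) need the pair count N_A(N_A − 1)/2 in place of N_A·N_B, together with care over the rate-law convention, and events that immediately revert (as in the ·CHO example above) should be counted consistently, otherwise k is overestimated.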
A practical issue to be pointed out is that, although some algorithms were used in this study to minimize the size of the reference dataset, the reference set still contains 578,731 structures. Although DFT calculations are very efficient, it is difficult to perform high-level post-Hartree–Fock calculations on such a large reference set. To further reduce the size of the reference set while ensuring its completeness, new algorithms need to be developed to further enhance the efficiency of this approach. Recently, Zhang et al. developed the DP-GEN⁵² (Deep Potential GENerator) software platform, which can automatically construct the reference dataset and train the NN model. The concurrent learning algorithm employed by this platform keeps the redundancy of the reference set as low as possible. We are trying to integrate the algorithms developed in this work into the DP-GEN platform.
In addition, it is worth pointing out that, while combustion is usually thought to be dominated by free-radical reactions, recent studies have begun to examine the role of electronically excited species in combustion. For example, the additional introduction of plasma was found to be effective in promoting combustion in experiments⁵³. However, MD simulations involving excited states are highly nontrivial, and there are large uncertainties in ab initio quantum chemistry calculations of excited states for large systems. Based on sophisticated empirical or machine-learning PESs, several recent works have achieved excited-state MD simulations for model systems⁵⁴⁻⁶², for example, of the O + O recombination reaction forming O₂ in its ground state and in excited singlet states on amorphous solid water⁶⁰. Such a strategy will be considered in our future studies.
Although further improvement is needed, the current report heralds the dawn of a new era in which neural network-based reactive MD simulation can be practically applied to simulating complex reaction systems at the ab initio level, providing atomic-level understanding of every reaction process at an unprecedented level of detail, beyond what laboratory experiments can accomplish.
Methods
Reference dataset. In this study, a workflow was developed for building reference datasets (Fig. 5). The details of each module in the workflow are given below.
To increase the efficiency of dataset construction, reactive MD simulation with ReaxFF was used to sample an initial dataset. A model combustion system containing many CH₄ and H₂ molecules was built using the Amorphous Cell module of the Materials Studio⁶³ software package. The LAMMPS⁶⁴ program was then used to perform the MD simulation in the NVT ensemble, with the temperature set to 3000 K using the Berendsen thermostat. The ReaxFF parameters of Chenoweth et al. (CHO-2008 parameter set)⁶⁵ were employed. The Open Babel software⁶⁶ and the depth-first search algorithm⁶⁷ were used to detect species in every snapshot of the trajectory. Then, for each atom in each snapshot, we built a molecular cluster containing this atom and all species within a specified cutoff centered on it. In this work, the cutoff was set to 5 Å.
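The species-detection and cluster-extraction steps can be sketched as follows. Assumptions: an orthorhombic periodic box, and a hypothetical per-element-pair distance criterion for bonds in place of the Open Babel bond perception the authors actually used. Species are the connected components of the bond graph, found with an iterative depth-first search, and a cluster is every whole molecule with at least one atom within 5 Å (minimum-image distance) of the central atom.

```python
import numpy as np

# Hypothetical distance cutoffs (Å) for calling two atoms bonded; a stand-in
# for Open Babel's bond perception.
BOND_CUTOFF = {frozenset(pair): r for pair, r in [
    (("C", "H"), 1.3), (("C", "C"), 1.8), (("C", "O"), 1.7),
    (("H", "H"), 1.0), (("H", "O"), 1.3), (("O", "O"), 1.7),
]}


def min_image(delta, box):
    """Minimum image convention for an orthorhombic box with edge lengths `box`."""
    return delta - box * np.round(delta / box)


def bond_graph(pos, elements, box):
    """Adjacency list of bonded atoms."""
    n = len(elements)
    adj = [[] for _ in range(n)]
    for i in range(n - 1):
        dist = np.linalg.norm(min_image(pos[i + 1:] - pos[i], box), axis=1)
        for j, r in enumerate(dist, start=i + 1):
            if r < BOND_CUTOFF[frozenset((elements[i], elements[j]))]:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def species(adj):
    """Iterative depth-first search: each connected component is one molecule."""
    seen, mols = [False] * len(adj), []
    for start in range(len(adj)):
        if seen[start]:
            continue
        seen[start] = True
        stack, comp = [start], []
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in adj[i]:
                if not seen[j]:
                    seen[j] = True
                    stack.append(j)
        mols.append(sorted(comp))
    return mols


def cluster_around(center, pos, mols, box, cutoff=5.0):
    """Whole molecules with at least one atom within `cutoff` Å of atom `center`."""
    keep = []
    for mol in mols:
        d = np.linalg.norm(min_image(pos[mol] - pos[center], box), axis=1)
        if d.min() <= cutoff:
            keep.append(mol)
    return keep


# A water molecule near the origin and a methane molecule far away.
elements = ["O", "H", "H", "C", "H", "H", "H", "H"]
pos = np.array([[0, 0, 0], [0.96, 0, 0], [-0.24, 0.93, 0],
                [10, 10, 10], [10.63, 10.63, 10.63], [9.37, 9.37, 10.63],
                [9.37, 10.63, 9.37], [10.63, 9.37, 9.37]], dtype=float)
box = np.array([30.0, 30.0, 30.0])
mols = species(bond_graph(pos, elements, box))
print(mols, cluster_around(0, pos, mols, box))
```

Keeping whole molecules, rather than cutting bonds at the sphere boundary, avoids creating artificial radicals in the cluster, which matters for the later QM labelling step.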
The initial dataset contains about 22.5 million structures, which is too many to perform QM calculations on every molecular cluster. It is therefore necessary to resample it to remove redundant structures while ensuring its completeness. To this end, we first classified the initial dataset into sub-datasets based on the chemical bond information of the central atom. For example, a central H atom can be classified into two different types: a single H atom (H0) and an H atom forming a single chemical bond with another atom (H1).
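The sub-dataset key can be as simple as the central atom's element plus its number of bonds, which reproduces the H0/H1 example above. The full classification in the paper may use richer bond information, so treat this as a hypothetical minimal version; the adjacency list is assumed to come from a bond-detection step like the one described in the previous paragraph.

```python
from collections import defaultdict


def central_atom_label(i, elements, adj):
    """Hypothetical sub-dataset key: element symbol + number of bonds ('H0', 'H1', ...)."""
    return f"{elements[i]}{len(adj[i])}"


def split_by_label(centers, elements, adj):
    """Group central-atom indices (one per molecular cluster) by their label."""
    groups = defaultdict(list)
    for i in centers:
        groups[central_atom_label(i, elements, adj)].append(i)
    return dict(groups)


# A water molecule plus a free hydrogen atom; adj lists each atom's bonded neighbours.
elements = ["H", "O", "H", "H"]
adj = [[1], [0, 2], [1], []]
print(split_by_label(range(len(elements)), elements, adj))
# {'H1': [0, 2], 'O2': [1], 'H0': [3]}
```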
Further treatment is still needed for large sub-datasets. For a given large sub-dataset, we first expressed each molecular cluster it contains as a Coulomb matrix⁶⁸:
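$$
C_{ij} =
\begin{cases}
\dfrac{1}{2} Z_i^{2.4}, & i = j,\\[6pt]
\dfrac{Z_i Z_j}{\lvert \mathbf{R}_i - \mathbf{R}_j \rvert}, & i \neq j,
\end{cases}
\tag{1}
$$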
where $Z_i$ and $Z_j$ are the nuclear charges of atoms $i$ and $j$, and $\mathbf{R}_i$ and $\mathbf{R}_j$ are their Cartesian coordinates. The minimum image convention⁶⁹ was used to account for the periodic boundary condition. "Invisible atoms" were introduced to fix the dimension of the Coulomb matrix: these invisible atoms do not influence the physics of the molecule of interest and make the total number of atoms in every cluster sum to a constant. To lower the dimension of the dataset while keeping as much structural information as possible, the Coulomb matrix was further represented by its eigen-spectrum, obtained by solving the eigenvalue problem $C\mathbf{v} = \lambda\mathbf{v}$ under the constraint $\lambda_i \geq \lambda_{i+1}$. The Mini Batch KMeans clustering algorithm⁷⁰ was then used to cluster the given sub-dataset into smaller clusters according to the eigen-spectrum. We then randomly selected 10,000 structures from each cluster (if a cluster contained no more than 10,000 structures, all of them were selected).
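The redundancy-removal pipeline (Eq. (1), eigen-spectrum, clustering, per-cluster sampling) is sketched below. To keep it dependency-light, the clustering is a small NumPy implementation of mini-batch k-means with per-centre learning rates; in practice an off-the-shelf implementation such as scikit-learn's `MiniBatchKMeans` would normally be used. Zero-padding plays the role of the "invisible atoms" (Z = 0 contributes nothing to Eq. (1)). The 10,000-per-cluster cap matches the text, while all other parameters are illustrative assumptions.

```python
import numpy as np


def coulomb_matrix(Z, R, n_max, box=None):
    """Eq. (1), zero-padded to n_max x n_max ('invisible atoms' with Z = 0)."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    C = np.zeros((n_max, n_max))
    for i in range(len(Z)):
        for j in range(len(Z)):
            if i == j:
                C[i, j] = 0.5 * Z[i] ** 2.4
            else:
                d = R[i] - R[j]
                if box is not None:  # minimum image convention
                    d = d - box * np.round(d / box)
                C[i, j] = Z[i] * Z[j] / np.linalg.norm(d)
    return C


def eigen_spectrum(C):
    """Eigenvalues of the symmetric matrix C, sorted so that lambda_i >= lambda_{i+1}."""
    return np.linalg.eigvalsh(C)[::-1]


def mini_batch_kmeans(X, k, batch_size=256, n_iter=100, seed=0):
    """Mini-batch k-means: each centre moves toward its batch points with rate 1/count."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    counts = np.zeros(k)
    for _ in range(n_iter):
        batch = X[rng.choice(len(X), size=min(batch_size, len(X)), replace=False)]
        nearest = np.argmin(((batch[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for x, c in zip(batch, nearest):
            counts[c] += 1
            eta = 1.0 / counts[c]
            centers[c] = (1.0 - eta) * centers[c] + eta * x
    return np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)


def subsample(labels, per_cluster=10_000, seed=0):
    """Keep every member of small clusters and a random `per_cluster` of large ones."""
    rng = np.random.default_rng(seed)
    chosen = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if len(idx) > per_cluster:
            idx = rng.choice(idx, size=per_cluster, replace=False)
        chosen.extend(idx.tolist())
    return sorted(chosen)


# H2 padded to three "atoms", then a toy clustering of two well-separated groups.
C = coulomb_matrix([1, 1], [[0, 0, 0], [0, 0, 0.74]], n_max=3)
spec = eigen_spectrum(C)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 3)), rng.normal(50, 1, (100, 3))])
picked = subsample(mini_batch_kmeans(X, k=2), per_cluster=30)
print(spec, len(picked))
```

Sorting the eigenvalues makes the representation independent of atom ordering, and padding makes every spectrum the same length, which is what allows clusters of different sizes to be compared in one feature space.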
Fig. 5 The workflow of reference dataset construction. Steps used in this study to generate the reference dataset for training the neural network potential used in the MD simulations. Flowchart steps: molecular system; sampling with ReaxFF; redundancy removal (Coulomb matrices and Mini Batch KMeans); QM calculation; NN training; re-sampling with NN-based short MDs (active learning loop, repeated while new data are found); final dataset.

Large-amplitude collisions and reactions in combustion can produce many unpredictable species and intermediates. To ensure the completeness of the reference dataset, an active learning approach⁷¹ was used. Four different NN PES models were trained on the dataset from the previous step, and several short MD simulations were then performed with these NN models. During the simulations, the atomic forces were evaluated by all four NN PES models simultaneously. For a specific atom, if the forces predicted by the four models are consistent with each other, the molecular cluster centered on this atom is considered to be already represented in the dataset. Conversely, if the predictions of the four models are inconsistent and the error between them lies in a specific range (0.5 eV/Å < error < 1.0 eV/Å in this work), the corresponding molecular cluster is added to the dataset. The dataset is updated until the predictions of the four models are always consistent.
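The selection rule above needs a concrete measure of disagreement between the four models. The text does not define it; the sketch assumes a DP-GEN-style per-atom force deviation, the root-mean-square distance of each model's force vector from the ensemble mean, and keeps atoms whose deviation lies strictly between 0.5 and 1.0 eV/Å.

```python
import numpy as np


def force_deviation(forces):
    """Per-atom model deviation for forces of shape (n_models, n_atoms, 3):
    RMS distance of each model's force vector from the ensemble mean (eV/Å)."""
    forces = np.asarray(forces, dtype=float)
    mean = forces.mean(axis=0)
    return np.sqrt(((forces - mean) ** 2).sum(axis=-1).mean(axis=0))


def atoms_to_label(forces, lo=0.5, hi=1.0):
    """Indices of atoms whose clusters should be sent for QM labelling."""
    dev = force_deviation(forces)
    return np.flatnonzero((dev > lo) & (dev < hi))


# Four models, three atoms: agreement, moderate and extreme disagreement.
f = np.zeros((4, 3, 3))
f[:, 1, 0] = [0.7, -0.7, 0.7, -0.7]
f[:, 2, 0] = [3.0, -3.0, 3.0, -3.0]
print(force_deviation(f), atoms_to_label(f))
```

The upper bound matters as much as the lower one: a very large disagreement presumably signals an unphysical configuration (for example, atoms driven far too close together), which would pollute the dataset rather than extend it.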
QM calculation. The potential energy and atomic forces for every structure in the final dataset were calculated with the Gaussian 16 software72 at the MN15/6-31G** level. The MN15 functional was employed because it has broad accuracy for multi-reference and single-reference systems73. To account for spin polarization, the initial wave function of a given structure was constructed by combining the wave functions of the individual molecular species forming the structure, each of which was calculated with its own charge and spin.
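One way to realize such a fragment-based initial guess is Gaussian's Guess=Fragment option, in which each atom is tagged with a fragment number and the charge/multiplicity line lists the total values followed by one pair per fragment. The paper does not state which keywords were used, so the sketch below is an assumption throughout: it writes an unrestricted (UMN15) energy-and-force job and aligns all unpaired fragment spins, giving the high-spin total multiplicity. The Fragment container and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """One molecular species inside a cluster (hypothetical helper type)."""
    symbols: list[str]
    coords: list[tuple[float, float, float]]  # Cartesian coordinates in Å
    charge: int
    multiplicity: int                         # 2S + 1 of the isolated species


def fragment_guess_input(fragments: list[Fragment], title: str = "cluster") -> str:
    """Return a Gaussian 16 input whose initial guess is built from the fragments.

    Assumptions (not stated in the paper): unrestricted MN15, and all unpaired
    fragment spins aligned, so the total multiplicity is the high-spin value.
    """
    total_charge = sum(f.charge for f in fragments)
    total_mult = 1 + sum(f.multiplicity - 1 for f in fragments)
    # Charge/multiplicity line: total pair first, then one pair per fragment.
    pairs = [(total_charge, total_mult)] + [(f.charge, f.multiplicity) for f in fragments]
    lines = [
        f"# UMN15/6-31G** Force Guess=Fragment={len(fragments)}",  # energy + forces
        "",
        title,
        "",
        " ".join(f"{q} {m}" for q, m in pairs),
    ]
    for idx, frag in enumerate(fragments, start=1):
        for sym, (x, y, z) in zip(frag.symbols, frag.coords):
            lines.append(f"{sym}(Fragment={idx}) {x:.6f} {y:.6f} {z:.6f}")
    lines.append("")  # blank line terminating the molecule specification
    return "\n".join(lines) + "\n"


# Example: a methyl radical next to a hydroxyl radical (two doublets).
ch3 = Fragment(["C", "H", "H", "H"],
               [(0.0, 0.0, 0.0), (1.08, 0.0, 0.0), (-0.54, 0.935, 0.0), (-0.54, -0.935, 0.0)],
               charge=0, multiplicity=2)
oh = Fragment(["O", "H"], [(3.0, 0.0, 0.0), (3.97, 0.0, 0.0)], charge=0, multiplicity=2)
print(fragment_guess_input([ch3, oh]))
```

Under the high-spin assumption the CH3 + OH pair becomes a triplet overall; Gaussian's Guess=Fragment documentation describes how to request antiparallel fragment spins instead, which should be checked before using low-spin couplings.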
Training of the NN PES. The scheme of the NN model is shown in Fig. 6. The total energy $E$ of a given structure is decomposed into a sum of atomic energy contributions19,74, i.e., $E = \sum_i E_i$, where $i$ is the index of the atom. Each atomic energy is fully determined by the position of the $i$th atom and its near neighbors. To guarantee the translational, rotational, and permutational symmetries of the PES, the Cartesian coordinates of the atoms are mapped to specific mathematical functions called "descriptors" of the atomic chemical environment.
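The decomposition $E = \sum_i E_i$ and the role of symmetry-preserving descriptors can be illustrated with a minimal NumPy sketch. It is not DeepPot-SE: for brevity, each atom is described by radial Gaussian basis functions of its neighbor distances, summed separately per neighbor element and weighted by a cosine-switched 1/r factor similar to the one described for DeepPot-SE, switching off between the 1.0 and 6.0 Å values given in the next paragraph. The basis centers, width, network size, and random (untrained) weights are all hypothetical.

```python
import numpy as np

R_SMTH, R_CUT = 1.0, 6.0              # Å: smoothing onset and cutoff
CENTERS = np.linspace(1.0, 6.0, 8)    # hypothetical radial basis centres (Å)
WIDTH = 0.5                           # hypothetical basis width (Å)
N_SPECIES = 3                         # e.g. C, H, O encoded as 0, 1, 2


def switch(r):
    """1/r below R_SMTH, cosine-damped to zero at R_CUT, zero beyond."""
    x = np.clip((r - R_SMTH) / (R_CUT - R_SMTH), 0.0, 1.0)
    return np.where(r < R_CUT, (0.5 * np.cos(np.pi * x) + 0.5) / r, 0.0)


def descriptors(pos, species):
    """Per-atom descriptors built only from interatomic distances.

    Distances do not change under translation or rotation, and summing over
    neighbours (grouped by element) makes each row independent of atom order.
    """
    r = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)  # (n, n)
    np.fill_diagonal(r, np.inf)                                     # drop self-pairs
    basis = np.exp(-(((r[..., None] - CENTERS) / WIDTH) ** 2))      # (n, n, k)
    feats = switch(r)[..., None] * basis                            # (n, n, k)
    onehot = np.eye(N_SPECIES)[species]                             # (n, s)
    d = np.einsum("ijk,js->isk", feats, onehot)                     # (n, s, k)
    return d.reshape(len(pos), -1)


# One tiny tanh network per element; random, untrained weights for illustration.
_rng = np.random.default_rng(0)
_DIM = N_SPECIES * len(CENTERS)
PARAMS = [(0.1 * _rng.normal(size=(_DIM, 16)), 0.1 * _rng.normal(size=16),
           0.1 * _rng.normal(size=16)) for _ in range(N_SPECIES)]


def atomic_energies(pos, species):
    """E_i from the network of atom i's element, fed with atom i's descriptor."""
    d = descriptors(pos, species)
    out = np.empty(len(pos))
    for i, s in enumerate(species):
        w1, b1, w2 = PARAMS[s]
        out[i] = np.tanh(d[i] @ w1 + b1) @ w2
    return out


def total_energy(pos, species):
    """E = sum_i E_i."""
    return atomic_energies(pos, species).sum()
```

Because every input to the atomic networks is a sum over neighbor distances, translating, rotating, or relabeling the atoms leaves $E$ unchanged; a purely radial descriptor is, however, less expressive than the angular information DeepPot-SE encodes through its embedding matrix.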
The DeepPot-SE (Deep Potential-Smooth Edition) model47 was used to train the NN potential with the DeePMD-kit program74. Details of this method can be found in ref. 47. The model includes two networks: the embedding network and the fitting network. Both networks use the ResNet architecture75. The hidden-layer sizes of the embedding network were set to (25, 50, 100), and the size of the embedding matrix was set to 12. The hidden-layer sizes of the fitting network were set to (240, 240, 240). The cutoff radius was set to 6.0 Å, and the descriptors decay smoothly from 1.0 to 6.0 Å. The initial learning rate was set to 0.0005 and decays every 20,000 steps. The loss is defined by
$$L = \frac{p_e}{N}\,\Delta E^2 + \frac{p_f}{3N}\sum_i \left|\Delta \mathbf{F}_i\right|^2, \qquad (2)$$
where $\Delta E$ and $\Delta \mathbf{F}_i$ are the root mean square errors in the energy and in the force on atom $i$, respectively, and $N$ is the number of atoms. The prefactor $p_e$ is set to 0.2 eV$^{-2}$, and $p_f$ decays from 1000 Å$^2$ eV$^{-2}$ to 1 Å$^2$ eV$^{-2}$.
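Eq. (2) and its prefactor schedule can be sketched in NumPy. We assume, following the rule DeePMD-kit documents for its start_pref/limit_pref settings, that each prefactor moves linearly from its start value to its limit value in proportion to the current learning rate divided by the initial one, and that the learning rate follows a staircase exponential decay every 20,000 steps. The decay factor DECAY_RATE is hypothetical because the text does not report it, and the constant and function names are our own; in practice the loss would also be averaged over a training batch.

```python
import numpy as np

# Values quoted in the text; names loosely follow DeePMD-kit's input keys
# (start_lr, decay_steps, start_pref_e, limit_pref_e, start_pref_f, limit_pref_f).
START_LR = 5.0e-4
DECAY_STEPS = 20_000
DECAY_RATE = 0.95                         # hypothetical: not reported in the text
START_PREF_E, LIMIT_PREF_E = 0.2, 0.2     # eV^-2 (constant energy prefactor)
START_PREF_F, LIMIT_PREF_F = 1000.0, 1.0  # Å^2 eV^-2


def learning_rate(step: int) -> float:
    """Staircase exponential decay: multiply by DECAY_RATE every DECAY_STEPS steps."""
    return START_LR * DECAY_RATE ** (step // DECAY_STEPS)


def prefactor(start: float, limit: float, lr: float) -> float:
    """Move a prefactor from `start` towards `limit` as lr / START_LR goes from 1 to 0."""
    return limit + (start - limit) * (lr / START_LR)


def loss(e_pred: float, e_ref: float, f_pred: np.ndarray, f_ref: np.ndarray, step: int) -> float:
    """Eq. (2) for one structure with N atoms; forces have shape (N, 3) in eV/Å."""
    n_atoms = f_ref.shape[0]
    lr = learning_rate(step)
    p_e = prefactor(START_PREF_E, LIMIT_PREF_E, lr)
    p_f = prefactor(START_PREF_F, LIMIT_PREF_F, lr)
    d_e = e_pred - e_ref
    d_f = f_pred - f_ref
    return p_e / n_atoms * d_e**2 + p_f / (3 * n_atoms) * float(np.sum(d_f**2))
```

Starting with a large force prefactor lets the many force components (3N per structure) dominate early training, while the energy term gains relative weight as $p_f$ shrinks toward its limit.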
Data availability
The datasets (structures, potential energies and atomic forces of molecular species) generated during the current study are available at https://github.com/tongzhugroup/NNREAX and https://doi.org/10.6084/m9.figshare.12973055. Source data are provided with this paper.
Code availability
The codes used to generate the datasets in the current study are available at https://github.com/tongzhugroup/mddatasetbuilder and https://doi.org/10.5281/zenodo.4035925.
Received: 10 March 2020; Accepted: 6 October 2020;
References
1. Martinez, T. J. Ab initio reactive computer aided molecular design. Acc. Chem. Res. 50, 652–656 (2017).
2. Car, R. & Parrinello, M. Unified approach for molecular-dynamics and density-functional theory. Phys. Rev. Lett. 55, 2471–2474 (1985).
3. Tuckerman, M. E. Ab initio molecular dynamics: basic concepts, current trends and novel applications. J. Phys. Condens. Matter 14, R1297–R1355 (2002).
4. Wang, L.-P. et al. Discovering chemistry with an ab initio nanoreactor. Nat. Chem. 6, 1044 (2014).
5. Van Duin, A. C., Dasgupta, S., Lorant, F. & Goddard, W. A. ReaxFF: a reactive force field for hydrocarbons. J. Phys. Chem. A 105, 9396–9409 (2001).
6. Brenner, D. W. et al. A second-generation reactive empirical bond order (REBO) potential energy expression for hydrocarbons. J. Phys. Condens. Matter 14, 783 (2002).
7. Nouranian, S., Tschopp, M. A., Gwaltney, S. R., Baskes, M. I. & Horstemeyer, M. F. An interatomic potential for saturated hydrocarbons based on the modified embedded-atom method. Phys. Chem. Chem. Phys. 16, 6233–6249 (2014).
8. Qu, C., Yu, Q. & Bowman, J. M. Permutationally invariant potential energy surfaces. Annu. Rev. Phys. Chem. 69, 151–175 (2018).
9. Li, J. & Guo, H. Permutationally invariant fitting of intermolecular potential energy surfaces: a case study of the Ne-C2H2 system. J. Chem. Phys. 143, 214304 (2015).
10. Braams, B. J. & Bowman, J. M. Permutationally invariant potential energy surfaces in high dimensionality. Int. Rev. Phys. Chem. 28, 577–606 (2009).
11. Nagy, T., Yosa Reyes, J. & Meuwly, M. Multisurface adiabatic reactive molecular dynamics. J. Chem. Theory Comput. 10, 1366–1375 (2014).
12. Warshel, A. & Florián, J. in Encyclopedia of Computational Chemistry (John Wiley and Sons, 2002).
13. Meuwly, M. Reactive molecular dynamics: from small molecules to proteins. WIREs Comput. Mol. Sci. 9, e1386 (2019).
14. Koner, D., Salehi, S. M., Mondal, P. & Meuwly, M. Non-conventional force fields for applications in spectroscopy and chemical reaction dynamics. J. Chem. Phys. 153, 010901 (2020).
15. Zheng, M. et al. Pyrolysis of Liulin coal simulated by GPU-based ReaxFF MD with cheminformatics analysis. Energy Fuels 28, 522–534 (2014).
16. Wang, E., Ding, J., Qu, Z. & Han, K. Development of a reactive force field for hydrocarbons and application to iso-octane thermal decomposition. Energy Fuels 32, 901–907 (2017).
17. Cheng, T., Jaramillo-Botero, A., Goddard, W. A. & Sun, H. Adaptive accelerated ReaxFF reactive dynamics with validation from simulating hydrogen combustion. J. Am. Chem. Soc. 136, 9434–9442 (2014).
18. Bertels, L. W., Newcomb, L. B., Alaghemandi, M., Green, J. R. & Head-Gordon, M. Benchmarking the performance of the ReaxFF reactive force field on hydrogen combustion systems. J. Phys. Chem. A 124, 5631–5645 (2020).
19. Behler, J. & Parrinello, M. Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 98, 146401 (2007).
20. Behler, J. First principles neural network potentials for reactive simulations of large molecular and condensed systems. Angew. Chem. Int. Ed. 56, 12828–12840 (2017).
21. Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 8, 3192–3203 (2017).
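Fig. 6 The neural network model. [Schematic: atomic coordinates R1, R2, R3, …, Ri are mapped to descriptors D1, D2, D3, …, Di, which are passed to atomic NNs that output atomic energies E1, E2, E3, …, Ei; these are summed to give the total energy E.] The neural network model that generates the potential energy surface for MD simulation.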
22. Yao, K., Herr, J. E., Toth, D. W., Mckintyre, R. & Parkhill, J. The TensorMol-0.1 model chemistry: a neural network augmented with long-range physics. Chem. Sci. 9, 2261–2269 (2018).
23. Lee, K., Yoo, D., Jeong, W. & Han, S. SIMPLE-NN: an efficient package for training and executing neural-network interatomic potentials. Comput. Phys. Commun. 242, 95–103 (2019).
24. Chen, X., Jørgensen, M. S., Li, J. & Hammer, B. Atomic energies from a convolutional neural network. J. Chem. Theory Comput. 14, 3933–3942 (2018).
25. Zhang, Y., Hu, C. & Jiang, B. Embedded atom neural network potentials: efficient and accurate machine learning with a physically inspired representation. J. Phys. Chem. Lett. 10, 4962–4967 (2019).
26. Chmiela, S. et al. Machine learning of accurate energy-conserving molecular force fields. Sci. Adv. 3, e1603015 (2017).
27. Schütt, K. T., Arbabzadah, F., Chmiela, S., Müller, K. R. & Tkatchenko, A. Quantum-chemical insights from deep tensor neural networks. Nat. Commun. 8, 13890 (2017).
28. Sauceda, H. E., Chmiela, S., Poltavsky, I., Müller, K. R. & Tkatchenko, A. Molecular force fields with gradient-domain machine learning: construction and application to dynamics of small molecules with coupled cluster forces. J. Chem. Phys. 150, 114102 (2019).
29. Schütt, K. T., Sauceda, H. E., Kindermans, P.-J., Tkatchenko, A. & Müller, K.-R. SchNet – a deep learning architecture for molecules and materials. J. Chem. Phys. 148, 241722 (2018).
30. Unke, O. T. & Meuwly, M. PhysNet: a neural network for predicting energies, forces, dipole moments, and partial charges. J. Chem. Theory Comput. 15, 3678–3693 (2019).
31. Christensen, A. S., Bratholm, L. A., Faber, F. A. & Anatole von Lilienfeld, O. FCHL revisited: faster and more accurate quantum machine learning. J. Chem. Phys. 152, 044107 (2020).
32. Lu, X., Meng, Q., Wang, X., Fu, B. & Zhang, D. H. Rate coefficients of the H + H2O2 → H2 + HO2 reaction on an accurate fundamental invariant-neural network potential energy surface. J. Chem. Phys. 149, 174303 (2018).
33. Yin, Z., Guan, Y., Fu, B. & Zhang, D. H. Two-state diabatic potential energy surfaces of ClH2 based on nonadiabatic couplings with neural networks. Phys. Chem. Chem. Phys. 21, 20372–20383 (2019).
34. Zhang, Y., Zhou, X. & Jiang, B. Bridging the gap between direct dynamics and globally accurate reactive potential energy surfaces using neural networks. J. Phys. Chem. Lett. 10, 1185–1191 (2019).
35. Chen, J., Xu, X., Xu, X. & Zhang, D. H. Communication: An accurate global potential energy surface for the OH + CO → H + CO2 reaction using neural networks. J. Chem. Phys. 138, 221104 (2013).
36. Huang, S. D., Shang, C., Kang, P. L., Zhang, X. J. & Liu, Z. P. LASP: fast global potential energy surface exploration. Wiley Interdiscip. Rev. Comput. Mol. Sci. 9, e1415 (2019).
37. Kang, P. L., Shang, C. & Liu, Z. P. Glucose to 5-hydroxymethylfurfural: origin of site-selectivity resolved by machine learning based reaction sampling. J. Am. Chem. Soc. 141, 20525–20536 (2019).
38. Brickel, S., Das, A. K., Unke, O. T., Turan, H. T. & Meuwly, M. Reactive molecular dynamics for the [Cl–CH3–Br]− reaction in the gas phase and in solution: a comparative study using empirical and neural network force fields. Electron. Struct. 1, 024002 (2019).
39. Zhang, L., Han, J., Wang, H., Car, R. & Weinan, E. Deep potential molecular dynamics: a scalable model with the accuracy of quantum mechanics. Phys. Rev. Lett. 120, 143001 (2018).
40. Han, J. Q., Zhang, L. F., Car, R. & Weinan, E. Deep potential: a general representation of a many-body potential energy surface. Commun. Comput. Phys. 23, 629–639 (2018).
41. Jia, W. et al. Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning. Preprint at https://arxiv.org/abs/2005.00223 (2020).
42. Blum, L. C. & Reymond, J.-L. 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13. J. Am. Chem. Soc. 131, 8732–8733 (2009).
43. Ruddigkeit, L., Van Deursen, R., Blum, L. C. & Reymond, J.-L. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J. Chem. Inf. Model. 52, 2864–2875 (2012).
44. Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1, a data set of 20 million calculated off-equilibrium conformations for organic molecules. Sci. Data 4, 170193 (2017).
45. Smith, J. S. et al. The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules. Sci. Data 7, 134 (2020).
46. He, Z., Li, X.-B., Liu, L.-M. & Zhu, W. The intrinsic mechanism of methane oxidation under explosion condition: a combined ReaxFF and DFT study. Fuel 124, 85–90 (2014).
47. Zhang, L. et al. End-to-end symmetry preserving inter-atomic potential energy model for finite and extended systems. In: Bengio, S. et al. (eds) Advances in Neural Information Processing Systems 31, 4436–4446 (Curran Associates Inc, 2018).
48. Smith, G. P. et al. GRI-Mech 3.0. http://combustion.berkeley.edu/gri-mech/ (1999).
49. Reid, I. A. B., Robinson, C. & Smith, D. B. Spontaneous ignition of methane: measurement and chemical model. Symp. Int. Combust. Proc. 20, 1833–1843 (1985).
50. Wu, Y. Z., Sun, H., Wu, L. & Deetz, J. D. Extracting the mechanisms and kinetic models of complex reactions from atomistic simulation data. J. Comput. Chem. 40, 1586–1592 (2019).
51. Döntgen, M. et al. Automated discovery of reaction pathways, rate constants, and transition states using reactive molecular dynamics simulations. J. Chem. Theory Comput. 11, 2517–2524 (2015).
52. Zhang, Y. et al. DP-GEN: a concurrent learning platform for the generation of reliable deep learning based potential energy models. Comput. Phys. Commun. 253, 107206 (2020).
53. Ju, Y. & Sun, W. Plasma assisted combustion: dynamics and chemistry. Prog. Energy Combust. Sci. 48, 21–83 (2015).
54. Chen, W.-K., Liu, X.-Y., Fang, W.-H., Dral, P. O. & Cui, G. Deep learning for nonadiabatic excited-state dynamics. J. Phys. Chem. Lett. 9, 6702–6708 (2018).
55. Hu, D., Xie, Y., Li, X., Li, L. & Lan, Z. Inclusion of machine learning kernel ridge regression potential energy surfaces in on-the-fly nonadiabatic molecular dynamics simulation. J. Phys. Chem. Lett. 9, 2725–2732 (2018).
56. Westermayr, J. et al. Machine learning enables long time scale molecular photodynamics simulations. Chem. Sci. 10, 8100–8107 (2019).
57. Westermayr, J., Faber, F. A., Christensen, A. S., von Lilienfeld, O. A. & Marquetand, P. Neural networks and kernel ridge regression for excited states dynamics of CH2NH2+: from single-state to multi-state representations and multi-property machine learning models. Mach. Learn.: Sci. Technol. 1, 025009 (2020).
58. Borges, Y. G., Galvão, B. R. L., Mota, V. C. & Varandas, A. J. C. A trajectory surface hopping study of N2(A3Σu+) quenching by H atoms. Chem. Phys. Lett. 729, 61–64 (2019).
59. Schinke, R., Grebenshchikov, S. Y., Ivanov, M. V. & Fleurat-Lessard, P. Dynamical studies of the ozone isotope effect: a status report. Annu. Rev. Phys. Chem. 57, 625–661 (2006).
60. Pezzella, M., Koner, D. & Meuwly, M. Formation and stabilization of ground and excited-state singlet O2 upon recombination of 3P oxygen on amorphous solid water. J. Phys. Chem. Lett. 11, 2171–2176 (2020).
61. Koner, D., Bemish, R. J. & Meuwly, M. The C(3P) + NO(X2Π) → O(3P) + CN(X2Σ+), N(2D)/N(4S) + CO(X1Σ+) reaction: rates, branching ratios, and final states from 15 K to 20 000 K. J. Chem. Phys. 149, 094305 (2018).
62. Koner, D., Unke, O. T., Boe, K., Bemish, R. J. & Meuwly, M. Exhaustive state-to-state cross sections for reactive molecular collisions from importance sampling simulation and a neural network representation. J. Chem. Phys. 150, 211101 (2019).
63. BIOVIA, Materials Studio 2017. https://www.3ds.com/products-services/biovia/resource-center/citations-and-references/ (Dassault Systèmes, San Diego, 2017).
64. Aktulga, H. M., Fogarty, J. C., Pandit, S. A. & Grama, A. Y. Parallel reactive molecular dynamics: numerical methods and algorithmic techniques. Parallel Comput. 38, 245–259 (2012).
65. Chenoweth, K., Van Duin, A. C. & Goddard, W. A. ReaxFF reactive force field for molecular dynamics simulations of hydrocarbon oxidation. J. Phys. Chem. A 112, 1040–1053 (2008).
66. O'Boyle, N. M. et al. Open Babel: an open chemical toolbox. J. Cheminformatics 3, 33 (2011).
67. Tarjan, R. Depth-first search and linear graph algorithms. SIAM J. Comput. 1, 146–160 (1972).
68. Rupp, M., Tkatchenko, A., Müller, K.-R. & Von Lilienfeld, O. A. Fast and accurate modeling of molecular atomization energies with machine learning. Phys. Rev. Lett. 108, 058301 (2012).
69. Hloucha, M. & Deiters, U. Fast coding of the minimum image convention. Mol. Simul. 20, 239–244 (1998).
70. Sculley, D. Web-scale k-means clustering. In: Rappa, M. et al. (eds) Proc. 19th International Conference on World Wide Web (ACM, 2010).
71. Zhang, L., Lin, D.-Y., Wang, H., Car, R. & Weinan, E. Active learning of uniformly accurate interatomic potentials for materials simulation. Phys. Rev. Mater. 3, 023804 (2019).
72. Frisch, M. et al. Gaussian 16, revision A.03 (Gaussian Inc, Wallingford CT, 2016).
73. Yu, H. S., He, X., Li, S. L. & Truhlar, D. G. MN15: a Kohn–Sham global-hybrid exchange–correlation density functional with broad accuracy for multi-reference and single-reference systems and noncovalent interactions. Chem. Sci. 7, 5032–5051 (2016).
74. Wang, H., Zhang, L., Han, J. & Weinan, E. DeePMD-kit: a deep learning package for many-body potential energy representation and molecular dynamics. Comput. Phys. Commun. 228, 178–184 (2018).
75. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image
recognition. In: Tuytelaars, T. et al. (eds) Proc. IEEE Conference on Computer
Vision and Pattern Recognition (IEEE, 2016).
Acknowledgements
The authors thank Dr. Linfeng Zhang and Dr. Han Wang for their discussion and help in
using DeepPot-SE and DeePMD-kit. T.Z. would also like to thank Prof. Donghui Zhang
for his valuable suggestions in this project. This work was supported by the National Key
R&D Program of China (grant no. 2016YFA0501700), the National Natural Science
Foundation of China (grant nos. 91641116, 91753103, and 21933010), and the Innovation Program of Shanghai Municipal Education Commission (201701070005E00020).
J. Zeng was partially supported by the National Innovation and Entrepreneurship
Training Program for Undergraduate (201910269080). We also thank the ECNU
Multifunctional Platform for Innovation (No. 001) for providing supercomputer time.
Author contributions
J.Z. trained the neural network potential and performed most of the QM calculations.
L.C. and M.X. analyzed the trajectory and performed part of the QM calculation. T.Z.
and J.Z.H.Z. conceived the project and wrote the manuscript with input from all authors.
Competing interests
The authors declare no competing interests.
Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41467-020-19497-z.
Correspondence and requests for materials should be addressed to T.Z. or J.Z.H.Z.
Peer review information Nature Communications thanks the anonymous reviewers for
their contribution to the peer review of this work. Peer reviewer reports are available.
Reprints and permission information is available at http://www.nature.com/reprints
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons
Attribution 4.0 International License, which permits use, sharing,
adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative
Commons license, and indicate if changes were made. The images or other third party
material in this article are included in the article's Creative Commons license, unless
indicated otherwise in a credit line to the material. If material is not included in the
article's Creative Commons license and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from
the copyright holder. To view a copy of this license, visit http://creativecommons.org/
licenses/by/4.0/.
© The Author(s) 2020
NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-19497-z ARTICLE
NATURE COMMUNICATIONS | (2020) 11:5713 | https://doi.org/10.1038 /s41467-020-19497-z | www.nature.com/naturecommunications 9
Content courtesy of Springer Nature, terms of use apply. Rights reserved
1.
2.
3.
4.
5.
6.
Terms and Conditions
Springer Nature journal content, brought to you courtesy of Springer Nature Customer Service Center GmbH (“Springer Nature”).
Springer Nature supports a reasonable amount of sharing of research papers by authors, subscribers and authorised users (“Users”), for small-
scale personal, non-commercial use provided that all copyright, trade and service marks and other proprietary notices are maintained. By
accessing, sharing, receiving or otherwise using the Springer Nature journal content you agree to these terms of use (“Terms”). For these
purposes, Springer Nature considers academic use (by researchers and students) to be non-commercial.
These Terms are supplementary and will apply in addition to any applicable website terms and conditions, a relevant site licence or a personal
subscription. These Terms will prevail over any conflict or ambiguity with regards to the relevant terms, a site licence or a personal subscription
(to the extent of the conflict or ambiguity only). For Creative Commons-licensed articles, the terms of the Creative Commons license used will
apply.
We collect and use personal data to provide access to the Springer Nature journal content. We may also use these personal data internally within
ResearchGate and Springer Nature and as agreed share it, in an anonymised way, for purposes of tracking, analysis and reporting. We will not
otherwise disclose your personal data outside the ResearchGate or the Springer Nature group of companies unless we have your permission as
detailed in the Privacy Policy.
While Users may use the Springer Nature journal content for small scale, personal non-commercial use, it is important to note that Users may
not:
use such content for the purpose of providing other users with access on a regular or large scale basis or as a means to circumvent access
control;
use such content where to do so would be considered a criminal or statutory offence in any jurisdiction, or gives rise to civil liability, or is
otherwise unlawful;
falsely or misleadingly imply or suggest endorsement, approval , sponsorship, or association unless explicitly agreed to by Springer Nature in
writing;
use bots or other automated methods to access the content or redirect messages
override any security feature or exclusionary protocol; or
share the content in order to create substitute for Springer Nature products or services or a systematic database of Springer Nature journal
content.
In line with the restriction against commercial use, Springer Nature does not permit the creation of a product or service that creates revenue,
royalties, rent or income from our content or its inclusion as part of a paid for service or for other commercial gain. Springer Nature journal
content cannot be used for inter-library loans and librarians may not upload Springer Nature journal content on a large scale into their, or any
other, institutional repository.
These terms of use are reviewed regularly and may be amended at any time. Springer Nature is not obligated to publish any information or
content on this website and may remove it or features or functionality at our sole discretion, at any time with or without notice. Springer Nature
may revoke this licence to you at any time and remove access to any copies of the Springer Nature journal content which have been saved.
To the fullest extent permitted by law, Springer Nature makes no warranties, representations or guarantees to Users, either express or implied
with respect to the Springer nature journal content and all parties disclaim and waive any implied warranties or warranties imposed by law,
including merchantability or fitness for any particular purpose.
Please note that these rights do not automatically extend to content, data or other material published by Springer Nature that may be licensed
from third parties.
If you would like to use or distribute our Springer Nature journal content to a wider audience or on a regular basis or in any other manner not
expressly permitted by these Terms, please contact Springer Nature at
onlineservice@springernature.com
... Shortcomings of optimization-based training are brought to particularly strong relief by the problem of overfitting, where naive optimization produces spurious outcomes. [5][6][7] The broad success of neural networks for modelling physical processes [8][9][10][11][12] has prompted advances that are based on inverting the direction of investigation and treating neural networks as if they were physical systems in their own right. [13][14][15][16] These successes raise the question of whether broader, physical perspectives could motivate the construction of improved training algorithms. ...
Preprint
The broad range of neural network training techniques that invoke optimization but rely on ad hoc modification for validity suggests that optimization-based training is misguided. Shortcomings of optimization-based training are brought to particularly strong relief by the problem of overfitting, where naive optimization produces spurious outcomes. The broad success of neural networks for modelling physical processes has prompted advances that are based on inverting the direction of investigation and treating neural networks as if they were physical systems in their own right These successes raise the question of whether broader, physical perspectives could motivate the construction of improved training algorithms. Here, we introduce simmering, a physics-based method that trains neural networks to generate weights and biases that are merely ``good enough'', but which, paradoxically, outperforms leading optimization-based approaches. Using classification and regression examples we show that simmering corrects neural networks that are overfit by Adam, and show that simmering avoids overfitting if deployed from the outset. Our results question optimization as a paradigm for neural network training, and leverage information-geometric arguments to point to the existence of classes of sufficient training algorithms that do not take optimization as their starting point.
... The second group consists of models based on the fundamental principles of the structure of matter. An example of this is neural network-based molecular dynamics simulation [41]. These models use several databases and describe a chain of reactions consisting of several hundred elementary reactions. ...
Article
Full-text available
Aviation in Europe is required to use fuels containing up to 2 wt. % of sustainable aviation fuels (SAFs). A better understanding of the impact of SAFs on the combustion process will be helpful in solving problems that may arise from the widespread use of these kinds of fuels. It was assumed that the reactivity coefficient αi and the activation energy could be a criteria for assessing the impact of SAFs on the combustion process. Based on DGEN engine tests, the following activation energy values of CO2 and CO formation reactions were obtained—Jet A-1: EaCO2/R=3480 and EaCO/R=982; A30: EaCO2/R=3705 and EaCO/R=2903; and H30: EaCO2/R=3637 and EaCO/R=2843. These results indicate differences in the structure of combustion reaction chains involved by the SAF addition to Jet A-1 fuel. The same conclusion has been formulated on the basis of the reactivity coefficient αi. The values of maximum cylinder pressure (Pmax) obtained during indicator RCCM (rapid compression combustion machine) tests correlated with both the activation energy and coefficients of reactivity. This suggests that the influence of SAF addition to Jet A-1 fuel on the structure of chemical reactions chain during RCCM tests is similar to the influence during DGEN 380 tests. The assumption stated above was confirmed. This indicates the possibility of the preliminary forecasting of CO2 and CO emissions from the DGEN 380 engine based on the test at the RCCM stand.
Preprint
Machine learning potentials have become increasingly successful in atomistic simulations. Many of these potentials are based on an atomistic representation in a local environment, but an efficient description of non-local interactions that exceed a common local environment remains a challenge. Herein, we propose a simple and efficient equivariant model, EquiREANN, to effectively represent non-local potential energy surface. It relies on a physically inspired message passing framework, where the fundamental descriptors are linear combination of atomic orbitals, while both invariant orbital coefficients and the equivariant orbital functions are iteratively updated. We demonstrate that this EquiREANN model is able to describe the subtle potential energy variation due to the non-local structural change with high accuracy and little extra computational cost than an invariant message passing model. Our work offers a generalized approach to create equivariant message passing adaptations of other advanced local many-body descriptors.
Article
Atmospheric pressure chemical ionization (APCI) is often used in the analysis of linear saturated hydrocarbons (LSHs) as this ionization technique commonly produces [M − H] ⁺ ions in high abundance. However, APCI (along with other atmospheric pressure sources) is often impacted by in‐source oxidation, leading to a variety of ionic products. Identifying these products and understanding their mechanisms of formation is crucial for characterizing complex mixtures with substantial hydrocarbon content, such as those found in the petrochemical industry. In this study, in‐source oxidation of LSHs was observed in gas chromatography (GC) coupled to high‐resolution mass spectrometry (HRMS) via a custom‐built APCI interface. Studies showed that the abundance of these oxidized ions correlated positively with atmospheric water, yet occurred without the inclusion of water‐based oxygen as judged by experiments with stable isotope‐labeled water. The oxidation of LSHs was further influenced by the reactive species in the ionization atmosphere. Fragmentation data using stable isotope‐labeled LSH standards unveiled multiple structurally unique ions with one or more oxidation sites on both primary and secondary carbons. These ionic products bear resemblance to combustion byproducts, suggesting the instrumental configuration fosters plasma‐assisted combustion‐like processes that encourage the radical‐mediated oxidation of LSHs rather than generate [M − H] ⁺ . Through these investigative efforts, a mechanism analogous to combustion was proposed for the formation of LSH oxidation products in GC‐APCI‐HRMS. Data demonstrate that these ions are robustly generated in petrochemical products, allowing for proper characterization of these complex mixtures.
Article
The nanoscale form of the typical insensitive energetic material 1,3,5-triamino-2,4,6-trinitrobenzene (TATB) exhibits the capability to improve the energy performances while maintaining low sensitivity compared to raw TATB. Investigating the particle size effect on the intrinsic pyrolysis mechanisms facilitates the selection of the TATB particle size in applications to ensure efficient energy release and high safety levels. However, the intrinsic mechanism of this effect remains unclear. This study focuses on pyrolysis as a prerequisite behavior for energy release, employing reactive molecular dynamics simulations to investigate the pyrolysis of TATB nanoparticles with different sizes, aiming to explore the qualitative changes in thermal properties at the atomic level. Results demonstrate that with increasing particle size, the decomposition rate of TATB decreases. Smaller particles exhibit a propensity towards dehydrogenation and C-NO2 bond cleavage reactions. However, larger nano-TATB particles demonstrate a preference for dimerization, which results in the formation of clusters with greater polymerization and increased stability. The highly polymerized clusters are stable under thermal stimulation, inhibiting further decomposition of TATB. These insights reveal the mechanism underlying the qualitative change in the energy performance of TATB nanoparticles at the atomic level.
Article
Full-text available
Machine learning models are changing the paradigm of molecular modeling, which is a fundamental tool for material science, chemistry, and computational biology. Of particular interest is the inter-atomic potential energy surface (PES). Here we develop Deep Potential-Smooth Edition (DeepPot-SE), an end-to-end machine learning-based PES model, which is able to efficiently represent the PES of a wide variety of systems with the accuracy of ab initio quantum mechanics models. By construction, DeepPot-SE is extensive and continuously differentiable, scales linearly with system size, and preserves all the natural symmetries of the system. Further, we show that DeepPot-SE describes finite and extended systems including organic molecules, metals, semiconductors, and insulators with high fidelity.
Article
Full-text available
Extensions and improvements of empirical force fields are discussed in view of applications to computational vibrational spectroscopy and reactive molecular dynamics simulations. Particular focus is on quantitative studies, which make contact with experiments and provide complementary information for a molecular-level understanding of processes in the gas phase and in solution. Methods range from including multipolar charge distributions to reproducing kernel Hilbert space approaches and machine learned energy functions based on neural networks.
Article
Maximum diversification of data is a central theme in building generalized and accurate machine learning (ML) models. In chemistry, ML has been used to develop models for predicting molecular properties, for example quantum mechanics (QM) calculated potential energy surfaces and atomic charge models. The ANI-1x and ANI-1ccx ML-based general-purpose potentials for organic molecules were developed through active learning, an automated data diversification process. Here, we describe the ANI-1x and ANI-1ccx data sets. To demonstrate data diversity, we visualize it with a dimensionality reduction scheme and contrast it against existing data sets. The ANI-1x data set contains multiple QM properties from 5 million density functional theory calculations, while the ANI-1ccx data set contains 500k data points obtained with an accurate CCSD(T)/CBS extrapolation. Approximately 14 million CPU core-hours were expended to generate these data. Multiple QM-calculated properties for the chemical elements C, H, N, and O are provided: energies, atomic forces, multipole moments, atomic charges, etc. We provide these data to the community to aid research and development of ML models for chemistry.
Article
Excited-state dynamics simulations are a powerful tool to investigate photo-induced reactions of molecules and materials and provide complementary information to experiments. Because the applicability of these simulation techniques is limited by the cost of the underlying electronic structure calculations, we develop and assess different machine learning models for this task. The machine learning models are trained on ab initio calculations for excited electronic states, using the methylenimmonium cation (CH₂NH₂⁺) as a model system. Two distinct strategies for modeling excited-state properties are tested in this work. The first strategy is to treat each state separately in a kernel ridge regression model and all states together in a multiclass neural network. The second strategy is to instead encode the state as input to the model, which is tested with both models. Numerical evidence suggests that using the state as input yields the best performance. An important goal for excited-state machine learning models is their use in dynamics simulations, which require not only state-specific information but also couplings, i.e. properties involving pairs of states. Accordingly, we investigate how well machine learning models can predict the couplings. Furthermore, we explore how combining all properties in a single neural network affects the accuracy. Finally, machine learning predicted energies, forces, and couplings are used to carry out excited-state dynamics simulations. The results demonstrate the scope and possibilities of machine learning for modeling excited-state properties.
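The "state as input" strategy can be pictured as a data-layout choice. Instead of fitting one model per electronic state, each training row pairs a geometry with a one-hot label for the state, so a single regressor sees all states. The sketch below is model-agnostic and uses hypothetical function names and feature arrays. It is not the encoding used in the cited work, which may differ.

```python
import numpy as np

def state_as_input(features, state, n_states):
    """Append a one-hot encoding of the electronic state to a geometry feature vector."""
    one_hot = np.zeros(n_states)
    one_hot[state] = 1.0
    return np.concatenate([np.asarray(features, dtype=float), one_hot])

def build_dataset(geometry_features, energies_by_state):
    """geometry_features: (n_geoms, n_feat); energies_by_state: (n_geoms, n_states).

    Returns one training row per (geometry, state) pair, ordered geometry-major,
    which matches the row-major flattening of energies_by_state.
    """
    n_geoms, n_states = energies_by_state.shape
    X = np.array([state_as_input(geometry_features[g], s, n_states)
                  for g in range(n_geoms) for s in range(n_states)])
    y = energies_by_state.reshape(-1)
    return X, y
```

Any regressor, such as a kernel ridge model or a neural network, can then be trained once on `(X, y)`.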
Article
We introduce the FCHL19 representation for atomic environments in molecules or condensed-phase systems. Machine learning models based on FCHL19 can predict atomic forces and energies of query compounds with chemical accuracy on a timescale of milliseconds. FCHL19 is a revision of our previous work [F. A. Faber et al., J. Chem. Phys. 148, 241717 (2018)] in which the representation is discretized and the individual features are rigorously optimized using Monte Carlo optimization. Combined with a Gaussian kernel function that incorporates elemental screening, chemical accuracy is reached for energy learning on the QM7b and QM9 datasets after training for minutes and hours, respectively. The model also performs well for non-bonded interactions in the condensed phase: for a set of water clusters, the mean absolute error (MAE) in binding energy is less than 0.1 kcal/mol/molecule after training on 3200 samples. For force learning on the MD17 dataset, our optimized model similarly achieves state-of-the-art accuracy with a regressor based on Gaussian process regression. When the revised FCHL19 representation is combined with the operator quantum machine learning regressor, forces and energies can be predicted in only a few milliseconds per atom. The model presented herein is fast and lightweight enough for use in general chemistry problems as well as in molecular dynamics simulations.
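Pairing a representation with a Gaussian kernel typically leads to kernel ridge regression, in which the weights α solve (K + λI)α = y and predictions are k(x_new, X_train)·α. The sketch below shows only that regression step. Plain feature vectors stand in for the FCHL19 representation, and the elemental screening in the kernel is omitted. The values of `sigma` and `lam` are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """K[i, j] = exp(-|A_i - B_j|^2 / (2 sigma^2)) for row-wise feature vectors."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def fit_krr(X, y, sigma=1.0, lam=1e-8):
    """Solve (K + lam * I) alpha = y for the regression weights."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict_krr(X_train, alpha, X_new, sigma=1.0):
    """Predicted property (e.g. energy) for each row of X_new."""
    return gaussian_kernel(X_new, X_train, sigma) @ alpha
```

With a small regularizer, the model nearly interpolates its training data. Far from all training points, the Gaussian kernel vanishes and so do the predictions.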
Article
Glucose pyrolysis, a model system in biomass utilization, is renowned for its great complexity: a deep reaction-network hierarchy and a rich variety of reaction patterns. The selectivity of glucose pyrolysis, e.g. the high yield of 5-hydroxymethylfurfural (HMF), a value-added platform product, remains an intriguing puzzle even after 60 years of experimental study. Here we resolve the whole reaction network of glucose pyrolysis using a global-to-global technique for reaction pathway sampling. This is achieved by establishing the first organic chemistry reaction database via stochastic surface walking (SSW) global optimization, building a global neural network (G-NN) potential via machine learning, and extensively exploring the reaction network of glucose pyrolysis. In total, 6407 elementary reactions, screened from more than 150 000 reaction pairs in glucose pyrolysis, are collected in our reaction database. The reaction network established from SSW-NN, further validated by first-principles calculations, reveals that the lowest-energy pathway from glucose to HMF involves fructose and 3-deoxyglucos-2-ene (3-DGE) as key intermediates, together with a site-selective reaction type, retro-Michael addition, for three consecutive dehydration steps. The overall barrier is determined to be 1.91 eV, at least 0.19 eV lower than all previously proposed mechanisms, which assume direct β-H elimination dehydration. The lowest pathways to the other two major products, furfural (FF) and hydroxyacetaldehyde (HAA), are also identified and have a similar barrier of 1.95 eV. These pathways compete, since they share the same key intermediate, 3-ketohexose. Because the reactions occurring in fast glucose pyrolysis are common throughout biomass chemistry and cover essentially all reaction patterns of C-H-O elements, the methodology and results presented here should help advance reaction design and mechanistic modeling of renewable fuels from biomass.
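An Arrhenius estimate helps show what a 0.19 eV lower barrier means in practice. If the pre-exponential factors are equal, the rate ratio is exp(ΔE/k_BT). The sketch below uses an assumed temperature of 773 K as a representative fast-pyrolysis condition. Neither the temperature nor the equal-prefactor assumption comes from the study.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def rate_ratio(delta_barrier_ev, temperature_k):
    """Rate enhancement from a barrier lowered by delta_barrier_ev, assuming equal prefactors."""
    return math.exp(delta_barrier_ev / (K_B_EV * temperature_k))
```

At the assumed 773 K, `rate_ratio(0.19, 773)` gives a factor of roughly 17. The factor grows at lower temperatures.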
Article
A thorough understanding of the kinetics and dynamics of combusting mixtures is of considerable interest, especially in regimes beyond the reach of current experimental validation. The ReaxFF reactive force field method has provided a way to simulate large-scale systems of hydrogen combustion via a parameterized potential that can simulate bond breaking. This modeling approach has been applied to hydrogen combustion, as well as myriad other reactive chemical systems. In this work, we benchmark the performance of several common parameterizations of this potential against higher-level quantum mechanical (QM) approaches. We demonstrate instances where these parameterizations of the ReaxFF potential fail both quantitatively and qualitatively to describe reactive events relevant for hydrogen combustion systems.
Article
The recombination dynamics of ³P oxygen atoms on cold amorphous solid water to form triplet and singlet molecular oxygen (O₂) is investigated under conditions representative of cold clouds. Reactive molecular dynamics simulations that include Landau-Zener-based hopping to account for nonadiabatic transitions find that both ground-state (X³Σg⁻) O₂ and molecular oxygen in the two lowest singlet states (a¹Δg and b¹Σg⁺) can be formed, and that the molecular species stabilize through vibrational relaxation. The relative populations of the species are approximately 1:1:1. These results also agree qualitatively with a kinetic model based on simplified wavepacket simulations. The presence and stabilization of higher electronic states of O₂ are expected to modify the chemical evolution of cold interstellar (T ∼ 10-50 K) and warmer noctilucent (T ∼ 100 K) clouds.
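Landau-Zener hopping assigns a transition probability each time two states cross. In the textbook two-state form, the probability of following the diabatic state through the crossing is P = exp(-2πH₁₂² / (ħ|d(ΔE_d)/dt|)). Here H₁₂ is the diabatic coupling and ΔE_d is the diabatic energy gap, and following the diabatic state means hopping between adiabatic surfaces. The sketch below is a minimal implementation of that formula. The cited study may use a different variant, such as an adiabatic-gap formulation, and the function name and eV-based units are illustrative assumptions.

```python
import math

HBAR_EV_S = 6.582119569e-16  # reduced Planck constant in eV*s

def landau_zener_probability(coupling_ev, gap_rate_ev_per_s):
    """Two-state Landau-Zener probability of a diabatic passage (an adiabatic hop).

    coupling_ev: diabatic coupling H12 in eV.
    gap_rate_ev_per_s: time derivative of the diabatic energy gap in eV/s.
    """
    rate = abs(gap_rate_ev_per_s)
    if rate == 0.0:
        return 0.0  # infinitely slow passage: the system stays adiabatic
    return math.exp(-2.0 * math.pi * coupling_ev ** 2 / (HBAR_EV_S * rate))
```

In a surface-hopping loop, this probability would be compared with a uniform random number at each detected crossing. Strong coupling or slow passage suppresses hops.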
Article
In recent years, promising deep learning based interatomic potential energy surface (PES) models have been proposed that could allow molecular dynamics simulations of large-scale systems with quantum accuracy. However, making these models truly reliable and practically useful is still a highly non-trivial task. A key component of this task is the generation of the datasets used in model training. In this paper, we introduce the Deep Potential GENerator (DP-GEN), an open-source software platform that implements the recently proposed "on-the-fly" learning procedure (Zhang et al. 2019). DP-GEN can generate uniformly accurate deep learning based PES models while minimizing human intervention and the computational cost of data generation and model training. DP-GEN automatically and iteratively performs three steps: exploration, labeling, and training. It supports various popular packages for these steps: LAMMPS for exploration; Quantum Espresso, VASP, CP2K, etc. for labeling; and DeePMD-kit for training. It also allows automatic job submission and result collection on different types of machines, such as high-performance clusters and cloud machines, and adapts to different job management tools, including Slurm, PBS, and LSF. As a concrete example, we illustrate the process of generating a general-purpose PES model for Cu using DP-GEN.
Program summary
Program Title: DP-GEN
Program Files doi: http://dx.doi.org/10.17632/sxybkgc5xc.1
Licensing provisions: LGPL
Programming language: Python
Nature of problem: Generating reliable deep learning based potential energy models with minimal human intervention and computational cost.
Solution method: The concurrent learning scheme is implemented. Support is provided for sampling configuration space with LAMMPS, generating ab initio data with Quantum Espresso, VASP, and CP2K, and training potential models with DeePMD-kit. Support is also provided for different machines, including workstations, high-performance clusters, and cloud machines, and for job management tools including Slurm, PBS, and LSF.
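In the exploration step of such a concurrent learning loop, each sampled frame is usually screened by the disagreement of an ensemble of models. A common measure is the maximum over atoms of the standard deviation of the predicted forces. Frames below a lower trust level are treated as already accurate. Frames between the two levels become candidates for ab initio labeling. Frames above the upper level are discarded as unphysical. The sketch below illustrates this selection rule with hypothetical function names and illustrative trust levels. It is not DP-GEN's own code.

```python
import numpy as np

def max_force_deviation(forces):
    """forces: (n_models, n_atoms, 3) ensemble predictions for one frame.

    Returns max_i sqrt(<|F_i - <F_i>|^2>), averaging over models.
    """
    forces = np.asarray(forces, dtype=float)
    mean = forces.mean(axis=0)                    # (n_atoms, 3)
    sq = ((forces - mean) ** 2).sum(axis=-1)      # (n_models, n_atoms)
    per_atom = np.sqrt(sq.mean(axis=0))           # (n_atoms,)
    return float(per_atom.max())

def classify_frames(ensemble_forces, trust_lo, trust_hi):
    """Split frame indices into accurate / candidate (to be labeled) / failed."""
    accurate, candidate, failed = [], [], []
    for idx, frame_forces in enumerate(ensemble_forces):
        dev = max_force_deviation(frame_forces)
        if dev < trust_lo:
            accurate.append(idx)
        elif dev < trust_hi:
            candidate.append(idx)
        else:
            failed.append(idx)
    return accurate, candidate, failed
```

Only the candidate frames would be sent to the labeling step (e.g. VASP or CP2K) and then added to the training set for the next iteration.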