... To this end, several groups have employed delta (Δ) machine learning (ML) potentials, primarily based on neural networks (NN), as a means to directly correct SE/MM potential energy for improved QM/MM simulations. [21][22][23][24][25] Despite the overall similarity in the Δ-learning theme following the work of Ramakrishnan et al., 26 these works differ in their choice of base levels, NN potentials' descriptors and topological features, and loss function construction. Building on their earlier work using NN potentials for free energy perturbation, 27 Yang and co-workers have developed a method for training artificial neural networks (ANN) to correct DFTB/MM to the DFT/MM level using energy-only-based loss functions during MD simulations. ...

... 21 Since force matching (FM) can greatly improve the phase-space overlap involved and therefore the quality of dynamics, 17,19 it is generally desirable to include forces directly in the loss function when training the NN potentials. [22][23][24][25] For example, Böselt et al. 22 used symmetry-function descriptors as input features to train their high-dimensional neural network potentials (HDNNP), based on both energy and forces, for predicting corrections for DFTB/MM in simulating stable and transition-state species in solution. Pan et al. used an FM-recalibrated SE method to ensure that the training configurations are sampled in the relevant phase space. ...

... 23 Based on the DeepPot-SE 28 and the DeepMD framework, 29,30 their ML model uses an embedding network to encode input features and incorporates both energy and force differences into the loss function to ensure accurate dynamics for MD-based free energy simulations of solution-phase and enzyme reactions. 23 Other recent QM/MM developments utilizing DeepPot and local environment descriptors include the range-corrected deep learning scheme of Zeng et al. 24 and the DFTB/MM-based ML model of Gomez-Flores et al. 25 One notable issue with conventional NN potentials is that they generally do not provide a metric to assess the uncertainties in energy and force predictions. For MD-based QM/MM simulations, in particular, free energy path simulations, this poses a question on how to maintain the robustness of NN models when the trajectories sampled on the NN-predicted PES deviate from the original training configuration space. ...

Free energy simulations that employ combined quantum mechanical and molecular mechanical (QM/MM) potentials at ab initio QM (AI) levels are computationally highly demanding. Here, we present a machine-learning-facilitated approach for obtaining AI/MM-quality free energy profiles at the cost of efficient semiempirical QM/MM (SE/MM) methods. Specifically, we use Gaussian process regression (GPR) to learn the potential energy corrections needed for an SE/MM level to match an AI/MM target along the minimum free energy path (MFEP). Force modification using gradients of the GPR potential allows us to improve configurational sampling and update the MFEP. To adaptively train our model, we further employ the sparse variational GP (SVGP) and streaming sparse GPR (SSGPR) methods, which efficiently incorporate previous sample information without significantly increasing the training data size. We applied the QM-(SS)GPR/MM method to the solution-phase SN2 Menshutkin reaction, NH3 + CH3Cl → CH3NH3+ + Cl−, using AM1/MM and B3LYP/6-31+G(d,p)/MM as the base and target levels, respectively. For 4000 configurations sampled along the MFEP, the iteratively optimized AM1-SSGPR-4/MM model reduces the energy error in AM1/MM from 18.2 to 4.4 kcal/mol. Although not explicitly fitting forces, our method also reduces the key internal force errors from 25.5 to 11.1 kcal/mol/Å and from 30.2 to 10.3 kcal/mol/Å for the N–C and C–Cl bonds, respectively. Compared to the uncorrected simulations, the AM1-SSGPR-4/MM method lowers the predicted free energy barrier from 28.7 to 11.7 kcal/mol and decreases the reaction free energy from −12.4 to −41.9 kcal/mol, bringing these results into closer agreement with their AI/MM and experimental benchmarks.
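The Δ-learning idea in this abstract, a GPR model trained on the energy difference between a cheap base level and an expensive target level, can be illustrated with a toy one-dimensional sketch. Everything here is an assumption made for illustration: `base_energy` and `target_energy` are hypothetical stand-ins for the SE/MM and AI/MM levels, the kernel is a plain squared-exponential, and the sparse and streaming variants (SVGP, SSGPR) used by the authors are not reproduced.

```python
import math

def rbf(x1, x2, length=1.0):
    """Squared-exponential kernel between two scalar coordinates."""
    return math.exp(-0.5 * ((x1 - x2) / length) ** 2)

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def base_energy(x):
    """Hypothetical cheap level (stand-in for SE/MM)."""
    return x * x

def target_energy(x):
    """Hypothetical expensive level (stand-in for AI/MM)."""
    return x * x + math.sin(x)

def fit_delta_gpr(xs, jitter=1e-6):
    """Fit GPR weights to the energy differences (target minus base)."""
    deltas = [target_energy(x) - base_energy(x) for x in xs]
    k = [[rbf(xi, xj) + (jitter if i == j else 0.0)
          for j, xj in enumerate(xs)] for i, xi in enumerate(xs)]
    return solve(k, deltas)

def corrected_energy(x, xs, alpha):
    """Base energy plus the GPR posterior-mean correction."""
    delta = sum(a * rbf(x, xi) for a, xi in zip(alpha, xs))
    return base_energy(x) + delta

train_x = [-2.0 + 0.5 * i for i in range(9)]
alpha = fit_delta_gpr(train_x)
print(corrected_energy(0.25, train_x, alpha), target_energy(0.25))
```

Because the correction is a smooth function of the coordinates, its gradient can also be evaluated analytically, which is what makes the force modification mentioned in the abstract possible.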

... The package was first released in 2017 29 and has since undergone rapid development with contributions from many developers. The DeePMD-kit implements a series of MLP models known as Deep Potential (DP) models, 9,10,[50][51][52][53][54] which have been widely adopted in the fields of physics, chemistry, biology, and material science for studying a broad range of atomistic systems. These systems include metallic materials, 55 non-metallic inorganic materials, [56][57][58][59][60] water, [61][62][63][64][65][66][67][68][69][70][71] organic systems, 10,72 solutions, 52,73-76 gas-phase systems, [77][78][79][80] macromolecular systems, 81,82 and interfaces. ...

... This limits the predictive capability of the methods in condensed-phase simulations. Zeng et al. 52 created a new Δ-MLP method called Deep Potential-Range correction (DPRc) to integrate with combined quantum mechanical/molecular mechanical (QM/MM) potentials, which corrects the potential energy from a fast, linear-scaling low-level semiempirical QM/MM theory to a high-level ab initio QM/MM theory. Unlike many of the emerging Δ-MLPs that correct internal QM energy and forces, the DPRc model corrects both the QM-QM and QM-MM interactions of a QM/MM calculation in a manner that conserves energy as MM atoms enter (or leave) the vicinity of the QM region. ...

... 80 Compared to its initial release, 29 DeePMD-kit has evolved significantly, with the current version (v2.2.1) offering an extensive range of features. These include DeepPot-SE, attention-based, and hybrid descriptors, 10,50,51,53 the ability to fit tensorial properties, 105,106 type embedding, model deviation, 103,107 Deep Potential-Range Correction (DPRc), 52,75 Deep Potential Long Range (DPLR), 53 graphics processing unit (GPU) support for customized operators, 108 model compression, 109 non-von Neumann molecular dynamics (NVNMD), 110 and various usability improvements, such as documentation, compiled binary packages, graphical user interfaces (GUIs), and application programming interfaces (APIs). This article provides an overview of the current major version of the DeePMD-kit, highlighting its features and technical details, presenting a comprehensive procedure for conducting molecular dynamics as a representative application, benchmarking the accuracy and efficiency of different models, and discussing ongoing developments. ...

DeePMD-kit is a powerful open-source software package that facilitates molecular dynamics simulations using machine learning potentials known as Deep Potential (DP) models. This package, which was released in 2017, has been widely used in the fields of physics, chemistry, biology, and material science for studying atomistic systems. The current version of DeePMD-kit offers numerous advanced features, such as DeepPot-SE, attention-based and hybrid descriptors, the ability to fit tensorial properties, type embedding, model deviation, DP-range correction, DP long range, graphics processing unit support for customized operators, model compression, non-von Neumann molecular dynamics, and improved usability, including documentation, compiled binary packages, graphical user interfaces, and application programming interfaces. This article presents an overview of the current major version of the DeePMD-kit package, highlighting its features and technical details. Additionally, this article presents a comprehensive procedure for conducting molecular dynamics as a representative application, benchmarks the accuracy and efficiency of different models, and discusses ongoing developments.

... The package was first released in 2017 19 and has since undergone rapid development with contributions from many developers. DeePMD-kit implements a series of MLP models known as Deep Potential (DP) models, 9,10,[41][42][43][44][45] which have been widely adopted in the fields of physics, chemistry, biology, and material science for studying a broad range of atomistic systems. These systems include metallic materials, [46][47][48][49][50][51][52][53][54][55][56][57][58][59][60][61] non-metallic inorganic materials, [62][63][64][65][66] water, 67-77 organic systems, 10,78 solutions, 43,79-82 gas-phase systems, [83][84][85][86] macromolecular systems, 87,88 and interfaces. [89][90][91][92][93] ...

... Compared to its initial release, 19 DeePMD-kit has evolved significantly, with the current version (v2.2.1) offering an extensive range of features. These include DeepPot-SE, attention-based, and hybrid descriptors, 10,41,42,44 the ability to fit tensorial properties, 97,98 type embedding, model deviation, 99,100 Deep Potential-Range Correction (DPRc), 43,81 Deep Potential Long Range (DPLR), 44 graphics processing unit (GPU) support for customized operators, 101 model compression, 102 non-von Neumann molecular dynamics (NVNMD), 103 and various usability improvements such as documentation, compiled binary packages, graphical user interfaces (GUI), and application programming interfaces (API). This article provides an overview of the current major additions to the DeePMD-kit, highlighting its features and technical details, benchmarking the accuracy and efficiency of different models, and discussing ongoing developments. ...

... Deep Potential-Range Correction (DPRc) 43,81 was initially designed to correct the potential energy from a fast, linear-scaling low-level semiempirical QM/MM theory to a high-level ab initio QM/MM theory. It does so in a range-corrected way, quantitatively correcting short- and mid-range non-bonded interactions by leveraging the non-bonded lists routinely used in molecular dynamics simulations with molecular mechanical force fields such as AMBER. 108 In this way, long-ranged electrostatic interactions can be modeled efficiently using the particle mesh Ewald method 108 or its extensions for multipolar 109,110 and QM/MM 111,112 potentials. ...

DeePMD-kit is a powerful open-source software package that facilitates molecular dynamics simulations using machine learning potentials (MLP) known as Deep Potential (DP) models. This package, which was released in 2017, has been widely used in the fields of physics, chemistry, biology, and material science for studying atomistic systems. The current version of DeePMD-kit offers numerous advanced features such as DeepPot-SE, attention-based and hybrid descriptors, the ability to fit tensorial properties, type embedding, model deviation, Deep Potential-Range Correction (DPRc), Deep Potential Long Range (DPLR), GPU support for customized operators, model compression, non-von Neumann molecular dynamics (NVNMD), and improved usability, including documentation, compiled binary packages, graphical user interfaces (GUI), and application programming interfaces (API). This article presents an overview of the current major version of the DeePMD-kit package, highlighting its features and technical details. Additionally, the article benchmarks the accuracy and efficiency of different models and discusses ongoing developments.

DESCRIPTION
Coarse-grained (CG) molecular dynamics simulations of integral membrane proteins have gained wide popularity because they provide a cost-effective but still accurate description of the protein-membrane interactions as a whole and on the role of individual lipidic species. Therefore, they can provide biologically meaningful information at a resolution comparable to those accessible to experimental techniques. However, the simulation of membrane proteins remains a challenging task that requires specific expertise, as external pressures and solvation need to be carefully handled. CG simulations that lump several water molecules into one single supramolecular moiety may present further intricacies due to bulkier solvent representations or model-dependent compressibilities. This chapter provides a detailed protocol for setting up, running, and analyzing CG simulations of membrane proteins using the SIRAH force field for CG simulations within the AMBER package.

... In addition to enabling quantum mechanical accuracy, current DP models have the following characteristics: (i) preserving the symmetry of the system, especially when there are multiple elemental species; (ii) having high computational efficiency, being at least five orders of magnitude faster than DFT; (iii) being end-to-end and therefore having little human intervention; (iv) supporting MPI and GPU, making it highly efficient on modern heterogeneous high-performance supercomputers. Thanks to these points, the DP models have been successfully employed in studies of water and water-containing systems (Calegari Sommers et al., 2020; Xu et al., 2020; and Tisi et al., 2021), metals and alloys (Zhang et al., 2019; Wang et al., 2020; Jiang et al., 2021), phase diagrams (Niu et al., 2020; Yang et al., 2021), high-entropy ceramics (Dai et al., 2020, 2021), chemical reactions (Zeng et al., 2021a, 2021b), solid-state electrolytes (Huang et al., 2021), ionic liquids (Liang et al., 2021), etc. We refer to Wen et al. (2022) for a recent review of DP for material systems. ...

... In this section, we briefly introduce some recent progress on biocatalysis simulations that require a more delicate treatment on the DeePMD-kit. We refer to Zeng et al. (2021a) for more details. ...

... In order to tackle these biological problems, recently, the DeePMD-kit was integrated with the AMBER molecular simulation software suite (Case et al., 2020) in order to perform biocatalysis simulations (Zeng et al., 2021a). This interface enables ab initio combined quantum mechanical/molecular mechanical (QM/MM) simulations with a rigorous treatment of long-ranged electrostatic interactions under periodic boundary conditions (Giese and York, 2016;and Pan et al., 2021), and affords a powerful tool to gain insight into the pathways of biocatalysis reactions, their transition states and intermediates, and environmental factors that modulate reactivity (Gaines et al., 2019;and Ganguly et al., 2020). ...

DESCRIPTION
A new direction has emerged in molecular simulations in recent years, where potential energy surfaces (PES) are constructed using machine learning (ML) methods. These ML models, combining the accuracy of quantum mechanical models and the efficiency of empirical atomic potential models, have been demonstrated by many studies to have extensive application prospects. This chapter introduces a recently developed ML model, Deep Potential (DP), and the corresponding package, DeePMD-kit. First, we present the basic theory of the DP method. Then, we show how to train and test a DP model for a gas-phase methane molecule using the DeePMD-kit package. Next, we introduce some recent progress on simulations of biomolecular processes by integrating the DeePMD-kit with the AMBER molecular simulation software suite. Finally, we provide a supplement on points that require further explanation.

... 46 In the present work, we develop a Quantum Deep-learning Potential Interaction (QDπ) model that uses a fast third-order self-consistent density-functional tight-binding (DFTB3/3OB) model 47,48 that is corrected to a quantitatively high level of accuracy through a range-corrected deep-learning potential (DPRc). 49,50 In this way, the QDπ model developed here is in the form of a quantum mechanical/machine learning potential correction (QM/Δ-MLP). 35,49−54 The use of DFTB3 as a robust QM base model has several important advantages. ...

... The QDπ model is trained to be a QM/Δ-MLP; i.e., a nonelectronic DPRc "correction" to the DFTB3/3OB 75 QM model potential energy similar to previous work. 49,50 2.1.1. Broad Data Sets: ANI-1xm and COMP5m. ...

... The next step of future work will involve developing an intermolecular QM/MM interaction potential as a new range-corrected deep-learning potential. 49,50 The full (internal and intermolecular interaction) QDπ model is designed to be a correction to the QM/MM potential energy using DFTB3/3OB and the latest AMBER FF19SB for proteins, 126 OL3/OL15 for nucleic acids, 127−129 OPC model for water, 130,131 and 12−6−4 ion models. 132−134 Once the intermolecular interaction component of the QDπ model has been developed and validated in alchemical free energy simulations, 5 next steps will be to extend the chemical space of drug molecules to include P, S, F, and Cl atoms. ...

We report QDπ-v1.0 for modeling the internal energy of drug molecules containing H, C, N, and O atoms. The QDπ model is in the form of a quantum mechanical/machine learning potential correction (QM/Δ-MLP) that uses a fast third-order self-consistent density-functional tight-binding (DFTB3/3OB) model that is corrected to a quantitatively high level of accuracy through a deep-learning potential (DeepPot-SE). The model has the advantage that it is able to properly treat electrostatic interactions and handle changes in charge/protonation states. The model is trained against reference data computed at the ωB97X/6-31G* level (as in the ANI-1x data set) and compared to several other approximate semiempirical and machine learning potentials (ANI-1x, ANI-2x, DFTB3, MNDO/d, AM1, PM6, GFN1-xTB, and GFN2-xTB). The QDπ model is demonstrated to be accurate for a wide range of intra- and intermolecular interactions (despite its intended use as an internal energy model) and has been shown to perform exceptionally well for relative protonation/deprotonation energies and tautomers. An example application to model reactions involved in RNA strand cleavage catalyzed by protein and nucleic acid enzymes illustrates that QDπ has average errors less than 0.5 kcal/mol, whereas the other models compared have errors over an order of magnitude greater. Taken together, this makes QDπ highly attractive as a potential force field model for drug discovery.

... 47,[60][61][62][63][64][65][66][67][68][69][70] A MLP-corrected semiempirical QM/MM model naturally serves as an excellent reference potential to estimate the ab initio FES; however, the active learning procedure used to train MLPs produces several neural network parameter sets (several potentials). 71,72 The gwTP method provides a means to estimate the ab initio FES from the aggregate sampling performed with each potential. As multiple independent simulation runs are typically performed in order to produce robust averages and error estimates, use of different reference potentials can often be accommodated for no added computational cost. ...

... For the purpose of providing a stringent test case for demonstration, the 4 reference potentials were specifically designed such that none of them accurately reproduce the target FES throughout the entire range of ξ PT values. The reference potentials use the MNDO/d semiempirical Hamiltonian supplemented with a range-corrected deep potential 71,83 (DPRc) MLP. We trained 4 ad hoc MNDO/d QM/MM+DPRc potentials using different target data to yield significantly different reference potentials. ...

... The optimization consisted of 200k steps with initial and final learning rates of 10⁻³ and 5·10⁻⁸, respectively. The initial optimization was followed by 9 cycles of active learning to search for additional training data. 71 Each active learning cycle performs 4 parameter optimizations to yield 4 trial parameter sets. ...
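The snippet gives only the endpoints of the learning-rate schedule (10⁻³ down to 5·10⁻⁸ over 200k steps), not its functional form. As a minimal sketch, assuming a smooth exponential decay between those two values (the function name `exponential_lr` and the continuous, non-staircase form are assumptions), the rate at any step could be computed as follows.

```python
def exponential_lr(step, total_steps=200_000, lr_start=1e-3, lr_stop=5e-8):
    """Learning rate at `step`, decaying geometrically from lr_start to lr_stop.

    Assumed form: lr(t) = lr_start * (lr_stop / lr_start) ** (t / total_steps).
    """
    frac = step / total_steps
    return lr_start * (lr_stop / lr_start) ** frac

# Under this assumption the halfway rate is the geometric mean of the endpoints.
for step in (0, 100_000, 200_000):
    print(step, exponential_lr(step))
```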

We describe the generalized weighted thermodynamic perturbation (gwTP) method for estimating the free energy surface of an expensive "high-level" potential energy function from the umbrella sampling performed with multiple inexpensive "low-level" reference potentials. The gwTP method is a generalization of the weighted thermodynamic perturbation (wTP) method developed by Li and co-workers [J. Chem. Theory Comput. 2018, 14, 5583-5596] that uses a single "low-level" reference potential. The gwTP method offers new possibilities in model design whereby the sampling generated from several low-level potentials may be combined (e.g., specific reaction parameter models that might have variable accuracy at different stages of a multistep reaction). The gwTP method is especially well suited for use with machine learning potentials (MLPs) that are trained against computationally expensive ab initio quantum mechanical/molecular mechanical (QM/MM) energies and forces using active learning procedures that naturally produce multiple distinct neural network potentials. Simulations can be performed with greater sampling using the fast MLPs and then corrected to the ab initio level using gwTP. The capabilities of the gwTP method are demonstrated by creating reference potentials based on the MNDO/d and DFTB2/MIO semiempirical models supplemented with the "range-corrected deep potential" (DPRc). The DPRc parameters are trained to ab initio QM/MM data, and the potentials are used to calculate the free energy surface of stepwise mechanisms for nonenzymatic RNA 2'-O-transesterification model reactions. The extended sampling made possible by the reference potentials allows one to identify unequilibrated portions of the simulations that are not always evident from the short time scale commonly used with ab initio QM/MM potentials. 
We show that the reference potential approach can yield more accurate ab initio free energy predictions than the wTP method or what can be reasonably afforded from explicit ab initio QM/MM sampling.
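For orientation, the single-reference starting point that wTP and gwTP build on is the textbook thermodynamic perturbation (Zwanzig) relation, ΔA = −kT ln⟨exp(−ΔU/kT)⟩, with ΔU the high-level minus low-level energy of frames sampled at the low level. The sketch below implements only that basic relation and the associated per-frame weights; it is not the gwTP estimator, and the function names and the sample energies are hypothetical.

```python
import math

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def tp_free_energy(delta_u, temperature=298.15):
    """Zwanzig estimate of A_high - A_low from energy gaps (kcal/mol).

    delta_u[i] = U_high(frame i) - U_low(frame i), frames sampled at the low level.
    The minimum gap is factored out to keep the exponentials well scaled.
    """
    beta = 1.0 / (KB * temperature)
    shift = min(delta_u)
    avg = sum(math.exp(-beta * (du - shift)) for du in delta_u) / len(delta_u)
    return shift - math.log(avg) / beta

def tp_weights(delta_u, temperature=298.15):
    """Normalized weights that reweight low-level frames to the high level."""
    beta = 1.0 / (KB * temperature)
    shift = min(delta_u)
    raw = [math.exp(-beta * (du - shift)) for du in delta_u]
    total = sum(raw)
    return [w / total for w in raw]

gaps = [1.8, 2.1, 2.0, 2.4]  # hypothetical energy gaps in kcal/mol
print(tp_free_energy(gaps), tp_weights(gaps))
```

When the gaps vary widely, a few frames dominate the weights, which is one reason a reference potential close to the target level is desirable.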

... Machine learning potentials offer a potential mechanism to improve the accuracy and efficiency of QM/MM simulations, and they have had considerable impact in the development of methods to study chemical reactions. [15][16][17][18][19][20][21] Herein we develop an approach whereby we employ a recently described deep-potential range correction (DPRc) model 22 to enhance the accuracy of a fast, approximate base QM/MM model to reproduce the energies and forces of a much more computationally costly target QM/MM model. The new model parametrizes the DPRc potential using a machine learning neural network training procedure 22 to correct the 2nd-order density-functional tight-binding (DFTB2) semiempirical method [23][24][25] to reproduce the PBE0/6-31G* energies and forces in explicit solvent QM/MM calculations. ...

... [15][16][17][18][19][20][21] Herein we develop an approach whereby we employ a recently described deep-potential range correction (DPRc) model 22 to enhance the accuracy of a fast, approximate base QM/MM model to reproduce the energies and forces of a much more computationally costly target QM/MM model. The new model parametrizes the DPRc potential using a machine learning neural network training procedure 22 to correct the 2nd-order density-functional tight-binding (DFTB2) semiempirical method [23][24][25] to reproduce the PBE0/6-31G* energies and forces in explicit solvent QM/MM calculations. We describe a framework for introducing nuclear quantum effects into the calculations using path integral molecular dynamics (PIMD) through an interface between AMBER20 26 and i-PI 27 software packages. ...

... 26 The network parameters are optimized using an active learning approach described in detail elsewhere. 22 The parameter optimizations were performed with the DP-GEN software. 59 The active learning procedure is terminated when the 4 parameter sets agree for 99.8% of the frames in the current cycle. The 99.8% termination criterion is an empirical optimization threshold intended to strike a balance between the computational resources required to continue the active learning procedure and the current uncertainty in the model fit. ...
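The termination rule described here can be sketched as a check on how many frames the independently trained parameter sets agree on. The snippet does not say how agreement is measured, so the sketch assumes a common choice: a frame counts as agreeing when the largest per-atom standard deviation of the predicted forces across models stays below a tolerance. The tolerance `tol` and all function names are hypothetical.

```python
import math

def max_force_deviation(frame_forces):
    """Largest per-atom force standard deviation across models for one frame.

    frame_forces[m][a] is the (fx, fy, fz) force on atom a predicted by model m.
    """
    n_models = len(frame_forces)
    n_atoms = len(frame_forces[0])
    worst = 0.0
    for a in range(n_atoms):
        mean = [sum(frame_forces[m][a][k] for m in range(n_models)) / n_models
                for k in range(3)]
        var = sum((frame_forces[m][a][k] - mean[k]) ** 2
                  for m in range(n_models) for k in range(3)) / n_models
        worst = max(worst, math.sqrt(var))
    return worst

def agreement_fraction(frames, tol):
    """Fraction of frames whose model deviation is within the tolerance."""
    agreeing = sum(1 for f in frames if max_force_deviation(f) <= tol)
    return agreeing / len(frames)

def converged(frames, tol, required=0.998):
    """True when at least `required` of the frames agree (99.8% in the source)."""
    return agreement_fraction(frames, tol) >= required
```

Frames that fail the check are the natural candidates for new reference calculations in the next active learning cycle.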

We present a fast, accurate, and robust approach for determination of free energy profiles and kinetic isotope effects for RNA 2'-O-transphosphorylation reactions with inclusion of nuclear quantum effects. We apply a deep potential range correction (DPRc) for combined quantum mechanical/molecular mechanical (QM/MM) simulations of reactions in the condensed phase. The method uses the second-order density-functional tight-binding method (DFTB2) as a fast, approximate base QM model. The DPRc model modifies the DFTB2 QM interactions and applies short-range corrections to the QM/MM interactions to reproduce ab initio DFT (PBE0/6-31G*) QM/MM energies and forces. The DPRc thus enables both QM and QM/MM interactions to be tuned to high accuracy, and the QM/MM corrections are designed to smoothly vanish at a specified cutoff boundary (6 Å in the present work). The computational speed-up afforded by the QM/MM+DPRc model enables free energy profiles to be calculated that include rigorous long-range QM/MM interactions under periodic boundary conditions and nuclear quantum effects through a path integral approach using a new interface between the AMBER and i-PI software. The approach is demonstrated through the calculation of free energy profiles of a native RNA cleavage model reaction and reactions involving thio-substitutions, which are important experimental probes of the mechanism. The DFTB2+DPRc QM/MM free energy surfaces agree very closely with the PBE0/6-31G* QM/MM results, and it is vastly superior to the DFTB2 QM/MM surfaces with and without weighted thermodynamic perturbation corrections. 18O and 34S primary kinetic isotope effects are compared, and the influence of nuclear quantum effects on the free energy profiles is examined.
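The requirement that the QM/MM corrections vanish smoothly at the cutoff can be illustrated with a generic switching function. The sketch below uses a quintic polynomial whose value goes from 1 to 0 with zero first and second derivatives at both ends; this is a standard choice used here as an assumption, not necessarily the exact form in DPRc. The 6 Å cutoff comes from the abstract, while the switch-on distance `r_on` is hypothetical.

```python
def smooth_switch(r, r_on, r_cut):
    """Quintic switch: 1 for r <= r_on, 0 for r >= r_cut, smooth in between."""
    if r <= r_on:
        return 1.0
    if r >= r_cut:
        return 0.0
    u = (r - r_on) / (r_cut - r_on)
    # 1 - 10u^3 + 15u^4 - 6u^5 has zero slope and curvature at u = 0 and u = 1.
    return 1.0 - u ** 3 * (10.0 - 15.0 * u + 6.0 * u * u)

def switched_correction(raw_correction, r, r_on=5.0, r_cut=6.0):
    """Scale a pairwise correction so it vanishes at the cutoff (distances in Å)."""
    return raw_correction * smooth_switch(r, r_on, r_cut)

for r in (4.0, 5.5, 6.0):
    print(r, switched_correction(1.0, r))
```

Because the switch and its derivatives are continuous, forces do not jump when an MM atom crosses the cutoff, which is what keeps the energy conserved in dynamics.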

... Either non-local information transfer is incorporated directly in the ML models 30,32,26,31 or ML methods are combined with a fast or semi-empirical QM method with an explicit treatment of long-range interactions in a ∆-learning scheme. 33,34,35,36 The ∆-learning approach has already been shown to be promising for (QM)ML/MM MD simulations of condensed-phase systems. 34,35,36 Next to the descriptor-based approaches, there has also been a lot of development on message passing approaches trained to reproduce the PES of QM systems. ...

... 33,34,35,36 The ∆-learning approach has already been shown to be promising for (QM)ML/MM MD simulations of condensed-phase systems. 34,35,36 Next to the descriptor-based approaches, there has also been a lot of development on message passing approaches trained to reproduce the PES of QM systems. 22,23,37,38,39 These graph networks typically use dense layers of neural networks as non-linear functions for the message passing convolutions and are thus known as graph-convolutional neural networks (GCNNs). ...

To accurately study chemical reactions in the condensed phase or within enzymes, both a quantum-mechanical description and sufficient configurational sampling are required to reach converged estimates. Here, quantum mechanics/molecular mechanics (QM/MM) molecular dynamics (MD) simulations play an important role, providing QM accuracy for the region of interest at a decreased computational cost. However, QM/MM simulations are still too expensive to study large systems on longer time scales. Recently, machine learning (ML) models have been proposed to replace the QM description. The main limitation of these models lies in the accurate description of long-range interactions present in condensed-phase systems. To overcome this issue, a recent workflow has been introduced combining a semi-empirical method (i.e. density functional tight binding (DFTB)) and a high-dimensional neural network potential (HDNNP) in a Δ-learning scheme. This approach has been shown to be capable of correctly incorporating long-range interactions within a cutoff of 1.4 nm. One of the promising alternative approaches to efficiently take long-range effects into account is the development of graph convolutional neural networks (GCNN) for the prediction of the potential-energy surface. In this work, we investigate the use of GCNN models, with and without a Δ-learning scheme, for (QM)ML/MM MD simulations. We show that the Δ-learning approach using a GCNN and DFTB as baseline achieves competitive performance on our benchmarking set of solutes and chemical reactions in water. The method is additionally validated by performing prospective (QM)ML/MM MD simulations of retinoic acid in water and S-adenosylmethioninate interacting with cytosine in water. The results indicate that the Δ-learning GCNN model is a valuable alternative for (QM)ML/MM MD simulations of condensed-phase systems.

... It is not practical to incorporate all MM atoms (in addition to QM atoms) in the training of these potentials, as this would lead to an explosively large array of descriptors; the most straightforward way is to include only MM atoms within a distance cutoff from the QM region in the MLP training. [64][65][66] Alternatively, one can adopt an implicit description of the MM environment through the use of MM-perturbed semiempirical QM charges, 67,68 MM electrostatic potential or field at QM atom positions, 69,70 or through polarizable embedding. 71 One can also use both MM electrostatic potential and field in the training of QM/MM MLPs 72,73 using our QM/MM-AC scheme 74 for separating inner and outer MM atoms and projecting outer MM charges onto inner MM atom positions. ...

In the last several years, there has been a surge in the development of machine learning potential (MLP) models for describing molecular systems. We are interested in a particular area of this field — the training of system-specific MLPs for reactive systems — with the goal of using these MLPs to accelerate free energy simulations of chemical and enzyme reactions. To help new members in our labs become familiar with the basic techniques, we have put together a self-guided Colab tutorial (https://cc-ats.github.io/mlp_tutorial/), which we expect to be also useful to other young researchers in the community. Our tutorial begins with the introduction of simple fitting neural network (FNN) and kernel-based (using Gaussian Process Regression, GPR) models by fitting the two-dimensional Müller-Brown potential. Subsequently, two simple descriptors are presented for extracting features of molecular systems: symmetry functions (including the ANI variant) and embedding neural networks (such as DeepPot-SE). Lastly, these features will be fed into FNN and GPR models to reproduce the energy/force of molecular configurations of the Claisen rearrangement.
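For readers unfamiliar with the first fitting target mentioned in this abstract, the two-dimensional Müller-Brown potential is a sum of four anisotropic Gaussian-like terms. The sketch below writes it out with the standard published parameters; the function name and the sample points are choices made here for illustration and are not taken from the tutorial itself.

```python
import math

# Standard Müller-Brown parameters, one entry per term.
A = (-200.0, -100.0, -170.0, 15.0)
a = (-1.0, -1.0, -6.5, 0.7)
b = (0.0, 0.0, 11.0, 0.6)
c = (-10.0, -10.0, -6.5, 0.7)
X0 = (1.0, 0.0, -0.5, -1.0)
Y0 = (0.0, 0.5, 1.5, 1.0)

def muller_brown(x, y):
    """Müller-Brown potential energy at the point (x, y)."""
    total = 0.0
    for k in range(4):
        dx = x - X0[k]
        dy = y - Y0[k]
        total += A[k] * math.exp(a[k] * dx * dx + b[k] * dx * dy + c[k] * dy * dy)
    return total

# Energies near the deepest minimum and near a second, shallower minimum.
print(muller_brown(-0.558, 1.442))
print(muller_brown(0.623, 0.028))
```

Sampling this function on a grid gives a small, cheap data set with several minima and saddle points, which is why it is a convenient first test for neural network or GPR fits.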

... The Δ-learning approach has been reported to accelerate MD simulations of thermal reactions (e.g., SN2 reaction and Claisen rearrangement) using the HDNNP 147-149 and DeepPot-SE. 150 On the other hand, ML potentials such as HDNNP, 151 FieldSchNet, 152 and DeepPot-SE 153 have been adapted to serve as a QM calculator that predicts energies and forces at the same level as the QM training data. These methods belong to the class of electronic embedding ML potentials. ...

Machine learning (ML) continues to revolutionize computational chemistry by accelerating predictions and simulations through training on experimental or accurate but expensive quantum chemical (QC) calculations. Photodynamics simulations require hundreds of trajectories coupled with multiconfigurational QC calculations of energies, forces, and non-adiabatic couplings that contribute to the prohibitive computational cost for long timescales and complex organic molecules. ML accelerates photodynamics simulations by combining nonadiabatic photodynamics simulations with an ML model trained with high-fidelity QC calculations of energies, forces, and non-adiabatic couplings. This approach has provided time-dependent molecular structural information for understanding photochemical reaction mechanisms of organic reactions in vacuum and complex environments (i.e., explicit solvation). This review focuses on the fundamentals of QC calculations and machine learning techniques. We then discuss the strategies to balance adequate training data and the computational cost of generating these training data. Finally, we demonstrate the power of applying these ML-photodynamics simulations to understand the origin of reactivities and selectivities of organic photochemical reactions, such as cis-trans isomerization, [2+2]-cycloaddition, 4π-electrocyclic ring-closing, and hydrogen roaming mechanism.

... We previously examined the six nonenzymatic phosphoryl transfer models in Ref. 14 to train a deep potential range corrected (DPRc) machine learning potential (Δ-MLP) 74 that supplements the second order density functional tight binding [75][76][77] (DFTB2) quantum mechanical (QM)/molecular mechanical (MM) Hamiltonian with a nonelectronic neural network correction parametrized to reproduce PBE0/6-31G* QM/MM energies and forces. The DFTB2 model is evaluated with the MIO parameter set and referred to as DFTB2/MIO. ...

We use the modified Bigeleisen–Mayer equation to compute kinetic isotope effect values for non-enzymatic phosphoryl transfer reactions from classical and path integral molecular dynamics umbrella sampling. The modified form of the Bigeleisen–Mayer equation consists of a ratio of imaginary mode vibrational frequencies and a contribution arising from the isotopic substitution’s effect on the activation free energy, which can be computed from path integral simulation. In the present study, we describe a practical method for estimating the frequency ratio correction directly from umbrella sampling in a manner that does not require normal mode analysis of many geometry optimized structures. Instead, the method relates the frequency ratio to the change in the mass weighted coordinate representation of the minimum free energy path at the transition state induced by isotopic substitution. The method is applied to the calculation of 16/18O and 32/34S primary kinetic isotope effect values for six non-enzymatic phosphoryl transfer reactions. We demonstrate that the results are consistent with the analysis of geometry optimized transition state ensembles using the traditional Bigeleisen–Mayer equation. The method thus presents a new practical tool to enable facile calculation of kinetic isotope effect values for complex chemical reactions in the condensed phase.

... Here, we restate the relevant theory in the context of the frequency shift calculation task. More details about DPRc for QM/MM can be found in the work of Zeng et al. 27 ...

Vibrational spectrum simulation, as an ensemble average result, can be very time consuming when using high accuracy methods. Here, we introduce a new machine learning approach based on the range corrected deep potential (DPRc) model to improve computing efficiency. The approach was applied to computing C=O stretching vibrational frequency shifts of formic acid-water solution. DPRc is adapted for frequency shift calculation. The system was divided into a "probe region" and a "solvent region" by atom. Three kinds of "probe region" were tested: a single atom with atomic contribution correction, a single atom, and a single molecule. All data sets were prepared using the Quantum Vibration Perturbation (QVP) approach. The deep potential (DP) model was also adapted for frequency shift calculation for comparison, and different interaction cut-off radii were tested. The single molecule "probe region" results show the best accuracy, running roughly ten times faster than regular DP, while reducing the training time by a factor of about four, making it fully applicable in practice. The results show that dropping information on interaction distances between solvent atoms can significantly increase computing and training efficiency with little loss of accuracy. The protocol is practical, easy to apply, and extendable to calculating other physical quantities.

... One path forward that appears promising is to use machine-learning potentials (MLPs) either as stand-alone alternative models, [39][40][41][42][43][44] or else to augment existing semiempirical QM methods. [45][46][47][48][49][50][51] We will refer to the former class as "pure MLPs" and the latter class as "QM/∆-MLPs". MLPs have emerged as powerful tools to enable fast and accurate chemical models within the scope of their training. 39,[41][42][43][44] ...

Modern semiempirical electronic structure methods have considerable promise in drug discovery as universal "force fields" that can reliably model biological and drug-like molecules. Herein, we compare the performance of several NDDO-based semiempirical (MNDO/d, AM1, PM6 and ODM2) and density-functional tight-binding based (DFTB3, GFN1-xTB and GFN2-xTB) models with pure machine learning potentials (ANI-1x and ANI-2x) and hybrid quantum mechanical/machine learning potentials (AIQM1 and QDπ) for a wide range of data computed at a consistent ωB97X/6-31G* level of theory (as in the ANI-1x database). This data includes conformational energies, intermolecular interactions, tautomers, and protonation states. Additional comparisons are made to a set of natural and synthetic nucleic acids from the artificially expanded genetic information system (AEGIS). This dataset has important implications in the design of new biotechnology and therapeutics. Finally, we examine acid/base chemistry relevant for RNA cleavage reactions catalyzed by small nucleolytic ribozymes and ribonucleases. Overall, the recently developed QDπ model performs exceptionally well across all datasets, having especially high accuracy for tautomers and protonation states relevant to drug discovery.

... In contrast, the development of new QM/MM methods has been relatively slow due to the computer-intensive nature of QM algorithms, while enzymatic functions require extensive sampling of enzyme conformations. However, in recent years, the development of machine learning approaches [101][102][103][104][105] and large-scale high-performance computers utilizing both many-core CPU and GPU architectures has revived efforts to develop fast QM/MM algorithms for rapid prediction of enzyme activity. Overall, the field faces a formidable challenge, and the dissection and understanding of allosteric networks can only be accomplished with massive developments of high-throughput computational and experimental approaches. ...

Biological life depends on motion, and this manifests itself in proteins that display motion over a formidable range of time scales, spanning from femtosecond vibrations of atoms at enzymatic transition states all the way to slow domain motions occurring on micro- to millisecond time scales. An outstanding challenge in contemporary biophysics and structural biology is a quantitative understanding of the linkages among protein structure, dynamics, and function. These linkages are becoming increasingly explorable due to conceptual and methodological advances. In this Perspective article, we will point toward future directions of the field of protein dynamics with an emphasis on enzymes. Research questions in the field are becoming increasingly complex, such as the mechanistic understanding of high-order interaction networks in allosteric signal propagation through a protein matrix, or the connection between local and collective motions. In analogy to the solution to the "protein folding problem," we argue that the way forward to understanding these and other important questions lies in the successful integration of experiment and computation, while utilizing the present rapid expansion of sequence and structure space. Looking forward, the future is bright, and we are in a period where we are on the doorstep to, at least in part, comprehending the importance of dynamics for biological function.

... Very recently, several approaches have been proposed to tackle this problem. In the DPRc model by Zeng et al. 9 a Δ-learning 10 approach is proposed to correct QM/MM potentials to a higher level of theory. This is done by introducing a correction to the interaction energies that smoothly vanishes for MM atoms farther from the QM region. ...
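A correction that "smoothly vanishes" with distance is typically built from a switching function that equals one at short range and decays continuously to zero at a cutoff. The sketch below uses a cosine switch purely as an illustrative assumption; the function name, the two radii, and the functional form are hypothetical and are not taken from the DPRc model.

```python
import math


def switch(r, r_on, r_cut):
    """Cosine switching weight: 1 below r_on, 0 beyond r_cut, smooth in between.

    Assumed illustrative form; the value and first derivative are continuous
    at both r_on and r_cut.
    """
    if r <= r_on:
        return 1.0
    if r >= r_cut:
        return 0.0
    t = (r - r_on) / (r_cut - r_on)
    return 0.5 * (1.0 + math.cos(math.pi * t))


def switched_correction(pair_terms, r_on, r_cut):
    """Sum hypothetical per-pair corrections, each weighted by its distance.

    pair_terms is a list of (distance, raw_correction) tuples for QM-MM pairs.
    """
    return sum(switch(r, r_on, r_cut) * e for r, e in pair_terms)
```

Because the weight and its slope both go to zero at the cutoff, MM atoms crossing the cutoff during MD do not introduce discontinuities in energies or forces.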

This work presents a variant of an electrostatic embedding scheme that allows the embedding of arbitrary machine learned potentials trained on molecular systems in vacuo. The scheme is based on physically motivated models of electronic density and polarizability, resulting in a generic model without relying on an exhaustive training set. The scheme only requires in vacuo single point QM calculations to provide training densities and molecular dipolar polarizabilities. As an example, the scheme is applied to create an embedding model for the QM7 data set using Gaussian Process Regression with only 445 reference atomic environments. The model was tested on the SARS-CoV-2 protease complex with PF-00835231, resulting in a predicted embedding energy RMSE of 2 kcal/mol, compared to explicit DFT/MM calculations.

... 17−20 Inclusion of the environment effects poses further challenges, and several works have tried to include these effects, either in excited-state properties 21,22 or by developing ground-state QM/MM potentials. 23−29 In a previous work 30 we have presented a ML approach to estimate excitonic couplings in LHCs with an accuracy comparable to that of the reference time-dependent density functional theory (TD-DFT) calculations while being orders of magnitude faster. In this work we develop a model for estimating site energies, thus providing a ML estimate for the full exciton Hamiltonian. ...

We propose a machine learning (ML)-based strategy for an inexpensive calculation of excitonic properties of light-harvesting complexes (LHCs). The strategy uses classical molecular dynamics simulations of LHCs in their natural environment in combination with ML prediction of the excitonic Hamiltonian of the embedded aggregate of pigments. The proposed ML model can reproduce the effects of geometrical fluctuations together with those due to electrostatic and polarization interactions between the pigments and the protein. The training is performed on the chlorophylls of the major LHC of plants, but we demonstrate that the model is able to extrapolate well beyond the initial training set. Moreover, the accuracy in predicting the effects of the environment is tested on the simulation of the small changes observed in the absorption spectra of the wild-type and a mutant of a minor LHC.

... Semiempirical QM/MM calculations are several orders of magnitude faster, but their accuracy is largely dependent on the quality of the model parameters, for which systematic error estimation is difficult. Recent approaches apply neural network representations [327][328][329][330] and other ML schemes [328,329,331,332] to reduce the computational cost of the QM calculation in the QM/MM frameworks, either by learning the ab initio QM/MM PES from the semiempirical QM/MM PES (denoted ∆ML, which goes back to work of von Lilienfeld and coworkers [333]) or by directly learning the ab initio PES [301,[304][305][306]321], the electron density, [334] or the wave function [335] of the QM subsystem. For instance, Böselt and coworkers applied a neural network representation of the QM region coupled to a ∆ML approach. ...

Hybrid quantum mechanics/molecular mechanics (QM/MM) models allow one to address chemical phenomena in complex molecular environments. However, they are tedious to construct and they usually require significant manual preprocessing and expertise. As a result, these models may not be easily transferable to new application areas and the many parameters are not easy to adjust to reference data that are typically scarce. Therefore, it has been difficult to devise automated procedures of controllable accuracy, which leaves this type of modelling far from being standardized or of black-box type. Although diverse best-practice protocols have been set up for the construction of individual components of a QM/MM model (e.g., the MM potential, the type of embedding, the choice of the QM region), no automated procedures are available for all steps of the QM/MM model construction. Here, we review the state of the art of QM/MM modeling with a focus on automation. We elaborate on the MM model parametrization, on atom-economical physically-motivated QM region selection, and on embedding schemes that incorporate mutual polarization as critical components of the QM/MM model. In view of the broad scope of the field, we mostly restrict the discussion to methodologies that build de novo models based on first-principles data, on uncertainty quantification, and on error mitigation with a high potential for automation. Ultimately, it is desirable to be able to set up reliable QM/MM models in a fast and efficient automated way without being constrained by some specific chemical or technical limitations.

... In DeepPot-SE, the expression for the atomic contribution $E_i$ is a neural network consisting of three hidden layers. The input layer is the molecular descriptor $\mathcal{D}(\tilde{\mathcal{R}}^i)$, which is determined by the "environment matrix" $\tilde{\mathcal{R}}^i$, the "embedding" matrix $\mathcal{G}^i$, and a reduced-dimension embedding matrix $\mathcal{G}^{i<}$ [64]. ...
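For orientation, the way these three matrices combine in the DeepPot-SE literature can be written out explicitly. The expression below is a sketch from the general DeepPot-SE formulation rather than from the citing work; the symbol names (a tilde for the environment matrix, a "<" superscript for the reduced embedding matrix, and $s(r_{ij})$ for the smooth radial weight) are notational assumptions chosen to match the snippet above.

```latex
% Row j of the environment matrix for neighbor j of atom i (assumed notation):
\tilde{\mathcal{R}}^{i}_{j} =
  \left( s(r_{ij}),\;
         \frac{s(r_{ij})\,x_{ij}}{r_{ij}},\;
         \frac{s(r_{ij})\,y_{ij}}{r_{ij}},\;
         \frac{s(r_{ij})\,z_{ij}}{r_{ij}} \right)

% Descriptor built from the embedding matrix and its reduced-dimension counterpart:
\mathcal{D}^{i} =
  \left(\mathcal{G}^{i}\right)^{T}
  \tilde{\mathcal{R}}^{i}
  \left(\tilde{\mathcal{R}}^{i}\right)^{T}
  \mathcal{G}^{i<}
```

Since $\tilde{\mathcal{R}}^{i}(\tilde{\mathcal{R}}^{i})^{T}$ depends only on distances and relative angles between neighbors, and the embedding matrices are built from the radial weights, a descriptor of this form is invariant to translation, rotation, and permutation of like neighbors.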

Recently, artificial neural network-based methods for the construction of potential energy surfaces and molecular dynamics (MD) simulations based on them have been increasingly used in the field of theoretical chemistry. The neural network potentials (NNP) strike a good balance between accuracy and computational efficiency relative to quantum chemical calculations and MD simulations based on classical force fields. Thus, NNP is becoming a powerful tool for studying the structure and function of molecules. In this chapter, we introduce the basic theory of NNP. The construction steps and the usage of NNP are also introduced in detail with the MD simulation of methane combustion as an example. We hope that this chapter can help those readers who are new but interested in entering this field.

... In state-of-the-art QM/MM methods, the interaction between QM and MM regions is ... Very recently, several approaches have been proposed to tackle this problem. In the DPRc approach by Zeng et al. 9 a ∆-learning 10 approach is proposed to correct QM/MM potentials to a higher level of theory. This is done by introducing a correction to the interaction energies that smoothly vanishes for MM atoms farther from the QM region. ...

This work presents a variant of an electrostatic embedding scheme that allows the embedding of arbitrary machine learned potentials trained on molecular systems in vacuo. The scheme is based on physically motivated models of electronic density and polarizability, resulting in a generic model without relying on an exhaustive training set. The scheme only requires in vacuo single point QM calculations to provide training densities and molecular dipolar polarizabilities. As an example, the scheme is applied to create an embedding model for the QM7 dataset using Gaussian Process Regression with only 445 reference atomic environments. The model was tested on the SARS-CoV-2 protease complex with PF-00835231, resulting in a predicted embedding energy RMSE of 2 kcal/mol, compared to explicit DFT/MM calculations.

... approaches [71] where the difference between a lower level of QM theory and a higher level of theory is learned for a specific reaction and then applied to correct the cheaper QM models to achieve high quality results. The work of Zeng et al. [65] introduces range corrections to the ML potentials to also improve short-range QM/MM interactions affecting the MM atoms within a relevant cutoff of the QM region. These application studies are interesting and point to the need for improvements in the treatment of long-range interactions and in building standalone 4G potentials that can deliver high-quality results without resorting to Δ-ML approaches. ...

Quantum chemistry enables the study of systems with chemical accuracy (<1 kcal/mol from experiment) but is restricted to a handful of atoms due to its computational expense. This has led to ongoing interest in optimizing and simplifying these methods while retaining accuracy. Implementing quantum mechanical (QM) methods on modern hardware such as multiple GPUs is one example of how the field is optimizing performance. Multiscale approaches like the so-called QM/molecular mechanical method are gaining popularity in drug discovery because they focus the application of QM methods on the region of choice (e.g., the binding site), while using efficient MM models to represent less relevant areas. The creation of simplified QM methods is another example, including the use of machine learning to create ultra-fast and accurate QM models. Herein, we summarize recent advancements in the development of optimized QM methods that enhance our ability to use these methods in computer aided drug discovery.

... The introduction of a cutoff, up to which the MM region is included, has emerged as a solution. 21,22,25 One example is FieldSchNet, 20 which circumvents this problem by sampling the environment while keeping the QM region fixed. This model has been shown to be powerful in predicting spectra and chemical reactions with neural networks (NNs) using electrostatic embedding but requires extended sampling. ...

Hybrid quantum mechanics/molecular mechanics (QM/MM) simulations have advanced the field of computational chemistry tremendously. However, they require the partitioning of a system into two different regions that are treated at different levels of theory, which can cause artifacts at the interface. Furthermore, they are still limited by high computational costs of quantum chemical calculations. In this work, we develop the buffer region neural network (BuRNN), an alternative approach to existing QM/MM schemes, which introduces a buffer region that experiences full electronic polarization by the inner QM region to minimize artifacts. The interactions between the QM and the buffer region are described by deep neural networks (NNs), which leads to the high computational efficiency of this hybrid NN/MM scheme while retaining quantum chemical accuracy. We demonstrate the BuRNN approach by performing NN/MM simulations of the hexa-aqua iron complex.

... Many different NNPs have been proposed for water, small organic molecules, and metal materials since Behler and Parrinello proposed the high-dimensional neural network (HDNN) approach. [25][26][27][28][29][30][31][32][33][34][35][36] Recently, Yoo et al. used ReaxFF combined with NNPs to explore the decomposition process of RDX. 37 In this work, accurate NNPs for the pure CL-20 and CL-20/TNT co-crystal systems will be constructed. ...

CL-20 (2,4,6,8,10,12-hexanitro-2,4,6,8,10,12-hexaazaisowurtzitane, also known as HNIW) is one of the most powerful energetic materials. However, its high sensitivity to environmental stimuli greatly reduces its safety and severely limits its application. In this work, ab initio based neural network potential (NNP) energy surfaces for both β-CL-20 and CL-20/TNT co-crystals were constructed. To accurately simulate the thermal decomposition processes of these two crystal systems, reactive molecular dynamics simulations based on the NNPs were performed. Many important intermediate species and their associated reaction paths during the decomposition were identified in the simulations, and direct results on the detonation temperatures of both systems were provided. The simulations also showed clearly that 2,4,6-trinitrotoluene (TNT) molecules in the co-crystal act as a buffer to slow down the chain reactions triggered by nitrogen dioxide, and this effect is more significant at lower temperatures. Specifically, the addition of TNT molecules in the CL-20/TNT co-crystal introduces intermolecular hydrogen bonds between CL-20 and TNT molecules in the system, thereby increasing the thermal stability of the co-crystal. The current reactive molecular dynamics simulation is based on the NNP, which accelerates ab initio molecular dynamics (AIMD) simulation by more than 3 orders of magnitude while preserving the accuracy of density functional theory (DFT) calculations. This enabled us to perform longer-time simulations at more realistic temperatures that traditional AIMD methods cannot achieve. With the advantage of the NNP in its powerful fitting ability and transferability, the NNP-based MD simulation can be widely applied to energetic material systems.

Machine learning (ML) continues to revolutionize computational chemistry by accelerating predictions and simulations through training on experimental or accurate but expensive quantum mechanical (QM) calculations. Photodynamics simulations require hundreds of trajectories coupled with multiconfigurational QM calculations of excited-state potential energy surfaces that contribute to the prohibitive computational cost for long timescales and complex organic molecules. ML accelerates photodynamics simulations by combining nonadiabatic photodynamics simulations with an ML model trained with high-fidelity QM calculations of energies, forces, and non-adiabatic couplings. This approach has provided time-dependent molecular structural information for understanding photochemical reaction mechanisms of organic reactions in vacuum and complex environments (i.e., explicit solvation). This review focuses on the fundamentals of QM calculations and ML techniques. We then discuss the strategies to balance adequate training data and the computational cost of generating these training data. Finally, we demonstrate the power of applying these ML-photodynamics simulations to understand the origin of reactivities and selectivities of organic photochemical reactions, such as cis–trans isomerization, [2 + 2]-cycloaddition, 4π-electrocyclic ring-closing, and hydrogen roaming mechanism.

As an ensemble average result, vibrational spectrum simulation can be time-consuming with high accuracy methods. We present a machine learning approach based on the range-corrected deep potential (DPRc) model to improve the computing efficiency. The DPRc method divides the system into a "probe region" and a "solvent region"; "solvent-solvent" interactions are not counted in the neural network. We applied the approach to two systems: formic acid C═O stretching and MeCN C≡N stretching vibrational frequency shifts in water. All data sets were prepared using the quantum vibration perturbation approach. Effects of different region divisions, one-body correction, cutoff range, and training data size were tested. The model with a single-molecule "probe region" showed stable accuracy; it ran roughly 10 times faster than regular deep potential and reduced the training time by a factor of about four. The approach is efficient, easy to apply, and extendable to calculating various spectra.

Free energy differences (ΔF) are essential to quantitative characterization and understanding of chemical and biological processes. Their direct estimation with an accurate quantum mechanical potential is of great interest and yet impractical due to high computational cost and incompatibility with typical alchemical free energy protocols. One promising solution is the multilevel free energy simulation in which the estimate of ΔF at an inexpensive low level of theory is combined with the correction toward a higher level of theory. The poor configurational overlap generally expected between the two levels of theory, however, presents a major challenge. We overcome this challenge by using a deep neural network model and enhanced sampling simulations. An adversarial autoencoder is used to identify a low-dimensional (latent) space that compactly represents the degrees of freedom that encode the distinct distributions at the two levels of theory. Enhanced sampling in this latent space is then used to drive the sampling of configurations that predominantly contribute to the free energy correction. Results for both gas phase and condensed phase systems demonstrate that this data-driven approach offers high accuracy and efficiency with great potential for scalability to complex systems.
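The multilevel idea described above, a low-level ΔF plus a correction toward the higher level, is commonly expressed through exponential averaging (the Zwanzig free energy perturbation formula), in which the correction is $-k_BT \ln \langle e^{-\Delta U/k_BT} \rangle_{\text{low}}$ with $\Delta U = U_{\text{high}} - U_{\text{low}}$ evaluated on low-level configurations. The sketch below is a generic illustration of that estimator, not the adversarial-autoencoder method of the cited work; the function names and sample values are hypothetical.

```python
import math


def fep_correction(delta_u, kT):
    """Zwanzig estimate of F_high - F_low from samples of U_high - U_low.

    delta_u: energy differences evaluated on configurations sampled at the
             low level of theory (same units as kT).
    Uses a shifted exponent (log-sum-exp trick) for numerical stability.
    """
    shift = min(delta_u)
    avg = sum(math.exp(-(du - shift) / kT) for du in delta_u) / len(delta_u)
    return shift - kT * math.log(avg)


def multilevel_free_energy(df_low, delta_u_state_a, delta_u_state_b, kT):
    """Low-level free energy difference A -> B plus end-state corrections."""
    return df_low + fep_correction(delta_u_state_b, kT) - fep_correction(delta_u_state_a, kT)
```

The exponential average is dominated by the few samples with the lowest ΔU, which is why poor configurational overlap between the two levels of theory makes this estimator converge slowly and motivates the enhanced sampling strategy described in the abstract.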

In silico investigations of enzymatic reactions and chemical reactions in condensed phases often suffer from formidable computational costs due to the large number of degrees of freedom and the enormous volume of important phase space. Usually, accuracy must be traded for efficiency by lowering the reliability of the Hamiltonians employed or reducing the sampling time. Reference-potential methods (RPMs) offer an alternative approach to reaching high accuracy of simulation without much loss of efficiency. In this Perspective, we summarize the idea of RPMs and showcase some recent applications. Most importantly, the pitfalls of these methods are also discussed, and remedies to these pitfalls are presented.

We present a comparative study that evaluates the performance of a machine learning potential (ANI-2x), a conventional force field (GAFF), and an optimally tuned GAFF-like force field in the modeling of a set of 10 γ-fluorohydrins that exhibit a complex interplay between intra- and intermolecular interactions in determining conformer stability. To benchmark the performance of each molecular model, we evaluated their energetic, geometric, and sampling accuracies relative to quantum-mechanical data. This benchmark involved conformational analysis both in the gas phase and chloroform solution. We also assessed the performance of the aforementioned molecular models in estimating nuclear spin-spin coupling constants by comparing their predictions to experimental data available in chloroform. The results and discussion presented in this study demonstrate that ANI-2x tends to predict stronger-than-expected hydrogen bonding and overstabilize global minima and shows problems related to inadequate description of dispersion interactions. Furthermore, while ANI-2x is a viable model for modeling in the gas phase, conventional force fields still play an important role, especially for condensed-phase simulations. Overall, this study highlights the strengths and weaknesses of each model, providing guidelines for the use and future development of force fields and machine learning potentials.

Multiscale modeling has a long history of use in structural biology, as computational biologists strive to overcome the time- and length-scale limits of atomistic molecular dynamics. Contemporary machine learning techniques, such as deep learning, have promoted advances in virtually every field of science and engineering and are revitalizing the traditional notions of multiscale modeling. Deep learning has found success in various approaches for distilling information from fine-scale models, such as building surrogate models and guiding the development of coarse-grained potentials. However, perhaps its most powerful use in multiscale modeling is in defining latent spaces that enable efficient exploration of conformational space. This confluence of machine learning and multiscale simulation with modern high-performance computing promises a new era of discovery and innovation in structural biology.

Advances in machine learned interatomic potentials (MLIPs), such as those using neural networks, have resulted in short-range models that can infer interaction energies with near ab initio accuracy and orders of magnitude reduced computational cost. For many-atom systems, including macromolecules, biomolecules, and condensed matter, model accuracy can become reliant on the description of short- and long-range physical interactions. The latter terms can be difficult to incorporate into an MLIP framework. Recent research has produced numerous models with considerations for nonlocal electrostatic and dispersion interactions, leading to a large range of applications that can be addressed using MLIPs. In light of this, we present a Perspective focused on key methodologies and models being used where the presence of nonlocal physics and chemistry is crucial for describing system properties. The strategies covered include MLIPs augmented with dispersion corrections, electrostatics calculated with charges predicted from atomic environment descriptors, the use of self-consistency and message passing iterations to propagate nonlocal system information, and charges obtained via equilibration schemes. We aim to provide a pointed discussion to support the development of machine learning-based interatomic potentials for systems where contributions from only nearsighted terms are deficient.

Quantum mechanics/molecular mechanics (QM/MM) hybrid models allow one to address chemical phenomena in complex molecular environments. Whereas this modeling approach can cope with a large system size at moderate computational costs, the models are often tedious to construct and require manual preprocessing and expertise. As a result, transferability to new application areas can be limited and the many parameters are not easy to adjust to reference data that are typically scarce. Therefore, it is desirable to devise automated procedures of controllable accuracy that enable such modeling in a standardized and black‐box‐type manner. Although diverse best‐practice protocols have been set up for the construction of individual components of a QM/MM model (e.g., the MM potential, the type of embedding, the choice of the QM region), automated procedures that reconcile all steps of the QM/MM model construction are still rare. Here, we review the state of the art of QM/MM modeling with a focus on automation. We elaborate on MM model parametrization, on atom‐economical physically‐motivated QM region selection, and on embedding schemes that incorporate mutual polarization as critical components of the QM/MM model. In view of the broad scope of the field, we mostly restrict the discussion to methodologies that build de novo models based on first‐principles data, on uncertainty quantification, and on error mitigation with a high potential for automation. Ultimately, it is desirable to be able to set up reliable QM/MM models in a fast and efficient automated way without being constrained by specific chemical or technical limitations. This article is categorized under: Electronic Structure Theory > Combined QM/MM Methods. This review discusses modern approaches towards generally applicable QM/MM models built from first‐principles data with a focus on automation and uncertainty quantification.

A powerful tool to study the mechanism of reactions in solutions or enzymes is to perform ab initio quantum mechanical/molecular mechanical (QM/MM) molecular dynamics (MD) simulations. However, the computational cost is too high due to the explicit electronic structure calculations at every time step of the simulation. A neural network (NN) method can accelerate the QM/MM-MD simulations, but it has long been a problem to accurately describe the QM/MM electrostatic coupling by NN in the electrostatic embedding (EE) scheme. In this work, we developed a new method to accelerate QM/MM calculations in the mechanical embedding (ME) scheme. The potentials and partial point charges of QM atoms are first learned in vacuo by the embedded atom neural networks (EANN) approach. MD simulations are then performed on this EANN/MM potential energy surface (PES) to obtain free energy (FE) profiles for reactions, in which the QM/MM electrostatic coupling is treated in the ME scheme. Finally, a weighted thermodynamic perturbation (wTP) corrects the FE profiles in the ME scheme to the EE scheme. For two reactions in water and one in methanol, our simulations reproduced the B3LYP/MM free energy profiles within 0.5 kcal/mol with a speed-up of 30-60-fold. The results show that the strategy of combining EANN potential in the ME scheme with the wTP correction is efficient and reliable for chemical reaction simulations in liquid. Another advantage of our method is that the QM PES is independent of the MM subsystem, so it can be applied to various MM environments as demonstrated by an SN2 reaction studied in water and methanol individually, which used the same EANN PES. The free energy profiles are in excellent accordance with the results obtained from B3LYP/MM-MD simulations. In the future, this method will be applied to the reactions of enzymes and their variants.
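
The final correction step in the abstract above rests on thermodynamic perturbation. The following is a minimal sketch of the idea, assuming the standard exponential-averaging (Zwanzig) estimator rather than the authors' specific weighted variant, whose details are not given here. The function name `tp_free_energy`, the temperature, and the energy-gap inputs are illustrative assumptions.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def tp_free_energy(delta_u, temperature=300.0):
    """Estimate F_target - F_sampled from energy gaps (kcal/mol).

    delta_u holds U_target - U_sampled evaluated on frames that were
    sampled on the cheaper surface (assumed here: ME-level frames
    re-evaluated at the EE level).
    """
    beta = 1.0 / (KB_KCAL * temperature)
    exponents = [-beta * du for du in delta_u]
    # Log-sum-exp keeps the exponential average numerically stable.
    shift = max(exponents)
    mean_exp = sum(math.exp(x - shift) for x in exponents) / len(exponents)
    return -(shift + math.log(mean_exp)) / beta

# Hypothetical energy gaps for a handful of frames.
print(tp_free_energy([1.2, 0.8, 1.5, 1.1]))
```

Because the exponential average is dominated by frames with small energy gaps, the estimate is only reliable when the two surfaces overlap well in phase space.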

Recent advances in data science are impacting the development of classical force fields. Here we review some ideas and techniques from data science that have been used in force field development, including database construction, atom typing, and machine learning potentials. We highlight how new tools such as active learning and automatic differentiation are facilitating the generation of target data and the direct fitting with macroscopic observables. Philosophical changes on how force field models should be built and used are also discussed. It's inspiring that more accurate biomolecular force fields can be developed with the aid of data science techniques.

In combined quantum mechanical and molecular mechanical (QM/MM) free energy simulations, how to synthesize the accuracy of ab initio (AI) methods with the speed of semiempirical (SE) methods for a cost-effective QM treatment remains a long-standing challenge. In this work, we present a machine-learning-facilitated method for obtaining AI/MM-quality free energy profiles through efficient SE/MM simulations. In particular, we use Gaussian process regression (GPR) to learn the energy and force corrections needed for SE/MM to match with AI/MM results during molecular dynamics simulations. Force matching is enabled in our model by including energy derivatives into the observational targets through the extended-kernel formalism. We demonstrate the effectiveness of this method on the solution-phase SN2 Menshutkin reaction using AM1/MM and B3LYP/6-31+G(d,p)/MM as the base and target levels, respectively. Trained on only 80 configurations sampled along the minimum free energy path (MFEP), the resulting GPR model reduces the average energy error in AM1/MM from 18.2 to 5.8 kcal mol⁻¹ for the 4000-sample testing set with the average force error on the QM atoms decreased from 14.6 to 3.7 kcal mol⁻¹ Å⁻¹. Free energy sampling with the GPR corrections applied (AM1-GPR/MM) produces a free energy barrier of 14.4 kcal mol⁻¹ and a reaction free energy of -34.1 kcal mol⁻¹, in closer agreement with the AI/MM benchmarks and experimental results.
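
The idea of including energy derivatives among the GPR observations can be illustrated on a one-dimensional toy problem. This is a sketch under stated assumptions, not the authors' model: it uses a squared-exponential kernel, a fixed length scale `ell`, and a toy target function, all chosen here for illustration. The extended kernel matrix stacks value-value, value-gradient, and gradient-gradient covariances obtained by differentiating the kernel.

```python
import math

def k_ff(x, y, ell):
    """Squared-exponential kernel, cov(f(x), f(y))."""
    return math.exp(-0.5 * ((x - y) / ell) ** 2)

def k_fd(x, y, ell):
    """cov(f(x), f'(y)): derivative of the kernel with respect to y."""
    return k_ff(x, y, ell) * (x - y) / ell**2

def k_dd(x, y, ell):
    """cov(f'(x), f'(y)): mixed second derivative of the kernel."""
    r = x - y
    return k_ff(x, y, ell) * (1.0 / ell**2 - r**2 / ell**4)

def solve(a, b):
    """Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_predict(xs, values, grads, x_new, ell=1.0, noise=1e-8):
    """Posterior mean at x_new given values and gradients at xs."""
    n = len(xs)
    K = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            K[i][j] = k_ff(xs[i], xs[j], ell)
            K[i][n + j] = k_fd(xs[i], xs[j], ell)
            K[n + i][j] = k_fd(xs[j], xs[i], ell)  # symmetric counterpart
            K[n + i][n + j] = k_dd(xs[i], xs[j], ell)
    for i in range(2 * n):
        K[i][i] += noise  # small jitter for numerical stability
    alpha = solve(K, list(values) + list(grads))
    k_star = [k_ff(x_new, x, ell) for x in xs] + [k_fd(x_new, x, ell) for x in xs]
    return sum(k * a for k, a in zip(k_star, alpha))

# Toy target: f(x) = sin(x), with derivative cos(x) as the "force" data.
xs = [0.0, 1.0, 2.0, 3.0]
pred = fit_predict(xs, [math.sin(x) for x in xs], [math.cos(x) for x in xs], 1.5)
print(pred, math.sin(1.5))
```

In a molecular setting the scalar input would be replaced by a descriptor of the configuration and the gradients by forces on the QM atoms. The block structure of the kernel matrix stays the same.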

To accurately study the chemical reactions in the condensed phase or within enzymes, both quantum-mechanical description and sufficient configurational sampling are required to reach converged estimates. Here, quantum mechanics/molecular mechanics (QM/MM) molecular dynamics (MD) simulations play an important role, providing QM accuracy for the region of interest at a decreased computational cost. However, QM/MM simulations are still too expensive to study large systems on longer time scales. Recently, machine learning (ML) models have been proposed to replace the QM description. The main limitation of these models lies in the accurate description of long-range interactions present in condensed-phase systems. To overcome this issue, a recent workflow has been introduced combining a semi-empirical method (i.e. density functional tight binding (DFTB)) and a high-dimensional neural network potential (HDNNP) in a Δ-learning scheme. This approach has been shown to be capable of correctly incorporating long-range interactions within a cutoff of 1.4 nm. One of the promising alternative approaches to efficiently take long-range effects into account is the development of graph-convolutional neural networks (GCNNs) for the prediction of the potential-energy surface. In this work, we investigate the use of GCNN models - with and without a Δ-learning scheme - for (QM)ML/MM MD simulations. We show that the Δ-learning approach using a GCNN and DFTB as a baseline achieves competitive performance on our benchmarking set of solutes and chemical reactions in water. This method is additionally validated by performing prospective (QM)ML/MM MD simulations of retinoic acid in water and S-adenosylmethionine interacting with cytosine in water. The results indicate that the Δ-learning GCNN model is a valuable alternative for the (QM)ML/MM MD simulations of condensed-phase systems.

New enzyme functions exist within the increasing number of unannotated protein sequences. Novel enzyme discovery is necessary to expand the pathways that can be accessed by metabolic engineering for the biosynthesis of functional compounds. Accordingly, various machine learning models have been developed to predict enzymatic reactions. However, the ability to predict unknown reactions that are not included in the training data has not been clarified. In order to cover uncertain and unknown reactions, a wider range of reaction types must be demonstrated by the models. Here, we establish 16 expanded enzymatic reaction prediction models developed using various machine learning algorithms, including deep neural networks. Improvements in prediction performance over that of our previous study indicate that the updated methods are more effective for the prediction of enzymatic reactions. Overall, the deep neural network model trained with combined substrate-enzyme-product information exhibits the highest prediction accuracy with Macro F1 scores up to 0.966 and with robust prediction of unknown enzymatic reactions that are not included in the training data. This model can predict more extensive enzymatic reactions in comparison to previously reported models. This study will facilitate the discovery of new enzymes for the production of useful substances.

We have developed a combined fragment-based machine learning (ML) force field and molecular mechanics (MM) force field for simulating the structures of macromolecules in solutions and then computing their NMR chemical shifts with the generalized energy-based fragmentation (GEBF) approach at the level of density functional theory (DFT). In this work, we first construct a Gaussian approximation potential based on GEBF subsystems of macromolecules for MD simulations and then a GEBF-based neural network (GEBF-NN) with a deep potential model for the studied macromolecule. Then, we develop a GEBF-NN/MM force field for macromolecules in solutions by combining the GEBF-NN force field for the solute molecule and the ff14SB force field for solvent molecules. Using the GEBF-NN/MM MD simulation to generate snapshot structures of solute/solvent clusters, we then perform the NMR calculations with the GEBF approach at the DFT level to calculate NMR chemical shifts of the solute molecule. Taking a heptamer of oligopyridine-dicarboxamides in chloroform solution as an example, our results show that the GEBF-NN force field is quite accurate for this heptamer in comparison with the reference DFT results. For this heptamer in chloroform solution, both the GEBF-NN/MM and classical MD simulations could lead to helical structures from the same initial extended structure. The GEBF-DFT NMR results indicate that the GEBF-NN/MM force field could lead to more accurate NMR chemical shifts on hydrogen atoms in comparison with the experimental NMR results. Therefore, the GEBF-NN/MM force field could be employed for predicting more accurate dynamical behaviors than the classical force field for complex systems in solutions.

Semiempirical methods like density functional tight-binding (DFTB) allow extensive phase space sampling, making it possible to generate free energy surfaces of complex reactions in condensed-phase environments. Such a high efficiency often comes at the cost of reduced accuracy, which may be improved by developing a specific reaction parametrization (SRP) for the particular molecular system. Thiol-disulfide exchange is a nucleophilic substitution reaction that occurs in a large class of proteins. Its proper description requires a high-level ab initio method, while DFT-GGA and hybrid functionals were shown to be inadequate, and so is DFTB due to its DFT-GGA descent. We develop an SRP for thiol-disulfide exchange based on an artificial neural network (ANN) implementation in the DFTB+ software and compare its performance to that of a standard SRP approach applied to DFTB. As an application, we use both new DFTB-SRPs as components of a QM/MM scheme to investigate thiol-disulfide exchange in two molecular complexes: a solvated model system and a blood protein. Demonstrating the strengths of the methodology, highly accurate free energy surfaces are generated at a low cost, as the augmentation of DFTB with an ANN only adds a small computational overhead.

Predictive molecular simulations require fast, accurate and reactive interatomic potentials. Machine learning offers a promising approach to construct such potentials by fitting energies and forces to high-level quantum-mechanical data, but doing so typically requires considerable human intervention and data volume. Here we show that, by leveraging hierarchical and active learning, accurate Gaussian Approximation Potential (GAP) models can be developed for diverse chemical systems in an autonomous manner, requiring only hundreds to a few thousand energy and gradient evaluations on a reference potential-energy surface. The approach uses separate intra- and inter-molecular fits and employs a prospective error metric to assess the accuracy of the potentials. We demonstrate applications to a range of molecular systems with relevance to computational organic chemistry: ranging from bulk solvents, a solvated metal ion and a metallocage onwards to chemical reactivity, including a bifurcating Diels-Alder reaction in the gas phase and non-equilibrium dynamics (a model SN2 reaction) in explicit solvent. The method provides a route to routinely generating machine-learned force fields for reactive molecular systems.

Quantum mechanics/molecular mechanics (QM/MM) molecular dynamics (MD) simulations have been developed to simulate molecular systems, where an explicit description of changes in the electronic structure is necessary. However, QM/MM MD simulations are computationally expensive compared to fully classical simulations as all valence electrons are treated explicitly and a self-consistent field (SCF) procedure is required. Recently, approaches have been proposed to replace the QM description with machine-learned (ML) models. However, condensed-phase systems pose a challenge for these approaches due to long-range interactions. Here, we establish a workflow, which incorporates the MM environment as an element type in a high-dimensional neural network potential (HDNNP). The fitted HDNNP describes the potential-energy surface of the QM particles with an electrostatic embedding scheme. Thus, the MM particles feel a force from the polarized QM particles. To achieve chemical accuracy, we find that even simple systems require models with a strong gradient regularization, a large number of data points, and a substantial number of parameters. To address this issue, we extend our approach to a Δ-learning scheme, where the ML model learns the difference between a reference method (density functional theory (DFT)) and a cheaper semiempirical method (density functional tight binding (DFTB)). We show that such a scheme reaches the accuracy of the DFT reference method while requiring significantly fewer parameters. Furthermore, the Δ-learning scheme is capable of correctly incorporating long-range interactions within a cutoff of 1.4 nm. It is validated by performing MD simulations of retinoic acid in water and the interaction between S-adenosylmethionine and cytosine in water. The presented results indicate that Δ-learning is a promising approach for (QM)ML/MM MD simulations of condensed-phase systems.
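
The Δ-learning idea described above, in which the model learns only the difference between an expensive reference and a cheap baseline, can be sketched with toy functions. Everything in this block is an illustrative assumption: the one-dimensional energies `e_base` and `e_ref` stand in for DFTB and DFT, and a closed-form linear fit stands in for the neural network potential.

```python
def e_base(x):
    """Toy stand-in for the cheap baseline energy (e.g. DFTB)."""
    return x * x

def e_ref(x):
    """Toy stand-in for the expensive reference energy (e.g. DFT)."""
    return (x - 1.0) ** 2 + 0.5

def fit_linear(xs, ys):
    """Closed-form least-squares fit y ~ a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

# Train the correction on the reference-minus-baseline difference only.
train_x = [-1.0, 0.0, 0.5, 2.0]
a, b = fit_linear(train_x, [e_ref(x) - e_base(x) for x in train_x])

def e_delta(x):
    """Baseline energy plus the learned correction."""
    return e_base(x) + (a * x + b)

print(e_delta(3.0), e_ref(3.0))
```

The difference between the two toy surfaces is a simple linear function even though each surface is quadratic, which mirrors the abstract's observation that the correction can be learned with far fewer parameters than the full potential-energy surface.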

Combustion is a complex chemical system which involves thousands of chemical reactions and generates hundreds of molecular species and radicals during the process. In this work, a neural network-based molecular dynamics (MD) simulation is carried out to simulate the benchmark combustion of methane. During MD simulation, detailed reaction processes leading to the creation of specific molecular species including various intermediate radicals and the products are intimately revealed and characterized. Overall, a total of 798 different chemical reactions were recorded and some new chemical reaction pathways were discovered. We believe that the present work heralds the dawn of a new era in which neural network-based reactive MD simulation can be practically applied to simulating important complex reaction systems at ab initio level, which provides atomic-level understanding of chemical reaction processes as well as discovery of new reaction pathways at an unprecedented level of detail beyond what laboratory experiments could accomplish.

DFTB+ is a versatile community developed open source software package offering fast and efficient methods for carrying out atomistic quantum mechanical simulations. By implementing various methods approximating density functional theory (DFT), such as the density functional based tight binding (DFTB) and the extended tight binding method, it enables simulations of large systems and long timescales with reasonable accuracy while being considerably faster for typical simulations than the respective ab initio methods. Based on the DFTB framework, it additionally offers approximated versions of various DFT extensions including hybrid functionals, time dependent formalism for treating excited systems, electron transport using non-equilibrium Green’s functions, and many more. DFTB+ can be used as a user-friendly standalone application in addition to being embedded into other software packages as a library or acting as a calculation-server accessed by socket communication. We give an overview of the recently developed capabilities of the DFTB+ code, demonstrating with a few use case examples, discuss the strengths and weaknesses of the various features, and also discuss on-going developments and possible future perspectives.

TiO2 is a widely used photocatalyst in science and technology and its interface with water is important in fields ranging from geochemistry to biomedicine. Yet, it is still unclear whether water adsorbs in molecular or dissociated form on TiO2 even for the case of well-defined crystalline surfaces. To address this issue, we simulated the TiO2-water interface using molecular dynamics with an ab initio-based deep neural network potential. Our simulations show a dynamical equilibrium of molecular and dissociative adsorption of water on TiO2. Water dissociates through a solvent-assisted concerted proton transfer to form a pair of short-lived hydroxyl groups on the TiO2 surface. Molecular adsorption of water is ∆F = 7.5 ± 0.9 kJ/mol lower in free energy than the dissociative adsorption, giving rise to a 6 ± 0.5% equilibrium water dissociation fraction at room temperature. Due to the relevance of surface hydroxyl groups to the surface chemistry of TiO2, our model might be key to understanding phenomena ranging from surface functionalization to photocatalytic mechanisms.

The Varkud satellite ribozyme catalyses site-specific RNA cleavage and ligation, and serves as an important model system to understand RNA catalysis. Here, we combine stereospecific phosphorothioate substitution, precision nucleobase mutation and linear free-energy relationship measurements with molecular dynamics, molecular solvation theory and ab initio quantum mechanical/molecular mechanical free-energy simulations to gain insight into the catalysis. Through this confluence of theory and experiment, we unify the existing body of structural and functional data to unveil the catalytic mechanism in unprecedented detail, including the degree of proton transfer in the transition state. Further, we provide evidence for a critical Mg2+ in the active site that interacts with the scissile phosphate and anchors the general base guanine in position for nucleophile activation. This novel role for Mg2+ adds to the diversity of known catalytic RNA strategies and unifies functional features observed in the Varkud satellite, hairpin and hammerhead ribozyme classes.

Atomic neural networks (ANNs) constitute a class of machine learning methods for predicting potential energy surfaces and physico-chemical properties of molecules and materials. Despite many successes, developing interpretable ANN architectures and implementing existing ones efficiently are still challenging. This calls for reliable, general-purpose and open-source codes. Here, we present a Python library named PiNN as a solution toward this goal. In PiNN, we designed a new interpretable and high-performing graph convolutional neural network variant, PiNet, as well as implemented the established Behler-Parrinello high-dimensional neural network. These implementations were tested using datasets of isolated small molecules, crystalline materials, liquid water and an aqueous alkaline electrolyte. PiNN comes with a visualizer called PiNNBoard to extract chemical insight "learned" by ANNs, provides analytical stress tensor calculations and interfaces to both the Atomic Simulation Environment and a development version of the Amsterdam Modeling Suite. Moreover, PiNN is highly modularized which makes it useful not only as a standalone package but also as a chain of tools to develop and to implement novel ANNs. The code is distributed under a permissive BSD license and is freely accessible at https://github.com/Teoroo-CMC/PiNN/ with full documentation and tutorials.

We develop an L-platform/L-scaffold framework we hypothesize may serve as a blueprint to facilitate site-specific RNA-cleaving nucleic acid enzyme design. Building on the L-platform motif originally described by Suslov and coworkers, we identify new critical scaffolding elements required to anchor a conserved general base guanine ("L-anchor") and bind functionally important metal ions at the active site ("L-pocket"). Molecular simulations, together with a broad range of experimental structural and functional data, connect the L-platform/L-scaffold elements to necessary and sufficient conditions for catalytic activity. We demonstrate that the L-platform/L-scaffold framework is common to 5 of the 9 currently known naturally occurring ribozyme classes (Twr, HPr, VSr, HHr, Psr), and intriguingly from a design perspective, the framework also appears in an artificially engineered DNAzyme (8-17dz). The flexibility of the L-platform/L-scaffold framework is illustrated on these systems, highlighting modularity and trends in the variety of known general acid moieties that are supported. These trends give rise to two distinct catalytic paradigms, building on the classifications proposed by Wilson and coworkers and named for the implicated general base and acid. The "G+A" paradigm (Twr, HPr, VSr) exclusively utilizes nucleobase residues for chemistry, and the "G+M+" paradigm (HHr, 8-17dz, Psr) involves structuring of the "L-pocket" metal ion binding site for recruitment of a divalent metal ion that plays an active role in the chemical steps of the reaction. Finally, the modularity of the L-platform/L-scaffold framework is illustrated in the VS ribozyme where the "L-pocket" assumes the functional role of the "L-anchor" element, highlighting a distinct mechanism, but one that is functionally linked with the hammerhead ribozyme.

We perform molecular dynamics simulations, based on recent crystallographic data, on the 8-17 DNAzyme at four states along the reaction pathway to determine the dynamical ensemble for the active state and transition state mimic in solution. A striking finding is the diverse roles played by Na+ and Pb2+ ions in the electrostatically strained active site that impact all four fundamental catalytic strategies, and share commonality with some features recently inferred for naturally occurring hammerhead and pistol ribozymes. The active site Pb2+ ion helps to stabilize in-line nucleophilic attack, provides direct electrostatic transition state stabilization, and facilitates leaving group departure. A conserved guanine residue is positioned to act as the general base, and is assisted by a bridging Na+ ion that tunes the pKa and facilitates in-line fitness. The present work provides insight into how DNA molecules are able to solve the RNA-cleavage problem, and establishes functional relationships between the mechanism of these engineered DNA enzymes with their naturally evolved RNA counterparts. This adds valuable information to our growing body of knowledge on general mechanisms of phosphoryl transfer reactions catalyzed by RNA, proteins and DNA.

The pistol ribozyme (Psr) is among the most recently discovered RNA enzymes and has been the subject of experiments aimed at elucidating the mechanism. Recent biochemical studies have revealed exciting clues about catalytic interactions in the active site not apparent from available crystallographic data. The present work unifies the interpretation of the existing body of structural and functional data on Psr by providing a dynamical model for the catalytically active state in solution from molecular simulation. Our results suggest that a catalytic Mg2+ ion makes inner-sphere contact with G33:N7 and outer-sphere coordination to the pro-RP of the scissile phosphate, promoting electrostatic stabilization of the dianionic transition state and neutralization of the developing charge of the leaving group through a metal-coordinated water molecule that is made more acidic by a hydrogen bond donated from the 2'OH of P32. This model is consistent with experimental activity-pH and mutagenesis data, including sensitivity to G33(7cG) and phosphorothioate substitution/metal ion rescue. The model suggests several experimentally testable predictions, including the response of cleavage activity to mutations at G42 and P32 positions in the ribozyme, and thio substitutions of the substrate in the presence of different divalent metal ions. Further, the model identifies striking similarities of Psr to the hammerhead ribozyme (HHr), including similar global fold, organization of secondary structure around an active site three-way junction, catalytic metal ion binding mode, and guanine general base. However, the specific binding mode and role of the Mg2+ ion, as well as a conserved 2'-OH in the active site, are interrelated but subtly different between the ribozymes.

Computational modeling of chemical and biological systems at atomic resolution is a crucial tool in the chemist's toolset. The use of computer simulations requires a balance between cost and accuracy: quantum-mechanical methods provide high accuracy but are computationally expensive and scale poorly to large systems, while classical force fields are cheap and scalable, but lack transferability to new systems. Machine learning can be used to achieve the best of both approaches. Here we train a general-purpose neural network potential (ANI-1ccx) that approaches CCSD(T)/CBS accuracy on benchmarks for reaction thermochemistry, isomerization, and drug-like molecular torsions. This is achieved by training a network to DFT data then using transfer learning techniques to retrain on a dataset of gold standard QM calculations (CCSD(T)/CBS) that optimally spans chemical space. The resulting potential is broadly applicable to materials science, biology, and chemistry, and billions of times faster than CCSD(T)/CBS calculations.

In recent years, machine learning (ML) methods have become increasingly popular in computational chemistry. After being trained on appropriate ab initio reference data, these methods allow accurate prediction of the properties of chemical systems, circumventing the need for explicitly solving the electronic Schrödinger equation. Because of their computational efficiency and scalability to large datasets, deep neural networks (DNNs) are a particularly promising ML algorithm for chemical applications. This work introduces PhysNet, a DNN architecture designed for predicting energies, forces and dipole moments of chemical systems. PhysNet achieves state-of-the-art performance on the QM9, MD17 and ISO17 benchmarks. Further, two new datasets are generated in order to probe the performance of ML models for describing chemical reactions, long-range interactions, and condensed phase systems. It is shown that explicitly including electrostatics in energy predictions is crucial for a qualitatively correct description of the asymptotic regions of a potential energy surface (PES). PhysNet models trained on a systematically constructed set of small peptide fragments (at most eight heavy atoms) are able to generalize to considerably larger proteins like deca-alanine (Ala10): The optimized geometry of helical Ala10 predicted by PhysNet is virtually identical to ab initio results (RMSD = 0.21 Å). By running unbiased molecular dynamics (MD) simulations of Ala10 on the PhysNet-PES in gas phase, it is found that instead of a helical structure, Ala10 folds into a "wreath-shaped" configuration, which is more stable than the helical form by 0.46 kcal mol⁻¹ according to the reference ab initio calculations.

An active learning procedure called deep potential generator (DP-GEN) is proposed for the construction of accurate and transferable machine learning-based models of the potential energy surface (PES) for the molecular modeling of materials. This procedure consists of three main components: exploration, generation of accurate reference data, and training. Application to the sample systems of Al, Mg, and Al-Mg alloys demonstrates that DP-GEN can produce uniformly accurate PES models with a minimal number of reference data.

In this Viewpoint, we discuss the current progress in applications of machine learning (ML) and artificial intelligence (AI) to meet the challenges of computational drug discovery. We identify several areas where existing methods have the potential to accelerate pharmaceutical research and disrupt more traditional approaches.

Deep learning has led to a paradigm shift in artificial intelligence, including web, text, and image search, speech recognition, as well as bioinformatics, with growing impact in chemical physics. Machine learning, in general, and deep learning, in particular, are ideally suitable for representing quantum-mechanical interactions, enabling us to model nonlinear potential-energy surfaces or enhancing the exploration of chemical compound space. Here we present the deep learning architecture SchNet that is specifically designed to model atomistic systems by making use of continuous-filter convolutional layers. We demonstrate the capabilities of SchNet by accurately predicting a range of properties across chemical space for molecules and materials, where our model learns chemically plausible embeddings of atom types across the periodic table. Finally, we employ SchNet to predict potential-energy surfaces and energy-conserving force fields for molecular dynamics simulations of small molecules and perform an exemplary study on the quantum-mechanical properties of C20-fullerene that would have been infeasible with regular ab initio molecular dynamics.

We introduce a representation of any atom in any chemical environment for the automatized generation of universal kernel ridge regression-based quantum machine learning (QML) models of electronic properties, trained throughout chemical compound space. The representation is based on Gaussian distribution functions, scaled by power laws and explicitly accounting for structural as well as elemental degrees of freedom. The elemental components help us to lower the QML model’s learning curve, and, through interpolation across the periodic table, even enable “alchemical extrapolation” to covalent bonding between elements not part of training. This point is demonstrated for the prediction of covalent binding in single, double, and triple bonds among main-group elements as well as for atomization energies in organic molecules. We present numerical evidence that resulting QML energy models, after training on a few thousand random training instances, reach chemical accuracy for out-of-sample compounds. Compound datasets studied include thousands of structurally and compositionally diverse organic molecules, non-covalently bonded protein side-chains, (H2O)40-clusters, and crystalline solids. Learning curves for QML models also indicate competitive predictive power for various other electronic ground state properties of organic molecules, calculated with hybrid density functional theory, including polarizability, heat-capacity, HOMO-LUMO eigenvalues and gap, zero point vibrational energy, dipole moment, and highest vibrational fundamental frequency.

We use HIP-NN, a neural network architecture that excels at predicting molecular energies, to predict atomic charges. The charge predictions are accurate over a wide range of molecules (both small and large) and for a diverse set of charge assignment schemes. To demonstrate the power of charge prediction on non-equilibrium geometries, we use HIP-NN to generate IR spectra from dynamical trajectories on a variety of molecules. The results are in good agreement with reference IR spectra produced by traditional theoretical methods. Critically, for this application, HIP-NN charge predictions are about 10⁴ times faster than direct DFT charge calculations. Thus, ML provides a pathway to greatly increase the range of feasible simulations while retaining quantum-level accuracy. In summary, our results provide further evidence that machine learning can replicate high-level quantum calculations at a tiny fraction of the computational cost.

Recent developments in many-body potential energy representation via deep learning have brought new hopes to addressing the accuracy-versus-efficiency dilemma in molecular simulations. Here we describe DeePMD-kit, a package written in Python/C++ that has been designed to minimize the effort required to build deep learning based representations of potential energy and force fields and to perform molecular dynamics. Potential applications of DeePMD-kit span from finite molecules to extended systems and from metallic systems to chemically bonded systems. DeePMD-kit is interfaced with TensorFlow, one of the most popular deep learning frameworks, making the training process highly automatic and efficient. On the other end, DeePMD-kit is interfaced with high-performance classical molecular dynamics and quantum (path-integral) molecular dynamics packages, i.e., LAMMPS and i-PI, respectively. Thus, upon training, the potential energy and force field models can be used to perform efficient molecular simulations for different purposes. As an example of the many potential applications of the package, we use DeePMD-kit to learn the interatomic potential energy and forces of a water model using data obtained from density functional theory. We demonstrate that the resulting molecular dynamics model accurately reproduces the structural information contained in the original model.

This paper investigates the problem of adaptive fault-tolerant control for a class of nonlinear parametric strict-feedback systems with multiple unknown control directions. Multiple sensor faults are first considered such that all real state variables are unavailable. Then, a constructive design method for the problem is set up by exploiting a parameter separation and regrouping technique. To circumvent the main obstacle caused by the coupling effects of multiple unknown control directions and sensor faults, a region-dependent segmentation analysis method is proposed. It is proven that the closed-loop system is globally exponentially stable. Simulation results are presented to illustrate the effectiveness of the proposed scheme.

Molecular dynamics (MD) simulations employing ab initio quantum mechanical and molecular mechanical (ai-QM/MM) potentials are considered to be the state of the art, but the high computational cost associated with the ai-QM calculations remains a theoretical challenge for their routine application. Here, we present a modified protocol of the multiple time step (MTS) method for accelerating ai-QM/MM MD simulations of condensed-phase reactions. Within a previous MTS protocol [Nam J. Chem. Theory Comput. 2014, 10, 4175], reference forces are evaluated using a low-level (semiempirical QM/MM) Hamiltonian and employed at inner time steps to propagate the nuclear motions. Correction forces, which arise from the force differences between high-level (ai-QM/MM) and low-level Hamiltonians, are applied at outer time steps, where the MTS algorithm allows the time-reversible integration of the correction forces. To increase the outer step size, which is bound by the highest-frequency component in the correction forces, the semiempirical QM Hamiltonian is recalibrated in this work to minimize the magnitude of the correction forces. The remaining high-frequency modes, which are mainly bond stretches involving hydrogen atoms, are then removed from the correction forces. When combined with a Langevin or SIN(R) thermostat, the modified MTS-QM/MM scheme remains robust with an up to 8 (with Langevin) or 10 fs (with SIN(R)) outer time step (with 1 fs inner time steps) for the chorismate mutase system. This leads to an over 5-fold speedup over standard ai-QM/MM simulations, without sacrificing the accuracy in the predicted free energy profile of the reaction.
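
The inner/outer splitting described in this abstract can be sketched as a reversible multiple-time-step (r-RESPA-style) velocity Verlet update. This is a toy one-dimensional illustration in dimensionless units, not the authors' implementation: the harmonic "low-level" and "correction" forces, the constants `K_LOW` and `K_HIGH`, and the function names are all assumptions made here.

```python
def mts_step(x, v, mass, force_low, force_corr, dt_inner, n_inner):
    """One outer step: correction-force half kicks wrap n_inner inner steps."""
    dt_outer = dt_inner * n_inner
    # Half kick with the (expensive) correction force at the outer step
    v += 0.5 * dt_outer * force_corr(x) / mass
    for _ in range(n_inner):
        # Velocity Verlet with the (cheap) low-level reference force
        v += 0.5 * dt_inner * force_low(x) / mass
        x += dt_inner * v
        v += 0.5 * dt_inner * force_low(x) / mass
    # Closing half kick with the correction force at the new position
    v += 0.5 * dt_outer * force_corr(x) / mass
    return x, v


# Toy model: the "high-level" spring is slightly stiffer than the "low-level" one
K_LOW, K_HIGH, MASS = 1.0, 1.1, 1.0


def force_low(x):
    return -K_LOW * x


def force_corr(x):
    # Correction force = high-level force minus low-level force
    return -(K_HIGH - K_LOW) * x


def total_energy(x, v):
    # Energy on the high-level surface, which the combined scheme samples
    return 0.5 * MASS * v * v + 0.5 * K_HIGH * x * x


x, v = 1.0, 0.0
e_start = total_energy(x, v)
for _ in range(1000):
    x, v = mts_step(x, v, MASS, force_low, force_corr, dt_inner=0.01, n_inner=5)
e_end = total_energy(x, v)
```

The symmetric ordering of the kicks keeps the update time-reversible. In this toy case the correction force is evaluated only once per five inner steps, which is the source of the savings when the correction is the costly term.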

We redevelop the variational free energy profile (vFEP) method using a cardinal B-spline basis to extend the method for analyzing free energy surfaces (FESs) involving three or more reaction coordinates. We also implemented software for evaluating high-dimensional profiles based on the multistate Bennett acceptance ratio (MBAR) method which constructs an unbiased probability density from global reweighting of the observed samples. The MBAR method takes advantage of a fast algorithm for solving the unbinned weighted histogram (UWHAM)/MBAR equations which replaces the solution of simultaneous equations with a nonlinear optimization of a convex function. We make use of cardinal B-splines and multiquadric radial basis functions to obtain smooth, differentiable MBAR profiles in arbitrary high dimensions. The cardinal B-spline vFEP and MBAR methods are compared using three example systems that examine 1D, 2D, and 3D profiles. Both methods are found to be useful and produce nearly indistinguishable results. The vFEP method is found to be 150 times faster than MBAR when applied to periodic 2D profiles, but the MBAR method is 4.5 times faster than vFEP when evaluating unbounded 3D profiles. In agreement with previous comparisons, we find the vFEP method produces superior FESs when the overlap between umbrella window simulations decreases. Finally, the associative reaction mechanism of hammerhead ribozyme is characterized using 3D, 4D, and 6D profiles, and the higher-dimensional profiles are found to have smaller reaction barriers by as much as 1.5 kcal/mol. The methods presented here have been implemented into the FE-ToolKit software package along with new methods for network-wide free energy analysis in drug discovery.

On the surface
The uptake and hydrolysis of N2O5 from the atmosphere by aqueous aerosols was long thought to occur by solvation and subsequent hydrolysis in the bulk of the aerosol. However, this mechanistic hypothesis was unverifiable because of the fast reaction kinetics. Galib et al. used molecular simulations to show instead that the mechanism is the inverse: Interfacial hydrolysis is followed by solvation into the interior. Their reactive uptake model is consistent with some existing experimental observations.
Science, this issue p. 921

We explore the role of long-range interactions in atomistic machine-learning models by analyzing the effects on fitting accuracy, isolated cluster properties, and bulk thermodynamic properties. Such models have become increasingly popular in molecular simulations given their ability to learn highly complex and multi-dimensional interactions within a local environment; however, many of them fundamentally lack a description of explicit long-range interactions. In order to provide a well-defined benchmark system with precisely known pairwise interactions, we chose as the reference model a flexible version of the Extended Simple Point Charge (SPC/E) water model. Our analysis shows that while local representations are sufficient for predictions of the condensed liquid phase, the short-range nature of machine-learning models falls short in representing cluster and vapor phase properties. These findings provide an improved understanding of the role of long-range interactions in machine learning models and the regimes where they are necessary.

In a previous work [Pan et al., Molecules 23, 2500 (2018)], a charge projection scheme was reported, where outer molecular mechanical (MM) charges [>10 Å from the quantum mechanical (QM) region] were projected onto the electrostatic potential (ESP) grid of the QM region to accurately and efficiently capture long-range electrostatics in ab initio QM/MM calculations. Here, a further simplification to the model is proposed, where the outer MM charges are projected onto inner MM atom positions (instead of ESP grid positions). This enables a representation of the long-range MM electrostatic potential via augmentary charges (AC) on inner MM atoms. Combined with the long-range electrostatic correction function from Cisneros et al. [J. Chem. Phys. 143, 044103 (2015)] to smoothly switch between inner and outer MM regions, this new QM/MM-AC electrostatic model yields accurate and continuous ab initio QM/MM electrostatic energies with a 10 Å cutoff between inner and outer MM regions. This model enables efficient QM/MM cluster calculations with a large number of MM atoms as well as QM/MM calculations with periodic boundary conditions.

QM/MM simulations have become an indispensable tool in many chemical and biochemical investigations. Considering the tremendous degree of success, including recognition by a 2013 Nobel Prize in Chemistry, are there still "burning challenges" in QM/MM methods, especially for biomolecular systems? In this short Perspective, we discuss several issues that we believe greatly impact the robustness and quantitative applicability of QM/MM simulations to many, if not all, biomolecules. We highlight these issues with observations and relevant advances from recent studies in our group and others in the field. Despite such limited scope, we hope the discussions are of general interest and will stimulate additional developments that help push the field forward in meaningful directions.

We present the GPU version of DeePMD-kit, which, upon training a deep neural network model using ab initio data, can drive extremely large-scale molecular dynamics (MD) simulation with ab initio accuracy. Our tests show that for a water system of 12,582,912 atoms, the GPU version can be 7 times faster than the CPU version under the same power consumption. The code can scale up to the entire Summit supercomputer. For a copper system of 113,246,208 atoms, the code can perform one nanosecond MD simulation per day, reaching a peak performance of 86 PFLOPS (43% of the peak). Such unprecedented ability to perform MD simulation with ab initio accuracy opens up the possibility of studying many important issues in materials and molecules, such as heterogeneous catalysis, electrochemical cells, irradiation damage, crack propagation, and biochemical reactions.
Program summary
Program Title: DeePMD-kit
CPC Library link to program files: https://doi.org/10.17632/phyn4kgsfx.1
Developer’s repository link: https://doi.org/10.5281/zenodo.3961106
Licensing provisions: LGPL
Programming language: C++/Python/CUDA
Journal reference of previous version: Comput. Phys. Commun. 228 (2018), 178–184.
Does the new version supersede the previous version?: Yes.
Reasons for the new version: Parallelize and optimize the DeePMD-kit for modern high performance computers.
Summary of revisions: The optimized DeePMD-kit is capable of computing 100 million atoms molecular dynamics with ab initio accuracy, achieving 86 PFLOPS in double precision.
Nature of problem: Modeling the many-body atomic interactions by deep neural network models. Running molecular dynamics simulations with the models.
Solution method: The Deep Potential for Molecular Dynamics (DeePMD) method is implemented based on the deep learning framework TensorFlow. Standard and customized TensorFlow operators are optimized for GPU. Massively parallel molecular dynamics simulations with DeePMD models on high performance computers are supported in the new version.

We propose a general machine learning-based framework for building an accurate and widely applicable energy functional within the framework of generalized Kohn-Sham density functional theory. To this end, we develop a way of training self-consistent models that are capable of taking large datasets from different systems and different kinds of labels. We demonstrate that the functional that results from this training procedure gives chemically accurate predictions on energy, force, dipole, and electron density for a large class of molecules. It can be continuously improved when more and more data are available.

Reactive molecular dynamics (MD) simulation is a powerful tool to study the reaction mechanism of complex chemical systems. Central to the method is the potential energy surface (PES) that can describe the breaking and formation of chemical bonds. The development of both accurate and efficient PES has attracted significant effort in the past 2 decades. A recently developed deep potential (DP) model has the promise to bring ab initio accuracy to large-scale reactive MD simulations. However, for complex chemical reaction processes like pyrolysis, it remains challenging to generate reliable DP models with an optimal training data set. In this work, a data set construction scheme for such a purpose was established. The employment of a concurrent learning algorithm allows us to maximize the exploration of the chemical space while minimizing the redundancy of the data set. This greatly reduces the cost of computational resources required for ab initio calculations. Based on this method, we constructed a data set for the pyrolysis of n-dodecane, which contains 35 496 structures. The reactive MD simulation with the DP model trained based on this data set revealed the pyrolysis mechanism of n-dodecane in detail, and the simulation results are in good agreement with the experimental measurements. In addition, this data set shows excellent transferability to different long-chain alkanes. These results demonstrate the advantages of the proposed method for constructing training data sets for similar systems.

Machine learned reactive force fields based on polynomial expansions have been shown to be highly effective for describing simulations involving reactive materials. Nevertheless, the highly flexible nature of these models can give rise to a large number of candidate parameters for complicated systems. In these cases, reliable parameterization requires a well-formed training set, which can be difficult to achieve through standard iterative fitting methods. Here, we present an active learning approach based on cluster analysis and inspired by Shannon information theory to enable semi-automated generation of informative training sets and robust machine learned force fields. The use of this tool is demonstrated for development of a model based on linear combinations of Chebyshev polynomials explicitly describing up to four-body interactions, for a chemically and structurally diverse system of C/O under extreme conditions. We show that this flexible training database management approach enables development of models exhibiting excellent agreement with Kohn-Sham density functional theory in terms of structure, dynamics, and speciation.

Predicting protein-ligand binding affinities and the associated thermodynamics of biomolecular recognition is a primary objective of structure-based drug design. Alchemical free energy simulations offer a highly accurate and computationally efficient route to achieving this goal. While the AMBER molecular dynamics package has successfully been used for alchemical free energy simulations in academic research groups for decades, widespread impact in industrial drug discovery settings has been minimal because of the previous limitations within the AMBER alchemical code, coupled with challenges in system setup and postprocessing workflows. Through a close academia-industry collaboration we have addressed many of the previous limitations with an aim to improve accuracy, efficiency, and robustness of alchemical binding free energy simulations in industrial drug discovery applications. Here, we highlight some of the recent advances in AMBER20 with a focus on alchemical binding free energy (BFE) calculations, which are less computationally intensive than alternative binding free energy methods where full binding/unbinding paths are explored. In addition to scientific and technical advances in AMBER20, we also describe the essential practical aspects associated with running relative alchemical BFE calculations, along with recommendations for best practices, highlighting the importance not only of the alchemical simulation code but also the auxiliary functionalities and expertise required to obtain accurate and reliable results. This work is intended to provide a contemporary overview of the scientific, technical, and practical issues associated with running relative BFE simulations in AMBER20, with a focus on real-world drug discovery applications.

The emergence of machine learning methods in quantum chemistry provides new methods to revisit an old problem: Can the predictive accuracy of electronic structure calculations be decoupled from their numerical bottlenecks? Previous attempts to answer this question have, among other methods, given rise to semi-empirical quantum chemistry in minimal basis representation. We present an adaptation of the recently proposed SchNet for Orbitals (SchNOrb) deep convolutional neural network model [K. T. Schütt et al., Nat. Commun. 10, 5024 (2019)] for electronic wave functions in an optimized quasi-atomic minimal basis representation. For five organic molecules ranging from 5 to 13 heavy atoms, the model accurately predicts molecular orbital energies and wave functions and provides access to derived properties for chemical bonding analysis. Particularly for larger molecules, the model outperforms the original atomic-orbital-based SchNOrb method in terms of accuracy and scaling. We conclude by discussing the future potential of this approach in quantum chemical workflows.

Intermolecular interactions are critical to many chemical phenomena, but their accurate computation using ab initio methods is often limited by computational cost. The recent emergence of machine learning (ML) potentials may be a promising alternative. Useful ML models should not only estimate accurate interaction energies but also predict smooth and asymptotically correct potential energy surfaces. However, existing ML models are not guaranteed to obey these constraints. Indeed, systemic deficiencies are apparent in the predictions of our previous hydrogen-bond model as well as the popular ANI-1X model, which we attribute to the use of an atomic energy partition. As a solution, we propose an alternative atomic-pairwise framework specifically for intermolecular ML potentials, and we introduce AP-Net—a neural network model for interaction energies. The AP-Net model is developed using this physically motivated atomic-pairwise paradigm and also exploits the interpretability of symmetry adapted perturbation theory (SAPT). We show that in contrast to other models, AP-Net produces smooth, physically meaningful intermolecular potentials exhibiting correct asymptotic behavior. Initially trained on only a limited number of mostly hydrogen-bonded dimers, AP-Net makes accurate predictions across the chemically diverse S66x8 dataset, demonstrating significant transferability. On a test set including experimental hydrogen-bonded dimers, AP-Net predicts total interaction energies with a mean absolute error of 0.37 kcal mol⁻¹, reducing errors by a factor of 2–5 across SAPT components from previous neural network potentials. The pairwise interaction energies of the model are physically interpretable, and an investigation of predicted electrostatic energies suggests that the model “learns” the physics of hydrogen-bonded interactions.

Combining multiple levels of theory in free energy simulations to balance computational accuracy and efficiency is a promising approach for studying processes in the condensed phase. While the basic idea has been proposed and explored for quite some time, it remains challenging to achieve convergence for such multi-level free energy simulations as it requires a favorable distribution overlap between different levels of theory. Previous efforts focused on improving the distribution overlap by either altering the low-level of theory for the specific system of interest or ignoring certain degrees of freedom. Here, we propose an alternative strategy that first identifies the degrees of freedom that lead to gaps in the distributions of different levels of theory and then treats them separately with either constraints or restraints or by introducing an intermediate model that better connects the low and high levels of theory. As a result, the conversion from the low level to the high level model is done in a staged fashion that ensures a favorable distribution overlap along the way. Free energy components associated with different steps are mostly evaluated explicitly, and thus, the final result can be meaningfully compared to the rigorous free energy difference between the two levels of theory with limited and well-defined approximations. The additional free energy component calculations involve simulations at the low level of theory and therefore do not incur high computational costs. The approach is illustrated with two simple but non-trivial solution examples, and factors that dictate the reliability of the result are discussed.

We introduce a deep neural network to model in a symmetry preserving way the environmental dependence of the centers of the electronic charge. The model learns from ab initio density functional theory, wherein the electronic centers are uniquely assigned by the maximally localized Wannier functions. When combined with the deep potential model of the atomic potential energy surface, the scheme predicts the dielectric response of insulators for trajectories inaccessible to direct ab initio simulation. The scheme is nonperturbative and can capture the response of a mutating chemical environment. We demonstrate the approach by calculating the infrared spectra of liquid water at standard conditions, and of ice under extreme pressure, when it transforms from a molecular to an ionic crystal.

In recent years, promising deep learning based interatomic potential energy surface (PES) models have been proposed that can potentially allow us to perform molecular dynamics simulations for large scale systems with quantum accuracy. However, making these models truly reliable and practically useful is still a very non-trivial task. A key component in this task is the generation of datasets used in model training. In this paper, we introduce the Deep Potential GENerator (DP-GEN), an open-source software platform that implements the recently proposed “on-the-fly” learning procedure (Zhang et al. 2019) and is capable of generating uniformly accurate deep learning based PES models in a way that minimizes human intervention and the computational cost for data generation and model training. DP-GEN automatically and iteratively performs three steps: exploration, labeling, and training. It supports various popular packages for these three steps: LAMMPS for exploration, Quantum Espresso, VASP, CP2K, etc. for labeling, and DeePMD-kit for training. It also allows automatic job submission and result collection on different types of machines, such as high performance clusters and cloud machines, and is adaptive to different job management tools, including Slurm, PBS, and LSF. As a concrete example, we illustrate the details of the process for generating a general-purpose PES model for Cu using DP-GEN.
Program summary
Program Title: DP-GEN
Program Files doi: http://dx.doi.org/10.17632/sxybkgc5xc.1
Licensing provisions: LGPL
Programming language: Python
Nature of problem: Generating reliable deep learning based potential energy models with minimal human intervention and computational cost.
Solution method: The concurrent learning scheme is implemented. Supports for sampling configuration space with LAMMPS, generating ab initio data with Quantum Espresso, VASP, CP2K and training potential models with DeePMD-kit are provided. Supports for different machines including workstations, high performance clusters and cloud machines are provided. Supports for job management tools including Slurm, PBS, LSF are provided.

The use of supervised machine learning to develop fast and accurate interatomic potential models is transforming molecular and materials research by greatly accelerating atomic-scale simulations with little loss of accuracy. Three years ago, Jörg Behler published a perspective in this journal providing an overview of some of the leading methods in this field. In this perspective, we provide an updated discussion of recent developments, emerging trends, and promising areas for future research in this field. We include in this discussion an overview of three emerging approaches to developing machine-learned interatomic potential models that have not been extensively discussed in existing reviews: moment tensor potentials, message-passing networks, and symbolic regression.

We use the PBE0/6-31G* density functional method to perform ab initio quantum mechanical/molecular mechanical (QM/MM) molecular dynamics (MD) simulations under periodic boundary conditions with rigorous electrostatics using the ambient potential composite Ewald method in order to test the convergence of MM→QM/MM free energy corrections for the prediction of 17 small-molecule solvation free energies and 8 ligand binding free energies to T4 lysozyme. The “indirect” thermodynamic cycle for calculating free energies is used to explore whether a series of reference potentials improves the statistical quality of the predictions. Specifically, we construct a series of reference potentials, each of which optimizes a molecular mechanical (MM) force field's parameters to reproduce the ab initio QM/MM forces from a QM/MM simulation. The optimizations form a systematic progression of successively expanded parameters that include bond, angle, dihedral, and charge parameters. For each reference potential, we calculate benchmark-quality reference values for the MM→QM/MM correction by simulating the mixed MM and QM/MM Hamiltonians at 11 intermediate states, each for 200 ps. We then compare forward and reverse application of Zwanzig's relation, thermodynamic integration (TI), and Bennett's acceptance ratio (BAR) methods as a function of reference potential, simulation time, and the number of simulated intermediate states. We find that Zwanzig's equation is inadequate unless a large number of intermediate states are explicitly simulated. The TI and BAR mean signed errors are very small even when only the end-state simulations are considered, and the standard deviation of the TI and BAR errors is decreased by choosing a reference potential that optimizes the bond and angle parameters. We find that a robust approach for the data sets of fairly rigid molecules considered here is to use the bond+angle reference potential together with the end-state-only BAR analysis. This requires a QM/MM simulation to be performed in order to generate reference data to parameterize the bond+angle reference potential, and then this same simulation serves a dual purpose as the full QM/MM end-state. The convergence of the results with respect to time suggests that computational resources may be used more efficiently by running multiple simulations for no more than 50 ps, rather than running one long simulation.
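
Zwanzig's relation, one of the estimators compared in this abstract, can be sketched as exponential averaging of energy differences sampled on the reference potential, ΔA = −kT ln⟨exp(−ΔU/kT)⟩. The snippet below is a generic illustration, not the authors' analysis code: the function name `zwanzig`, the constant `KT`, and the sample values in `delta_u` are made up for the example.

```python
import math


def zwanzig(delta_u, kt):
    """Free energy difference from exponential averaging (Zwanzig's relation).

    delta_u: energy differences U_target - U_reference, evaluated on
             configurations sampled from the reference potential.
    kt:      thermal energy in the same units as delta_u.
    """
    if not delta_u:
        raise ValueError("need at least one sample")
    # Shift by the minimum so every exponent is <= 0 (numerically stable)
    shift = min(delta_u)
    mean_exp = sum(math.exp(-(du - shift) / kt) for du in delta_u) / len(delta_u)
    return shift - kt * math.log(mean_exp)


KT = 0.592  # kcal/mol, roughly room temperature (illustrative value)
delta_u = [1.2, 0.8, 1.5, 0.9, 1.1]  # hypothetical energy gaps in kcal/mol
delta_a = zwanzig(delta_u, KT)
```

The estimate always lies between the smallest sampled gap and the arithmetic mean of the gaps. Because it is dominated by the rare low-energy tail, it converges slowly when the two potentials overlap poorly, which is consistent with the abstract's finding that many intermediate states are needed.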

An efficient and accurate reference potential simulation protocol is proposed for producing ab initio quantum mechanical molecular mechanical (AI-QM/MM) quality free energy profiles for chemical reactions in a solvent or macromolecular environment. This protocol involves three stages: (a) using force matching to recalibrate a semi-empirical quantum mechanical (SE-QM) Hamiltonian for the specific reaction under study; (b) employing the recalibrated SE-QM Hamiltonian (in combination with molecular mechanical force fields) as the reference potential to drive umbrella sampling along the reaction pathway; and (c) computing AI-QM/MM energy values for configurations collected from the sampling and performing weighted thermodynamic perturbation to obtain the AI-QM/MM-corrected reaction free energy profile. For three model reactions (identity SN2 reaction, Menshutkin reaction, and glycine proton transfer reaction) in aqueous solution and one enzyme reaction (Claisen rearrangement in chorismate mutase), our simulations using recalibrated PM3 SE-QM Hamiltonians reproduced the AI-QM/MM free energy profiles (at the B3LYP/6-31G* level of theory) to within 1 kcal/mol, with a 20- to 45-fold reduction in computer time.

We propose a simple, but efficient and accurate machine learning (ML) model for developing high-dimensional potential energy surfaces. This so-called embedded atom neural network (EANN) approach is inspired by the well-known empirical embedded atom method (EAM) model used in the condensed phase. It simply replaces the scalar embedded atom density in EAM with a Gaussian-type orbital based density vector, and represents the complex relationship between the embedded density vector and atomic energy by neural networks. We demonstrate that the EANN approach is as accurate as several established ML models in representing