... An attractive alternative to ab initio QM/MM simulation is the design of quantum mechanical force fields 2,16,17 and machine learning models. 18–23 Of particular relevance to the current work is the development of QM/MM-ΔMLP models, whereby the energies and forces of a fast, approximate QM model are corrected with a machine learning potential. 20,24–30 These models have the potential to offer the computational efficiency needed to address complex chemical mechanisms that require sampling of high-dimensional free energy surfaces, while providing accuracy comparable to high-level QM methods. ...
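The additive structure of a QM/MM-ΔMLP model can be shown with a minimal sketch. The two 1-D functions below are hypothetical stand-ins for a semiempirical QM/MM surface and a trained MLP correction, and neither is a real model. The point is that the corrected energy is the sum of the two terms, and the corrected force is the sum of their negative gradients.

```python
import numpy as np

# Hypothetical 1-D toy potentials: a "base" surface standing in for fast
# semiempirical QM/MM, and a smooth "correction" standing in for a trained MLP.
def e_base(x):
    return 0.5 * 2.0 * x**2          # harmonic base surface

def f_base(x):
    return -2.0 * x                  # analytic force, -dE/dx

def e_corr(x):
    return 0.3 * np.sin(x)           # smooth "learned" correction

def f_corr(x):
    return -0.3 * np.cos(x)

def e_total(x):
    # Delta-MLP: corrected energy = base energy + MLP correction
    return e_base(x) + e_corr(x)

def f_total(x):
    # Forces add for the same reason, because F = -dE/dx is linear in E
    return f_base(x) + f_corr(x)

# Check the combined force against a central finite difference of e_total
x0, h = 0.7, 1e-5
fd = -(e_total(x0 + h) - e_total(x0 - h)) / (2 * h)
print(f_total(x0), fd)
```

Because differentiation is linear, the correction contributes forces simply by supplying the gradient of its own energy term. The base model's forces are used unchanged.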

... A barrier to progress in the development and validation of such methods is their limited availability in flexible software packages that enable a wide range of applications in the condensed phase. ...

... The expressions for the E_i values from the neural network can be found elsewhere. 20,21 The atomic decomposition of the DPRc model is readily amenable to parallel calculation. The DPRc contribution to the energy is activated by setting the option idprc=1 in the &dprc Fortran namelist of the Sander input. ...
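On the input side, the helper below renders a Sander &dprc namelist group that sets idprc=1, which is the only option named here. The helper `dprc_namelist` is hypothetical. Any other &dprc variables that a real run may require are not shown; consult the Amber reference manual for those.

```python
def dprc_namelist(options: dict) -> str:
    """Render a Fortran namelist group named &dprc (hypothetical helper)."""
    # Amber example inputs conventionally indent the group name by one space
    lines = [" &dprc"]
    for key, value in options.items():
        lines.append(f"  {key}={value},")
    # A slash terminates a Fortran namelist group
    lines.append(" /")
    return "\n".join(lines) + "\n"

print(dprc_namelist({"idprc": 1}), end="")
```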

We report the development and testing of new integrated cyberinfrastructure for performing free energy simulations with generalized hybrid quantum mechanical/molecular mechanical (QM/MM) and machine learning potentials (MLPs) in Amber. The Sander molecular dynamics program has been extended to leverage fast, density-functional tight-binding models implemented in the DFTB+ and xTB packages, and an interface to the DeePMD-kit software enables the use of MLPs. The software is integrated through application program interfaces that circumvent the need to perform “system calls” and enable the incorporation of long-range Ewald electrostatics into the external software’s self-consistent field procedure. The infrastructure provides access to QM/MM models that may serve as the foundation for QM/MM–ΔMLP potentials, which supplement the semiempirical QM/MM model with an MLP correction trained to reproduce ab initio QM/MM energies and forces. Efficient optimization of minimum free energy pathways is enabled through a new surface-accelerated finite-temperature string method implemented in the FE-ToolKit package. Furthermore, we interfaced Sander with the i-PI software by implementing the socket communication protocol used in the i-PI client–server model. The new interface with i-PI allows for the treatment of nuclear quantum effects with semiempirical QM/MM–ΔMLP models. The modular interoperable software is demonstrated on proton transfer reactions in guanine-thymine mispairs in a B-form deoxyribonucleic acid helix. The current work represents a considerable advance in the development of modular software for performing free energy simulations of chemical reactions that are important in a wide range of applications.
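In the i-PI socket protocol mentioned above, each message begins with a fixed-length 12-byte ASCII header and is followed by raw binary payloads. The sketch below packs the reply a client sends after a GETFORCE request. That reply is FORCEREADY followed by the energy, the atom count, the forces, the virial, and an optional extra string, following the layout in the i-PI documentation. Little-endian packing and atomic units are assumptions here, since the reference drivers send data in the machine's native byte order. This is not Sander's implementation.

```python
import struct

HDRLEN = 12  # every i-PI message starts with a 12-byte, space-padded ASCII header

def header(msg: str) -> bytes:
    return msg.ljust(HDRLEN).encode("ascii")

def pack_forceready(energy, forces, virial, extra=b""):
    """Bytes a client sends in reply to GETFORCE (atomic units assumed).

    forces: sequence of (fx, fy, fz), one per atom; virial: 9 floats (3x3, flattened).
    """
    if len(virial) != 9:
        raise ValueError("virial must have 9 components")
    natoms = len(forces)
    flat = [component for f in forces for component in f]
    # "<" (little-endian) is an assumption; reference clients use native byte order
    return (header("FORCEREADY")
            + struct.pack("<d", energy)
            + struct.pack("<i", natoms)
            + struct.pack(f"<{3 * natoms}d", *flat)
            + struct.pack("<9d", *virial)
            + struct.pack("<i", len(extra))
            + extra)

msg = pack_forceready(-1.5, [(0.1, 0.2, 0.3)], [0.0] * 9)
print(len(msg))  # 12 + 8 + 4 + 24 + 72 + 4 = 124
```

On the server side, the same fields are read back in this order after the 12-byte header.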

... These three methods are already sufficient to construct a conventional simulation context employing the DP model within OpenMM, although the third method is not always needed. Recent studies, especially in the realm of biomolecular systems, show that most applications of MLPs in molecular dynamics simulations are based on hybrid MLP/MM frameworks [48,61–65]. In such schemes, the particles described by the MLP models constitute a subset of the entire system. For implementing the hybrid MLP/MM scheme in OpenMM, the DeepPotentialModel class offers two optional methods that allow users to specify which particles will be input to the DP model and to generate the Force object designated for integration with classical force fields. ...

... Consequently, the hybrid MLP/MM scheme emerges as a more practical approach for modeling biomolecules with MLPs. In this scheme, the MLP is utilized to delineate the intramolecular interactions [65,91,92] or accuracy-critical reactions [48,63,93] for a subset of the entire simulation system. ...

... Different selection methods within the hybrid DP/MM model cater to diverse application needs. For example, the DP model with pre-selected particles can be deployed either independently [65,91–93] or alongside QM/MM methods [48,63,64] to describe the intramolecular interactions of the subset of interest. When the DP model is used in isolation, the classical force-field interactions among these particles can be removed prior to simulation. ...
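The bookkeeping behind the hybrid DP/MM scheme can be sketched with plain Python and NumPy. This is not the plugin's DeepPotentialModel API. The pair term and the "DP" energy below are hypothetical callables. The sketch skips the classical interactions between pairs of selected particles, adds the DP energy of the selected subset, and handles all remaining pairs classically.

```python
import numpy as np

def hybrid_energy(pos, dp_mask, mm_pair_energy, dp_energy):
    """Toy hybrid DP/MM energy (hypothetical callables, not the plugin API).

    pos: (N, 3) coordinates; dp_mask: (N,) booleans selecting DP particles.
    """
    pos = np.asarray(pos, dtype=float)
    dp_mask = np.asarray(dp_mask, dtype=bool)
    n = len(pos)
    e_mm = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if dp_mask[i] and dp_mask[j]:
                continue  # classical term removed: the DP model covers this pair
            e_mm += mm_pair_energy(pos[i], pos[j])
    # The DP model sees only the selected subset of particles
    return e_mm + dp_energy(pos[dp_mask])

def coulomb_like(a, b):
    # Hypothetical bare 1/r pair term standing in for a classical force field
    return 1.0 / np.linalg.norm(a - b)

def fake_dp(sub):
    # Hypothetical "DP" energy: -1 per selected particle
    return -1.0 * len(sub)

xyz = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
print(hybrid_energy(xyz, [True, True, False], coulomb_like, fake_dp))
```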

Machine learning potentials, particularly the deep potential (DP) model, have revolutionized molecular dynamics (MD) simulations, striking a balance between accuracy and computational efficiency. To facilitate the DP model’s integration with the popular MD engine OpenMM, we have developed a versatile OpenMM plugin. This plugin supports a range of applications, from conventional MD simulations to alchemical free energy calculations and hybrid DP/MM simulations. Our extensive validation tests encompassed energy conservation in microcanonical ensemble simulations, fidelity in canonical ensemble generation, and the evaluation of the structural, transport, and thermodynamic properties of bulk water. The introduction of this plugin is expected to significantly expand the application scope of DP models within the MD simulation community, representing a major advancement in the field.
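Energy conservation in microcanonical (NVE) runs is commonly summarized by the drift of the total energy, meaning the slope of a linear fit to the total energy versus time, normalized per atom. Below is a minimal NumPy sketch. It assumes the times and total energies have already been extracted from the simulation output. The trace in the usage lines is made up, and this is not necessarily the metric used in the plugin's validation.

```python
import numpy as np

def energy_drift_per_atom(times_ps, total_energy, n_atoms):
    """Least-squares slope of total energy vs time, divided by the atom count.

    Returns drift in (energy unit) per atom per ps.
    """
    slope, _intercept = np.polyfit(np.asarray(times_ps, dtype=float),
                                   np.asarray(total_energy, dtype=float), 1)
    return slope / n_atoms

# Hypothetical trace: small fluctuations on top of a slow linear drift
t = np.linspace(0.0, 100.0, 1001)                  # ps
e = -5000.0 + 0.03 * t + 0.05 * np.sin(3.0 * t)    # kJ/mol
print(energy_drift_per_atom(t, e, n_atoms=300))
```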

... To this end, several groups have employed delta (Δ) machine learning (ML) potentials, primarily based on neural networks (NN), as a means to directly correct the SE/MM potential energy for improved QM/MM simulations. 21–25 Despite the overall similarity in the Δ-learning theme following the work of Ramakrishnan et al., 26 these works differ in their choice of base levels, NN potentials' descriptors and topological features, and loss function construction. Building on their earlier work using NN potentials for free energy perturbation, 27 Yang and co-workers have developed a method for training artificial neural networks (ANN) to correct DFTB/MM to the DFT/MM level using energy-only loss functions during MD simulations. ...

... 21 Since force matching (FM) can greatly improve the phase-space overlap involved and therefore the quality of dynamics, 17,19 it is generally desirable to include forces directly in the loss function when training the NN potentials. 22–25 For example, Böselt et al. 22 used symmetry-function descriptors as input features to train their high-dimensional neural network potentials (HDNNP), based on both energies and forces, for predicting corrections for DFTB/MM in simulating stable and transition-state species in solution. Pan et al. used an FM-recalibrated SE method to ensure that the training configurations are sampled in the relevant phase space. ...
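A loss that combines energy and force errors can be sketched as follows. The sketch follows the form used in the DeePMD papers: an energy term ΔE²/N plus a force term averaged over the 3N components, with prefactors that move from start to limit values as the learning rate decays. Here the reference values would be Δ targets (ab initio minus semiempirical). The function names and prefactor values are illustrative assumptions. The individual works cited above may construct their losses differently.

```python
import numpy as np

def prefactor(lr, lr0, start, limit):
    """Interpolate from `start` (when lr == lr0) to `limit` (as lr -> 0)."""
    return limit + (start - limit) * lr / lr0

def delta_loss(e_pred, e_ref, f_pred, f_ref, lr, lr0,
               pe=(0.02, 1.0), pf=(1000.0, 1.0)):
    """Energy + force loss for one configuration (illustrative prefactors).

    e_ref / f_ref are Delta targets: ab initio QM/MM minus the base SE/MM values.
    f_pred, f_ref: (N, 3) force arrays.
    """
    f_pred = np.asarray(f_pred, dtype=float)
    f_ref = np.asarray(f_ref, dtype=float)
    n_atoms = f_ref.shape[0]
    p_e = prefactor(lr, lr0, *pe)
    p_f = prefactor(lr, lr0, *pf)
    loss_e = (e_pred - e_ref) ** 2 / n_atoms   # energy term, scaled by 1/N
    loss_f = np.mean((f_pred - f_ref) ** 2)    # mean over all 3N force components
    return p_e * loss_e + p_f * loss_f
```

With these values, force errors dominate early in training, and the relative weight of the energy term grows as the learning rate decays.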

... 23 Based on the DeepPot-SE 28 and the DeePMD framework, 29,30 their ML model uses an embedding network to encode input features and incorporates both energy and force differences into the loss function to ensure accurate dynamics for MD-based free energy simulations of solution-phase and enzyme reactions. 23 Other recent QM/MM developments utilizing DeepPot and local environment descriptors include the range-corrected deep learning scheme of Zeng et al. 24 and the DFTB/MM-based ML model of Gómez-Flores et al. 25 One notable issue with conventional NN potentials is that they generally do not provide a metric to assess the uncertainties in energy and force predictions. For MD-based QM/MM simulations, and free energy path simulations in particular, this raises the question of how to maintain the robustness of NN models when the trajectories sampled on the NN-predicted PES deviate from the original training configuration space. ...
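One widely used response to this uncertainty issue is an ensemble ("model deviation") estimate, which DeePMD-kit also provides. Several models trained on the same data with different random seeds are evaluated on the same configuration, and the spread of their predicted forces flags configurations far from the training data. The sketch below computes the maximum per-atom force deviation, max_i sqrt(⟨|F_i − ⟨F_i⟩|²⟩), from an ensemble of predictions. The ensemble values are made up.

```python
import numpy as np

def max_force_deviation(forces):
    """Maximum over atoms of the ensemble spread of predicted forces.

    forces: array of shape (n_models, n_atoms, 3), one force set per model.
    """
    forces = np.asarray(forces, dtype=float)
    mean = forces.mean(axis=0)                                 # (n_atoms, 3)
    # Per-atom deviation: sqrt(< |F_i - <F_i>|^2 >), averaged over models
    per_atom = np.sqrt(((forces - mean) ** 2).sum(axis=2).mean(axis=0))
    return per_atom.max()

# Hypothetical predictions from 4 models for a 2-atom configuration
rng = np.random.default_rng(0)
base = np.array([[1.0, 0.0, 0.0], [0.0, -1.0, 0.0]])
ensemble = base + 0.05 * rng.standard_normal((4, 2, 3))
print(max_force_deviation(ensemble))
```

Configurations with a large deviation are natural candidates for new ab initio reference calculations.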

Free energy simulations that employ combined quantum mechanical and molecular mechanical (QM/MM) potentials at ab initio QM (AI) levels are computationally highly demanding. Here, we present a machine-learning-facilitated approach for obtaining AI/MM-quality free energy profiles at the cost of efficient semiempirical QM/MM (SE/MM) methods. Specifically, we use Gaussian process regression (GPR) to learn the potential energy corrections needed for an SE/MM level to match an AI/MM target along the minimum free energy path (MFEP). Force modification using gradients of the GPR potential allows us to improve configurational sampling and update the MFEP. To adaptively train our model, we further employ the sparse variational GP (SVGP) and streaming sparse GPR (SSGPR) methods, which efficiently incorporate previous sample information without significantly increasing the training data size. We applied the QM-(SS)GPR/MM method to the solution-phase SN2 Menshutkin reaction, NH₃ + CH₃Cl → CH₃NH₃⁺ + Cl⁻, using AM1/MM and B3LYP/6-31+G(d,p)/MM as the base and target levels, respectively. For 4000 configurations sampled along the MFEP, the iteratively optimized AM1-SSGPR-4/MM model reduces the energy error in AM1/MM from 18.2 to 4.4 kcal/mol. Although not explicitly fitting forces, our method also reduces the key internal force errors from 25.5 to 11.1 kcal/mol/Å and from 30.2 to 10.3 kcal/mol/Å for the N–C and C–Cl bonds, respectively. Compared to the uncorrected simulations, the AM1-SSGPR-4/MM method lowers the predicted free energy barrier from 28.7 to 11.7 kcal/mol and decreases the reaction free energy from −12.4 to −41.9 kcal/mol, bringing these results into closer agreement with their AI/MM and experimental benchmarks.
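The core regression step can be sketched with exact (dense) GPR and a squared-exponential kernel on a single reaction coordinate. The SVGP and SSGPR methods described above are sparse approximations and are not reproduced here. The training data are a made-up stand-in for the energy gap E_AI − E_SE along a path, and all function names are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length=0.5, variance=1.0):
    """Squared-exponential kernel between 1-D coordinate arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length**2)

def gpr_mean(x_train, y_train, x_test, noise=1e-6, **kernel_args):
    """Posterior mean of exact GPR (dense; not the sparse SVGP/SSGPR variants)."""
    K = rbf_kernel(x_train, x_train, **kernel_args) + noise * np.eye(len(x_train))
    alpha = np.linalg.solve(K, y_train)
    return rbf_kernel(x_test, x_train, **kernel_args) @ alpha

# Hypothetical training data: reaction coordinate xi and the energy gap
# dE = E_AI - E_SE (kcal/mol) on configurations sampled along a path
xi = np.linspace(-1.0, 1.0, 9)
d_e = 3.0 * np.sin(2.0 * xi)

xi_new = np.array([-0.6, 0.1, 0.8])
correction = gpr_mean(xi, d_e, xi_new)
# Corrected energy at a new configuration would be E_SE(x) + correction
print(correction)
```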

... The package was first released in 2017 29 and has since undergone rapid development with contributions from many developers. DeePMD-kit implements a series of MLP models known as Deep Potential (DP) models, 9,10,50–54 which have been widely adopted in the fields of physics, chemistry, biology, and material science for studying a broad range of atomistic systems. These systems include metallic materials, 55 non-metallic inorganic materials, 56–60 water, 61–71 organic systems, 10,72 solutions, 52,73–76 gas-phase systems, 77–80 macromolecular systems, 81,82 and interfaces. ...

... This limits the predictive capability of the methods in condensed-phase simulations. Zeng et al. 52 created a new Δ-MLP method called Deep Potential-Range correction (DPRc) to integrate with combined quantum mechanical/molecular mechanical (QM/MM) potentials, which corrects the potential energy from a fast, linear-scaling low-level semiempirical QM/MM theory to a high-level ab initio QM/MM theory. Unlike many of the emerging Δ-MLPs that correct internal QM energy and forces, the DPRc model corrects both the QM-QM and QM-MM interactions of a QM/MM calculation in a manner that conserves energy as MM atoms enter (or leave) the vicinity of the QM region. ...
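Conserving energy as MM atoms cross the cutoff requires each atom's contribution to vanish smoothly at the cutoff radius. The published DeepPot-SE descriptor does this with a weight equal to 1/r inside an inner radius r_s, which decays to zero at r_c through a cosine switch. The sketch below implements that weight. It illustrates the idea only and is not the DPRc source code. The radii in the usage loop are arbitrary.

```python
import math

def smooth_weight(r, r_s, r_c):
    """DeepPot-SE-style weight: 1/r inside r_s, cosine-switched to zero at r_c."""
    if r < r_s:
        return 1.0 / r
    if r < r_c:
        u = (r - r_s) / (r_c - r_s)          # 0 at r_s, 1 at r_c
        return (0.5 * math.cos(math.pi * u) + 0.5) / r
    return 0.0

# Arbitrary illustrative radii (same length unit as r)
for r in (5.0, 5.5, 5.9, 5.999, 6.0, 6.5):
    print(r, smooth_weight(r, r_s=5.5, r_c=6.0))
```

Because the weight and its slope go to zero at r_c, an atom entering or leaving the cutoff sphere does not cause a jump in energy.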

... 80 Compared to its initial release, 29 DeePMD-kit has evolved significantly, with the current version (v2.2.1) offering an extensive range of features. These include DeepPot-SE, attention-based, and hybrid descriptors, 10,50,51,53 the ability to fit tensorial properties, 105,106 type embedding, model deviation, 103,107 Deep Potential-Range Correction (DPRc), 52,75 Deep Potential Long Range (DPLR), 53 graphics processing unit (GPU) support for customized operators, 108 model compression, 109 non-von Neumann molecular dynamics (NVNMD), 110 and various usability improvements, such as documentation, compiled binary packages, graphical user interfaces (GUIs), and application programming interfaces (APIs). This article provides an overview of the current major version of the DeePMD-kit, highlighting its features and technical details, presenting a comprehensive procedure for conducting molecular dynamics as a representative application, benchmarking the accuracy and efficiency of different models, and discussing ongoing developments. ...

DeePMD-kit is a powerful open-source software package that facilitates molecular dynamics simulations using machine learning potentials known as Deep Potential (DP) models. This package, which was released in 2017, has been widely used in the fields of physics, chemistry, biology, and material science for studying atomistic systems. The current version of DeePMD-kit offers numerous advanced features, such as DeepPot-SE, attention-based and hybrid descriptors, the ability to fit tensorial properties, type embedding, model deviation, DP-range correction, DP long range, graphics processing unit support for customized operators, model compression, non-von Neumann molecular dynamics, and improved usability, including documentation, compiled binary packages, graphical user interfaces, and application programming interfaces. This article presents an overview of the current major version of the DeePMD-kit package, highlighting its features and technical details. Additionally, this article presents a comprehensive procedure for conducting molecular dynamics as a representative application, benchmarks the accuracy and efficiency of different models, and discusses ongoing developments.

... The package was first released in 2017 19 and has since undergone rapid development with contributions from many developers. DeePMD-kit implements a series of MLP models known as Deep Potential (DP) models, 9,10,41–45 which have been widely adopted in the fields of physics, chemistry, biology, and material science for studying a broad range of atomistic systems. These systems include metallic materials, 46–61 non-metallic inorganic materials, 62–66 water, 67–77 organic systems, 10,78 solutions, 43,79–82 gas-phase systems, 83–86 macromolecular systems, 87,88 and interfaces. 89–93 ...

... Compared to its initial release, 19 DeePMD-kit has evolved significantly, with the current version (v2.2.1) offering an extensive range of features. These include DeepPot-SE, attention-based, and hybrid descriptors, 10,41,42,44 the ability to fit tensorial properties, 97,98 type embedding, model deviation, 99,100 Deep Potential-Range Correction (DPRc), 43,81 Deep Potential Long Range (DPLR), 44 graphics processing unit (GPU) support for customized operators, 101 model compression, 102 non-von Neumann molecular dynamics (NVNMD), 103 and various usability improvements such as documentation, compiled binary packages, graphical user interfaces (GUI), and application programming interfaces (API). This article provides an overview of the current major additions to the DeePMD-kit, highlighting its features and technical details, benchmarking the accuracy and efficiency of different models, and discussing ongoing developments. ...

... Deep Potential-Range Correction (DPRc) 43,81 was initially designed to correct the potential energy from a fast, linear-scaling low-level semiempirical QM/MM theory to a high-level ab initio QM/MM theory. It does so in a range-correction manner: short- and mid-range non-bonded interactions are corrected quantitatively by leveraging the non-bonded lists routinely used in molecular dynamics simulations with molecular mechanical force fields such as AMBER. 108 In this way, long-range electrostatic interactions can be modeled efficiently using the particle mesh Ewald method 108 or its extensions for multipolar 109,110 and QM/MM 111,112 potentials. ...
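Range correction depends on knowing which MM atoms lie within the cutoff of the QM region. Production codes get this from the MD engine's non-bonded pair lists. The brute-force sketch below is a hypothetical function that scales as O(N_QM × N_MM) and supports orthorhombic boxes only. It selects the MM atoms within the cutoff using minimum-image distances under periodic boundary conditions.

```python
import numpy as np

def mm_atoms_in_range(qm_xyz, mm_xyz, cutoff, box=None):
    """Indices of MM atoms within `cutoff` of at least one QM atom.

    box: optional orthorhombic box edge lengths (3,) for minimum-image distances.
    """
    qm_xyz = np.asarray(qm_xyz, dtype=float)
    mm_xyz = np.asarray(mm_xyz, dtype=float)
    d = mm_xyz[:, None, :] - qm_xyz[None, :, :]      # (n_mm, n_qm, 3)
    if box is not None:
        box = np.asarray(box, dtype=float)
        d -= box * np.round(d / box)                  # minimum-image convention
    r = np.linalg.norm(d, axis=2)                     # (n_mm, n_qm)
    return np.nonzero((r < cutoff).any(axis=1))[0]

qm = [[0.0, 0.0, 0.0]]
mm = [[1.0, 0.0, 0.0], [5.0, 0.0, 0.0], [9.5, 0.0, 0.0]]
print(mm_atoms_in_range(qm, mm, cutoff=2.0, box=[10.0, 10.0, 10.0]))
```

The third MM atom is selected only when the periodic image 0.5 units away is considered.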

DeePMD-kit is a powerful open-source software package that facilitates molecular dynamics simulations using machine learning potentials (MLP) known as Deep Potential (DP) models. This package, which was released in 2017, has been widely used in the fields of physics, chemistry, biology, and material science for studying atomistic systems. The current version of DeePMD-kit offers numerous advanced features such as DeepPot-SE, attention-based and hybrid descriptors, the ability to fit tensorial properties, type embedding, model deviation, Deep Potential-Range Correction (DPRc), Deep Potential Long Range (DPLR), GPU support for customized operators, model compression, non-von Neumann molecular dynamics (NVNMD), and improved usability, including documentation, compiled binary packages, graphical user interfaces (GUI), and application programming interfaces (API). This article presents an overview of the current major version of the DeePMD-kit package, highlighting its features and technical details. Additionally, the article benchmarks the accuracy and efficiency of different models and discusses ongoing developments.

... In addition to enabling quantum mechanical accuracy, current DP models have the following characteristics: (i) they preserve the symmetry of the system, especially when there are multiple elemental species; (ii) they are computationally efficient, being at least five orders of magnitude faster than DFT; (iii) they are end-to-end and therefore require little human intervention; (iv) they support MPI and GPUs, making them highly efficient on modern heterogeneous high-performance supercomputers. Thanks to these features, DP models have been successfully employed in studies of water and water-containing systems (Calegari Sommers et al., 2020; Xu et al., 2020; Tisi et al., 2021), metals and alloys (Zhang et al., 2019; Wang et al., 2020; Jiang et al., 2021), phase diagrams (Niu et al., 2020; Yang et al., 2021), high-entropy ceramics (Dai et al., 2020, 2021), chemical reactions (Zeng et al., 2021a, 2021b), solid-state electrolytes (Huang et al., 2021), ionic liquids (Liang et al., 2021), etc. We refer to Wen et al. (2022) for a recent review of DP for material systems. ...

... In this section, we briefly introduce some recent progress on biocatalysis simulations that require a more delicate treatment with the DeePMD-kit. We refer to Zeng et al. (2021a) for more details. ...

... To tackle these biological problems, the DeePMD-kit was recently integrated with the AMBER molecular simulation software suite (Case et al., 2020) to perform biocatalysis simulations (Zeng et al., 2021a). This interface enables ab initio combined quantum mechanical/molecular mechanical (QM/MM) simulations with a rigorous treatment of long-range electrostatic interactions under periodic boundary conditions (Giese and York, 2016; Pan et al., 2021), and affords a powerful tool to gain insight into the pathways of biocatalysis reactions, their transition states and intermediates, and environmental factors that modulate reactivity (Gaines et al., 2019; Ganguly et al., 2020). ...

DESCRIPTION
Coarse-grained (CG) molecular dynamics simulations of integral membrane proteins have gained wide popularity because they provide a cost-effective but still accurate description of the protein-membrane interactions as a whole and of the role of individual lipidic species. Therefore, they can provide biologically meaningful information at a resolution comparable to that accessible to experimental techniques. However, the simulation of membrane proteins remains a challenging task that requires specific expertise, as external pressures and solvation need to be carefully handled. CG simulations that lump several water molecules into one single supramolecular moiety may present further intricacies due to bulkier solvent representations or model-dependent compressibilities. This chapter provides a detailed protocol for setting up, running, and analyzing CG simulations of membrane proteins using the SIRAH force field for CG simulations within the AMBER package.

DESCRIPTION
A new direction has emerged in molecular simulations in recent years, where potential energy surfaces (PES) are constructed using machine learning (ML) methods. These ML models, combining the accuracy of quantum mechanical models and the efficiency of empirical atomic potential models, have been demonstrated by many studies to have extensive application prospects. This chapter introduces a recently developed ML model, Deep Potential (DP), and the corresponding package, DeePMD-kit. First, we present the basic theory of the DP method. Then, we show how to train and test a DP model for a gas-phase methane molecule using the DeePMD-kit package. Next, we introduce some recent progress on simulations of biomolecular processes by integrating the DeePMD-kit with the AMBER molecular simulation software suite. Finally, we provide a supplement on points that require further explanation.

... 46 In the present work, we develop a Quantum Deep-learning Potential Interaction (QDπ) model that uses a fast third-order self-consistent density-functional tight-binding (DFTB3/3OB) model 47,48 that is corrected to a quantitatively high level of accuracy through a range-corrected deep-learning potential (DPRc). 49,50 In this way, the QDπ model developed here takes the form of a quantum mechanical/machine learning potential correction (QM/Δ-MLP). 35,49−54 The use of DFTB3 as a robust QM base model has several important advantages. ...

... The QDπ model is trained to be a QM/Δ-MLP; i.e., a nonelectronic DPRc "correction" to the DFTB3/3OB 75 QM model potential energy similar to previous work. 49,50 2.1.1. Broad Data Sets: ANI-1xm and COMP5m. ...

... The next step will involve developing an intermolecular QM/MM interaction potential as a new range-corrected deep-learning potential. 49,50 The full (internal and intermolecular interaction) QDπ model is designed to be a correction to the QM/MM potential energy using DFTB3/3OB and the latest AMBER FF19SB for proteins, 126 OL3/OL15 for nucleic acids, 127−129 the OPC model for water, 130,131 and 12−6−4 ion models. 132−134 Once the intermolecular interaction component of the QDπ model has been developed and validated in alchemical free energy simulations, 5 the next step will be to extend the chemical space of drug molecules to include P, S, F, and Cl atoms. ...

We report QDπ-v1.0 for modeling the internal energy of drug molecules containing H, C, N, and O atoms. The QDπ model is in the form of a quantum mechanical/machine learning potential correction (QM/Δ-MLP) that uses a fast third-order self-consistent density-functional tight-binding (DFTB3/3OB) model that is corrected to a quantitatively high level of accuracy through a deep-learning potential (DeepPot-SE). The model has the advantage that it is able to properly treat electrostatic interactions and handle changes in charge/protonation states. The model is trained against reference data computed at the ωB97X/6-31G* level (as in the ANI-1x data set) and compared to several other approximate semiempirical and machine learning potentials (ANI-1x, ANI-2x, DFTB3, MNDO/d, AM1, PM6, GFN1-xTB, and GFN2-xTB). The QDπ model is demonstrated to be accurate for a wide range of intra- and intermolecular interactions (despite its intended use as an internal energy model) and has been shown to perform exceptionally well for relative protonation/deprotonation energies and tautomers. An example application to model reactions involved in RNA strand cleavage catalyzed by protein and nucleic acid enzymes illustrates that QDπ has average errors of less than 0.5 kcal/mol, whereas the other models compared have errors over an order of magnitude greater. Taken together, this makes QDπ highly attractive as a potential force field model for drug discovery.

... 47,60–70 An MLP-corrected semiempirical QM/MM model naturally serves as an excellent reference potential to estimate the ab initio FES; however, the active learning procedure used to train MLPs produces several neural network parameter sets (several potentials). 71,72 The gwTP method provides a means to estimate the ab initio FES from the aggregate sampling performed with each potential. As multiple independent simulation runs are typically performed in order to produce robust averages and error estimates, different reference potentials can often be accommodated at no added computational cost. ...

... For the purpose of providing a stringent test case for demonstration, the 4 reference potentials were specifically designed such that none of them accurately reproduce the target FES throughout the entire range of ξ PT values. The reference potentials use the MNDO/d semiempirical Hamiltonian supplemented with a range-corrected deep potential 71,83 (DPRc) MLP. We trained 4 ad hoc MNDO/d QM/MM+DPRc potentials using different target data to yield significantly different reference potentials. ...

... The optimization consisted of 200k steps with initial and final learning rates of 10⁻³ and 5 × 10⁻⁸, respectively. The initial optimization was followed by 9 cycles of active learning to search for additional training data. 71 Each active learning cycle performs 4 parameter optimizations to yield 4 trial parameter sets. ...
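The learning-rate schedule quoted above (10⁻³ decaying to 5 × 10⁻⁸ over 200k steps) can be reproduced with a stepwise exponential decay, of the kind applied by DeePMD-kit's exponential learning-rate type. The decay interval of 5000 steps below is an assumption, because the text gives only the endpoints. The per-interval factor is chosen so that the rate lands exactly on the final value.

```python
def exp_decay_lr(step, start_lr=1e-3, stop_lr=5e-8,
                 stop_steps=200_000, decay_steps=5_000):
    """Stepwise exponential decay that reaches stop_lr at stop_steps.

    decay_steps=5000 is an assumed value; only the endpoints are given.
    """
    # Factor applied once per decay interval: (stop/start)^(1/n_intervals)
    decay_rate = (stop_lr / start_lr) ** (decay_steps / stop_steps)
    return start_lr * decay_rate ** (step // decay_steps)

for step in (0, 50_000, 100_000, 200_000):
    print(step, exp_decay_lr(step))
```

With these numbers there are 40 decay intervals, so each one multiplies the rate by (5 × 10⁻⁵)^(1/40), which is about 0.78.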

We describe the generalized weighted thermodynamic perturbation (gwTP) method for estimating the free energy surface of an expensive "high-level" potential energy function from the umbrella sampling performed with multiple inexpensive "low-level" reference potentials. The gwTP method is a generalization of the weighted thermodynamic perturbation (wTP) method developed by Li and co-workers [J. Chem. Theory Comput. 2018, 14, 5583-5596] that uses a single "low-level" reference potential. The gwTP method offers new possibilities in model design whereby the sampling generated from several low-level potentials may be combined (e.g., specific reaction parameter models that might have variable accuracy at different stages of a multistep reaction). The gwTP method is especially well suited for use with machine learning potentials (MLPs) that are trained against computationally expensive ab initio quantum mechanical/molecular mechanical (QM/MM) energies and forces using active learning procedures that naturally produce multiple distinct neural network potentials. Simulations can be performed with greater sampling using the fast MLPs and then corrected to the ab initio level using gwTP. The capabilities of the gwTP method are demonstrated by creating reference potentials based on the MNDO/d and DFTB2/MIO semiempirical models supplemented with the "range-corrected deep potential" (DPRc). The DPRc parameters are trained to ab initio QM/MM data, and the potentials are used to calculate the free energy surface of stepwise mechanisms for nonenzymatic RNA 2'-O-transesterification model reactions. The extended sampling made possible by the reference potentials allows one to identify unequilibrated portions of the simulations that are not always evident from the short time scale commonly used with ab initio QM/MM potentials. 
We show that the reference potential approach can yield more accurate ab initio free energy predictions than the wTP method or what can be reasonably afforded from explicit ab initio QM/MM sampling.
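The basic ingredient of such reference-potential corrections is the Zwanzig free energy perturbation, applied per bin of the reaction coordinate ξ: A_high(ξ) ≈ A_low(ξ) − kT ln⟨exp(−β(U_high − U_low))⟩_low,ξ. The sketch below implements that single-reference estimate. It does not implement the gwTP weighting across multiple reference potentials. It also assumes that the umbrella bias is effectively constant within a narrow ξ bin, so the bias is omitted. Energies are in kcal/mol, and the function names are hypothetical.

```python
import numpy as np

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def tp_shift(delta_u, temperature=298.15):
    """Zwanzig estimate -kT ln < exp(-dU/kT) >_low for samples dU = U_high - U_low."""
    kt = KB_KCAL * temperature
    x = -np.asarray(delta_u, dtype=float) / kt
    m = x.max()  # log-mean-exp with the maximum factored out, for stability
    return -kt * (m + np.log(np.mean(np.exp(x - m))))

def corrected_profile(a_low, bin_index, delta_u, temperature=298.15):
    """Shift each xi-bin of a low-level free energy profile by its TP estimate."""
    a_low = np.asarray(a_low, dtype=float)
    bin_index = np.asarray(bin_index)
    delta_u = np.asarray(delta_u, dtype=float)
    a_high = np.full_like(a_low, np.nan)   # bins without samples stay undefined
    for b in range(a_low.size):
        samples = delta_u[bin_index == b]
        if samples.size:
            a_high[b] = a_low[b] + tp_shift(samples, temperature)
    return a_high

print(corrected_profile([0.0, 1.0], [0, 0, 1, 1], [1.0, 1.5, 2.0, 2.5]))
```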

... These models can be parameterized to improve their accuracy, for example, using force matching to higher levels [71–73]. Machine learning potentials (MLPs) have shown particular promise in enhancing the accuracy and performance of condensed-phase simulations of chemical reactions [33,74–78]. Of particular relevance to the current work is the development of QM/MM-ΔMLP models, whereby the energies and forces of a fast, approximate QM model are corrected with a machine learning potential [76,79–85]. ...

... These models are described in more detail in the supporting information. ...

Rare tautomeric forms of nucleobases can lead to Watson–Crick-like (WC-like) mispairs in DNA, but the process of proton transfer is fast and difficult to detect experimentally. NMR studies show evidence for the existence of short-time WC-like guanine–thymine (G-T) mispairs; however, the mechanism of proton transfer and the degree to which nuclear quantum effects play a role are unclear. We use a B-DNA helix exhibiting a wGT mispair as a model system to study tautomerization reactions. We perform ab initio (PBE0/6-31G*) quantum mechanical/molecular mechanical (QM/MM) simulations to examine the free energy surface for tautomerization. We demonstrate that while the ab initio QM/MM simulations are accurate, considerable sampling is required to achieve high precision in the free energy barriers. To address this problem, we develop a QM/MM machine learning potential correction (QM/MM-ΔMLP) that is able to improve the computational efficiency, greatly extend the accessible time scales of the simulations, and enable practical application of path integral molecular dynamics to examine nuclear quantum effects. We find that the inclusion of nuclear quantum effects has only a modest effect on the mechanistic pathway but leads to a considerable lowering of the free energy barrier for the GT*⇌G*T equilibrium. Our results enable a rationalization of observed experimental data and the prediction of populations of rare tautomeric forms of nucleobases and rates of their interconversion in B-DNA.

... deep neural network) for free energy predictions of condensed phase reactions. 15,18,19,23–28 Many of these studies 18,19,23–25,28 combine the ML models with semiempirical or lower-level QM/MM methods to obtain energy predictions that match the accuracy of higher-level QM/MM methods. For example, Gómez-Flores et al. 19 used an ML approach to predict the energy difference between the density functional tight-binding model and other higher-level QM methods for a thiol-disulfide exchange reaction in water. ...


Fast and accurate prediction of solvent effects on reaction rates is crucial for kinetic modeling, chemical process design, and high-throughput solvent screening. Despite recent advances in machine learning, a scarcity of reliable data has hindered the development of predictive models that are generalizable for diverse reactions and solvents. In this work, we generate a large set of data with the COSMO-RS method for over 28 000 neutral reactions and 295 solvents and train a machine learning model to predict the solvation free energy and solvation enthalpy of activation (ΔΔG‡solv, ΔΔH‡solv) for a solution phase reaction. On unseen reactions, the model achieves mean absolute errors of 0.71 and 1.03 kcal mol⁻¹ for ΔΔG‡solv and ΔΔH‡solv, respectively, relative to the COSMO-RS calculations. The model also provides reliable predictions of relative rate constants within a factor of 4 when tested on experimental data. The presented model can provide nearly instantaneous predictions of kinetic solvent effects or relative rate constants for a broad range of neutral closed-shell or free radical reactions and solvents, based only on atom-mapped reaction SMILES and solvent SMILES strings.
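
Under transition-state theory, a change ΔΔG‡ in the activation free energy scales the rate constant by exp(−ΔΔG‡/RT). A small sketch, assuming that simple relation (the cited model's own post-processing may differ; the function names are hypothetical), also shows what a free-energy error means for a rate: at 298.15 K an error of 0.71 kcal/mol corresponds to roughly a 3.3-fold error in the rate constant.

```python
import math

R_KCAL = 1.987204259e-3  # gas constant, kcal mol^-1 K^-1


def relative_rate_constant(ddg_kcal, temperature=298.15):
    """k / k_ref implied by an activation free energy change ddg (kcal/mol).

    From transition-state theory, k / k_ref = exp(-ddg / RT): a positive ddg
    (a higher barrier in the solvent of interest) slows the reaction.
    """
    return math.exp(-ddg_kcal / (R_KCAL * temperature))


def rate_error_factor(error_kcal, temperature=298.15):
    """Multiplicative rate-constant error produced by a free-energy error."""
    return math.exp(abs(error_kcal) / (R_KCAL * temperature))
```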

... The AIQM1 method displayed good accuracy even for challenging systems (such as ions and excited states). Additionally, several groups adopted ML to replace or correct the expensive QM method in ab initio QM/MM methods [40][41][42] to enhance accuracy and reduce the high computational cost of QM/MM molecular dynamics simulations [43][44][45][46][47][48] . Inspired by these encouraging results in new MLPs, we envision that MLPs will introduce a new opportunity to develop a faster and more accurate QR method 49,50 , since high-level CC methods are rarely used in QR applications due to their prohibitive computational costs 22 . ...

Biomacromolecule structures are essential for drug development and biocatalysis. Quantum refinement (QR) methods, which employ reliable quantum mechanics (QM) methods in crystallographic refinement, showed promise in improving the structural quality or even correcting the structure of biomacromolecules. However, vast computational costs and complex quantum mechanics/molecular mechanics (QM/MM) setups limit QR applications. Here we incorporate robust machine learning potentials (MLPs) in multiscale ONIOM(QM:MM) schemes to describe the core parts (e.g., drugs/inhibitors), replacing the expensive QM method. Additionally, two levels of MLPs are combined for the first time to overcome MLP limitations. Our unique MLPs+ONIOM-based QR methods achieve QM-level accuracy with significantly higher efficiency. Furthermore, our refinements provide computational evidence for the existence of bonded and nonbonded forms of the Food and Drug Administration (FDA)-approved drug nirmatrelvir in one SARS-CoV-2 main protease structure. This study highlights that powerful MLPs accelerate QRs for reliable protein–drug complexes, promote broader QR applications and provide more atomistic insights into drug development.
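
ONIOM schemes combine levels of theory subtractively. A minimal sketch of the two-layer energy E = E_high(model) + E_low(real) − E_low(model), assuming the energy functions are supplied as callables over lists of atom indices; `toy_high` and `toy_low` are hypothetical stand-ins for, e.g., an MLP and an MM force field. Link atoms and the two-MLP combination described in the abstract are not modelled.

```python
from typing import Callable, Sequence

# An energy function takes a list of atom indices and returns an energy.
EnergyFn = Callable[[Sequence[int]], float]


def oniom2_energy(real: Sequence[int], model: Sequence[int],
                  high: EnergyFn, low: EnergyFn) -> float:
    """Two-layer subtractive ONIOM(high:low) energy.

    E = E_high(model) + E_low(real) - E_low(model): the low level describes the
    whole (real) system and the high level replaces it inside the model region.
    """
    return high(model) + low(real) - low(model)


# Hypothetical toy levels of theory: energies proportional to atom count.
def toy_low(atoms: Sequence[int]) -> float:
    return -1.0 * len(atoms)


def toy_high(atoms: Sequence[int]) -> float:
    return -1.5 * len(atoms)
```

A useful sanity check of the design: if the high and low levels are identical, the ONIOM energy reduces to the low-level energy of the whole system.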

... This feature makes them the best candidates for studying molecular chemical systems that have nonlinear environments and high complexity. Hence, the development of high-dimensional neural network potentials (HDNNPs) based on deep learning has paved the way for the use of ML potentials in the molecular computation of large systems [103][104][105]. ...

Many drug molecules contain functional groups, resulting in a torsional barrier to rotation around the bond linking the fragments. In medicinal chemistry and pharmaceutical sciences, including drug design studies, accurate calculation of the potential energy surfaces (PESs) of these molecular torsions is extremely important. Machine learning, including deep learning, is currently one of the most rapidly evolving tools in computer-aided drug discovery and molecular simulations. In this work, we used the ANI-1x neural network potential as a quantum-level machine learning model to predict the PESs of the antiparkinsonian drug molecule Selegiline. DFT calculations at the ωB97X/6-31G(d) level of theory were also used to study the structural parameters and vibrational normal modes of the Selegiline molecule. Using ANI-1x, we calculated the vibrational frequencies and electronic energy of Selegiline and optimized its molecular structure at a very low computational cost. From this perspective, we expect the ANI-1x approach applied in this work to be efficient and effective in computational structure-based drug design studies.

... It is then not surprising that recent explosive development of machine learning (ML) techniques, 93 including deep neural networks (DNNs), [94][95][96] is already making a noticeable impact on this field, [97][98][99][100][101][102][103][104][105][106][107][108][109][110] including works aimed directly at improving the accuracy of description of complex solvation effects. [111][112][113][114][115][116][117] It should be noted that the majority of these recent works combine QM-based methodology with ML while our interest here is purely classical approaches. ...

... The final network expression still corresponds to a many-body form rather than a two-body form. More details about DPRc are given in the work of Zeng et al. 35 2.3. Computational Details. ...

As an ensemble average, a vibrational spectrum can be time-consuming to simulate with high-accuracy methods. We present a machine learning approach based on the range-corrected deep potential (DPRc) model to improve the computing efficiency. The DPRc method divides the system into a "probe region" and a "solvent region"; "solvent-solvent" interactions are not counted in the neural network. We applied the approach to two systems: formic acid C═O stretching and MeCN C≡N stretching vibrational frequency shifts in water. All data sets were prepared using the quantum vibration perturbation approach. Effects of different region divisions, one-body correction, cutoff range, and training data size were tested. The model with a single-molecule "probe region" showed stable accuracy; it ran roughly 10 times faster than the regular deep potential and reduced the training time by a factor of about four. The approach is efficient, easy to apply, and extendable to calculating various spectra.
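
The region split can be illustrated by which atom pairs a DPRc-style network would see. A minimal pure-Python sketch, assuming a plain distance cutoff and a per-atom probe/solvent flag; the function name and the explicit pair list are hypothetical (DeePMD-kit works with neighbor lists and descriptors, not pair tuples). Because the number of solvent-solvent pairs grows roughly quadratically with the number of solvent atoms, dropping them is one plausible source of the reported speedup.

```python
import itertools
import math


def dprc_style_pairs(coords, in_probe, cutoff):
    """Index pairs (i, j), i < j, that a DPRc-style model would see.

    coords: list of (x, y, z); in_probe: list of bools marking probe-region atoms.
    A pair is kept only if it lies within `cutoff` and at least one atom is in
    the probe region, so solvent-solvent pairs never enter the network.
    """
    pairs = []
    for i, j in itertools.combinations(range(len(coords)), 2):
        if not (in_probe[i] or in_probe[j]):
            continue  # solvent-solvent pair: excluded by construction
        if math.dist(coords[i], coords[j]) < cutoff:
            pairs.append((i, j))
    return pairs


coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.5, 0.0, 0.0), (10.0, 0.0, 0.0)]
in_probe = [True, False, False, False]
print(dprc_style_pairs(coords, in_probe, 3.0))  # [(0, 1), (0, 2)]
```

Atoms 1 and 2 are both solvent, so their close contact is ignored even though it lies inside the cutoff.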

... It is not practical to incorporate all MM atoms (in addition to QM atoms) in the training of these potentials, as this would lead to an explosively large array of descriptors; the most straightforward approach is to include only MM atoms within a distance cutoff from the QM region in the MLP training. [64][65][66] Alternatively, one can adopt an implicit description of the MM environment through the use of MM-perturbed semiempirical QM charges, 67,68 MM electrostatic potential or field at QM atom positions, 69,70 or through polarizable embedding. 71 One can also use both MM electrostatic potential and field in the training of QM/MM MLPs 72,73 using our QM/MM-AC scheme 74 for separating inner and outer MM atoms and projecting outer MM charges onto inner MM atom positions. ...
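
The simplest of these strategies, keeping only MM atoms within a distance cutoff of the QM region, can be sketched as follows (pure Python, hypothetical function name, Cartesian coordinates in any consistent length unit). Because the selected set changes from frame to frame, a practical MLP still needs a representation that tolerates a varying number of MM neighbors.

```python
import math


def mm_atoms_near_qm(qm_coords, mm_coords, cutoff):
    """Indices of MM atoms lying within `cutoff` of at least one QM atom."""
    return [k for k, r_mm in enumerate(mm_coords)
            if any(math.dist(r_mm, r_qm) <= cutoff for r_qm in qm_coords)]


qm = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
mm = [(0.0, 5.0, 0.0), (4.0, 0.0, 0.0), (1.0, 2.0, 0.0), (20.0, 0.0, 0.0)]
print(mm_atoms_near_qm(qm, mm, 3.5))  # [1, 2]
```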

In the last several years, there has been a surge in the development of machine learning potential (MLP) models for describing molecular systems. We are interested in a particular area of this field — the training of system-specific MLPs for reactive systems — with the goal of using these MLPs to accelerate free energy simulations of chemical and enzyme reactions. To help new members in our labs become familiar with the basic techniques, we have put together a self-guided Colab tutorial (https://cc-ats.github.io/mlp_tutorial/), which we expect to be also useful to other young researchers in the community. Our tutorial begins with the introduction of simple fitting neural network (FNN) and kernel-based (using Gaussian Process Regression, GPR) models by fitting the two-dimensional Müller-Brown potential. Subsequently, two simple descriptors are presented for extracting features of molecular systems: symmetry functions (including the ANI variant) and embedding neural networks (such as DeepPot-SE). Lastly, these features will be fed into FNN and GPR models to reproduce the energy/force of molecular configurations of the Claisen rearrangement.
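
The tutorial's first exercise fits the Müller–Brown surface with FNN and GPR models. A minimal NumPy sketch of the GPR half, assuming a squared-exponential kernel with a hand-picked length scale of 0.3 and a small noise term (our choices, not the tutorial's settings); the potential uses the standard published Müller–Brown parameters.

```python
import numpy as np

# Standard Müller-Brown parameters (four Gaussian-like terms).
A = np.array([-200.0, -100.0, -170.0, 15.0])
a = np.array([-1.0, -1.0, -6.5, 0.7])
b = np.array([0.0, 0.0, 11.0, 0.6])
c = np.array([-10.0, -10.0, -6.5, 0.7])
X0 = np.array([1.0, 0.0, -0.5, -1.0])
Y0 = np.array([0.0, 0.5, 1.5, 1.0])


def muller_brown(xy):
    """Müller-Brown potential evaluated at an (N, 2) array of points."""
    dx = xy[:, 0:1] - X0  # (N, 4) by broadcasting
    dy = xy[:, 1:2] - Y0
    return np.sum(A * np.exp(a * dx**2 + b * dx * dy + c * dy**2), axis=1)


def rbf_kernel(p, q, length):
    """Squared-exponential kernel matrix between point sets p (N, 2) and q (M, 2)."""
    d2 = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / length**2)


def gpr_fit(x_train, y_train, length=0.3, noise=1e-6):
    """Return the GPR posterior-mean predictor (zero-mean GP on centred targets)."""
    k = rbf_kernel(x_train, x_train, length) + noise * np.eye(len(x_train))
    offset = y_train.mean()
    alpha = np.linalg.solve(k, y_train - offset)

    def predict(x_new):
        return offset + rbf_kernel(x_new, x_train, length) @ alpha

    return predict


# Usage: train on a coarse grid over the usual plotting window.
gx, gy = np.meshgrid(np.linspace(-1.5, 1.0, 12), np.linspace(-0.5, 2.0, 12))
grid = np.column_stack([gx.ravel(), gy.ravel()])
model = gpr_fit(grid, muller_brown(grid))
```

The global minimum near (−0.558, 1.442), at about −146.7, is a convenient point at which to compare `model` against `muller_brown`.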

... The Δ-learning approach has been reported to accelerate the MD simulations for thermal reactions (e.g., SN2 reaction and Claisen rearrangement) using the HDNNP 147-149 and DeepPot-SE. 150 On the other hand, ML potentials such as HDNNP, 151 FieldSchNet, 152 and DeepPot-SE 153 have been adapted to serve as QM calculators that predict energies and forces at the same level as the QM training data. These methods belong to the class of electronic-embedding ML potentials. ...

Machine learning (ML) continues to revolutionize computational chemistry by accelerating predictions and simulations through training on experimental or accurate but expensive quantum chemical (QC) calculations. Photodynamics simulations require hundreds of trajectories coupled with multiconfigurational QC calculations of energies, forces, and non-adiabatic couplings, which contribute to the prohibitive computational cost for long timescales and complex organic molecules. ML accelerates photodynamics simulations by combining nonadiabatic photodynamics simulations with an ML model trained with high-fidelity QC calculations of energies, forces, and non-adiabatic couplings. This approach has provided time-dependent molecular structural information for understanding photochemical reaction mechanisms of organic reactions in vacuum and complex environments (i.e., explicit solvation). This review focuses on the fundamentals of QC calculations and machine learning techniques. We then discuss the strategies to balance adequate training data and the computational cost of generating these training data. Finally, we demonstrate the power of applying these ML-photodynamics simulations to understand the origin of reactivities and selectivities of organic photochemical reactions, such as cis–trans isomerization, [2+2]-cycloaddition, 4π-electrocyclic ring-closing, and hydrogen roaming mechanisms.

... We previously examined the six nonenzymatic phosphoryl transfer models in Ref. 14 to train a deep potential range corrected (DPRc) machine learning potential (Δ-MLP) 74 that supplements the second-order density-functional tight-binding [75][76][77] (DFTB2) quantum mechanical (QM)/molecular mechanical (MM) Hamiltonian with a nonelectronic neural network correction parametrized to reproduce PBE0/6-31G* QM/MM energies and forces. The DFTB2 model is evaluated with the MIO parameter set and referred to as DFTB2/MIO. ...

We use the modified Bigeleisen–Mayer equation to compute kinetic isotope effect values for non-enzymatic phosphoryl transfer reactions from classical and path integral molecular dynamics umbrella sampling. The modified form of the Bigeleisen–Mayer equation consists of a ratio of imaginary mode vibrational frequencies and a contribution arising from the isotopic substitution’s effect on the activation free energy, which can be computed from path integral simulation. In the present study, we describe a practical method for estimating the frequency ratio correction directly from umbrella sampling in a manner that does not require normal mode analysis of many geometry optimized structures. Instead, the method relates the frequency ratio to the change in the mass weighted coordinate representation of the minimum free energy path at the transition state induced by isotopic substitution. The method is applied to the calculation of ¹⁶/¹⁸O and ³²/³⁴S primary kinetic isotope effect values for six non-enzymatic phosphoryl transfer reactions. We demonstrate that the results are consistent with the analysis of geometry optimized transition state ensembles using the traditional Bigeleisen–Mayer equation. The method thus presents a new practical tool to enable facile calculation of kinetic isotope effect values for complex chemical reactions in the condensed phase.
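
The two factors described above, a ratio of imaginary-mode frequencies and an exponential term from the isotope effect on the activation free energy, can be combined as in the sketch below. It assumes the standard transition-state-theory form k_L/k_H = (ν‡_L/ν‡_H)·exp[−(ΔG‡_L − ΔG‡_H)/RT]; the paper's exact expression, and its estimate of the frequency ratio from the mass-weighted minimum free energy path, are not reproduced here. The function name is hypothetical.

```python
import math

R_KCAL = 1.987204259e-3  # gas constant, kcal mol^-1 K^-1


def kinetic_isotope_effect(freq_light, freq_heavy, dg_light, dg_heavy,
                           temperature=298.15):
    """k_light / k_heavy = (nu_light / nu_heavy) * exp(-(dG_light - dG_heavy) / RT).

    freq_*: magnitudes of the imaginary transition-state frequencies (any common unit).
    dg_*: activation free energies in kcal/mol, e.g. from path integral umbrella sampling.
    """
    freq_ratio = freq_light / freq_heavy
    free_energy_term = math.exp(-(dg_light - dg_heavy) / (R_KCAL * temperature))
    return freq_ratio * free_energy_term
```

With identical frequencies and barriers the function returns exactly 1, the no-isotope-effect limit.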

... Here, we restate the relevant theory in the frequency shift calculation task context. More details about DPRc for QM/MM can be found in the work of Zeng et al.27 ...

Vibrational spectrum simulation, as an ensemble average, can be very time-consuming when using high-accuracy methods. Here, we introduce a new machine learning approach based on the range corrected deep potential (DPRc) model to improve computing efficiency. The approach was applied to computing C=O stretching vibrational frequency shifts of formic acid in aqueous solution. DPRc is adapted for frequency shift calculation. The system was divided into a "probe region" and a "solvent region" by atom. Three kinds of "probe region" were tested: a single atom with atomic contribution correction, a single atom, and a single molecule. All data sets were prepared using the Quantum Vibration Perturbation (QVP) approach. The deep potential (DP) model was also adapted for frequency shift calculation for comparison, and different interaction cutoff radii were tested. The single-molecule "probe region" results show the best accuracy, running roughly ten times faster than regular DP while reducing the training time by a factor of about four, making it fully applicable in practice. The results show that dropping information on interaction distances between solvent atoms can significantly increase computing and training efficiency with little loss of accuracy. The protocol is practical, easy to apply, and extendable to calculating other physical quantities.

... One path forward that appears promising is to use machine-learning potentials (MLPs) either as stand-alone alternative models [39][40][41][42][43][44] , or else to augment existing semiempirical QM methods. [45][46][47][48][49][50][51] We will refer to the former class as "pure MLPs" and the latter class as "QM/∆-MLPs". MLPs have emerged as powerful tools to enable fast and accurate chemical models within the scope of their training 39,[41][42][43][44] . ...
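
The QM/Δ-MLP idea fits a model only to the difference between a cheap and an expensive method, then adds it back to the cheap result. A toy one-dimensional sketch, assuming hypothetical `e_low`/`e_high` functions whose difference is deliberately smooth, with a polynomial standing in for the neural network; real Δ-MLPs such as DPRc learn the correction from atomic environments.

```python
import numpy as np


def e_high(r):
    """Hypothetical expensive reference: a Morse curve (arbitrary units)."""
    return 100.0 * (1.0 - np.exp(-1.8 * (r - 1.0))) ** 2


def e_low(r):
    """Hypothetical cheap model: the reference plus a smooth systematic error."""
    return e_high(r) + 5.0 * (r - 1.0) - 3.0


def train_delta_model(r_train, degree=2):
    """Fit only the correction E_high - E_low, then add it back to E_low."""
    delta = e_high(r_train) - e_low(r_train)
    coeffs = np.polyfit(r_train, delta, degree)
    return lambda r: e_low(r) + np.polyval(coeffs, r)


r_train = np.linspace(0.8, 2.5, 6)  # the only points that need e_high
model = train_delta_model(r_train)
r_test = np.linspace(0.8, 2.5, 50)
max_err = np.max(np.abs(model(r_test) - e_high(r_test)))
```

The design choice being illustrated is that the correction is far simpler than either surface, so a small model trained on few expensive points can recover the high-level curve.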

Modern semiempirical electronic structure methods have considerable promise in drug discovery as universal "force fields" that can reliably model biological and drug-like molecules. Herein, we compare the performance of several NDDO-based semiempirical (MNDO/d, AM1, PM6 and ODM2), density-functional tight-binding based (DFTB3, GFN1-xTB and GFN2-xTB) models with pure machine learning potentials (ANI-1x and ANI-2x) and hybrid quantum mechanical/machine learning potentials (AIQM1 and QDπ) for a wide range of data computed at a consistent ωB97X/6-31G* level of theory (as in the ANI-1x database). These data include conformational energies, intermolecular interactions, tautomers, and protonation states. Additional comparisons are made to a set of natural and synthetic nucleic acids from the artificially expanded genetic information system (AEGIS). This dataset has important implications for the design of new biotechnology and therapeutics. Finally, we examine acid/base chemistry relevant for RNA cleavage reactions catalyzed by small nucleolytic ribozymes and ribonucleases. Overall, the recently developed QDπ model performs exceptionally well across all datasets, having especially high accuracy for tautomers and protonation states relevant to drug discovery.

... In contrast, the development of new QM/MM methods has been relatively slow due to the computer-intensive nature of QM algorithms, while enzymatic functions require extensive sampling of enzyme conformations. However, in recent years, the development of machine learning approaches [101][102][103][104][105] and large-scale high-performance computers utilizing both many-core CPU and GPU architectures has revived efforts to develop fast QM/MM algorithms for rapid prediction of enzyme activity. Overall, the field faces a formidable challenge, and the dissection and understanding of allosteric networks can only be accomplished with massive developments of high-throughput computational and experimental approaches. ...

Biological life depends on motion, and this manifests itself in proteins that display motion over a formidable range of time scales, from femtosecond vibrations of atoms at enzymatic transition states all the way to slow domain motions occurring on microsecond to millisecond time scales. An outstanding challenge in contemporary biophysics and structural biology is a quantitative understanding of the linkages among protein structure, dynamics, and function. These linkages are becoming increasingly explorable due to conceptual and methodological advances. In this Perspective article, we will point toward future directions of the field of protein dynamics with an emphasis on enzymes. Research questions in the field are becoming increasingly complex such as the mechanistic understanding of high-order interaction networks in allosteric signal propagation through a protein matrix, or the connection between local and collective motions. In analogy to the solution to the “protein folding problem,” we argue that the way forward to understanding these and other important questions lies in the successful integration of experiment and computation, while utilizing the present rapid expansion of sequence and structure space. Looking forward, the future is bright, and we are in a period where we are on the doorstep to, at least in part, comprehend the importance of dynamics for biological function.

... Very recently, several approaches have been proposed to tackle this problem. In the DPRc model by Zeng et al., 9 a Δ-learning 10 approach is proposed to correct QM/MM potentials to a higher level of theory. This is done by introducing a correction to the interaction energies that smoothly vanishes for MM atoms farther from the QM region. ...
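
A correction that smoothly vanishes for distant MM atoms needs a switching function. A minimal sketch, assuming a quintic smoothstep window between `r_on` and `r_off` (our choice; the exact function and cutoffs used by DPRc are not specified here) and hypothetical per-atom correction terms.

```python
def smooth_switch(r, r_on, r_off):
    """Weight equal to 1 for r <= r_on, 0 for r >= r_off, C2-smooth in between.

    Uses 1 - (10 t^3 - 15 t^4 + 6 t^5) with t = (r - r_on) / (r_off - r_on);
    its first and second derivatives vanish at both ends of the window.
    """
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    t = (r - r_on) / (r_off - r_on)
    return 1.0 - t**3 * (10.0 - 15.0 * t + 6.0 * t**2)


def switched_correction(distances, pair_terms, r_on, r_off):
    """Sum hypothetical per-MM-atom corrections, each damped by its distance to the QM region."""
    return sum(smooth_switch(r, r_on, r_off) * e for r, e in zip(distances, pair_terms))
```

Vanishing first and second derivatives at the window edges keep forces continuous as MM atoms cross the cutoff, which matters for energy conservation in MD.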

This work presents a variant of an electrostatic embedding scheme that allows the embedding of arbitrary machine learned potentials trained on molecular systems in vacuo. The scheme is based on physically motivated models of electronic density and polarizability, resulting in a generic model without relying on an exhaustive training set. The scheme only requires in vacuo single point QM calculations to provide training densities and molecular dipolar polarizabilities. As an example, the scheme is applied to create an embedding model for the QM7 data set using Gaussian Process Regression with only 445 reference atomic environments. The model was tested on the SARS-CoV-2 protease complex with PF-00835231, resulting in a predicted embedding energy RMSE of 2 kcal/mol, compared to explicit DFT/MM calculations.
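
The physical ingredients named above, a charge distribution and a polarizability, lead to a static term and an induction term in the embedding energy. A minimal NumPy sketch in atomic units, assuming point charges and isotropic point polarizabilities on the QM sites and no mutual polarization among them; this is a generic classical model with a hypothetical function name, not the density-based scheme of the cited work.

```python
import numpy as np


def embedding_energy(qm_pos, qm_charges, qm_alpha, mm_pos, mm_charges):
    """Static and induced electrostatic embedding energies (atomic units).

    qm_pos (N, 3), qm_charges (N,), qm_alpha (N,) isotropic polarizabilities;
    mm_pos (M, 3), mm_charges (M,). Mutual polarization among QM sites is ignored.
    """
    diff = qm_pos[:, None, :] - mm_pos[None, :, :]  # (N, M, 3), r_i - r_j
    dist = np.linalg.norm(diff, axis=-1)             # (N, M)
    phi = np.sum(mm_charges / dist, axis=1)          # MM potential at each QM site
    field = np.sum(mm_charges[None, :, None] * diff / dist[..., None] ** 3, axis=1)
    e_static = float(np.dot(qm_charges, phi))
    e_induced = float(-0.5 * np.sum(qm_alpha * np.sum(field**2, axis=1)))
    return e_static, e_induced
```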

... 17−20 Inclusion of the environment effects poses further challenges, and several works have tried to include these effects, either in excited-state properties 21,22 or by developing ground-state QM/MM potentials. 23−29 In a previous work 30 we have presented a ML approach to estimate excitonic couplings in LHCs with an accuracy comparable to that of the reference time-dependent density functional theory (TD-DFT) calculations while being orders of magnitude faster. In this work we develop a model for estimating site energies, thus providing a ML estimate for the full exciton Hamiltonian. ...

We propose a machine learning (ML)-based strategy for an inexpensive calculation of excitonic properties of light-harvesting complexes (LHCs). The strategy uses classical molecular dynamics simulations of LHCs in their natural environment in combination with ML prediction of the excitonic Hamiltonian of the embedded aggregate of pigments. The proposed ML model can reproduce the effects of geometrical fluctuations together with those due to electrostatic and polarization interactions between the pigments and the protein. The training is performed on the chlorophylls of the major LHC of plants, but we demonstrate that the model is able to extrapolate well beyond the initial training set. Moreover, the accuracy in predicting the effects of the environment is tested on the simulation of the small changes observed in the absorption spectra of the wild-type and a mutant of a minor LHC.

... Semiempirical QM/MM calculations are several orders of magnitude faster, but their accuracy is largely dependent on the quality of the model parameters, for which systematic error estimation is difficult. Recent approaches apply neural network representations [327][328][329][330] and other ML schemes [328,329,331,332] to reduce the computational cost of the QM calculation in QM/MM frameworks, either by learning the ab initio QM/MM PES from the semiempirical QM/MM PES (denoted ∆ML, which goes back to the work of von Lilienfeld and coworkers [333]) or by directly learning the ab initio PES [301,[304][305][306]321], the electron density, [334] or the wave function [335] of the QM subsystem. For instance, Böselt and coworkers applied a neural network representation of the QM region coupled to a ∆ML approach. ...

Hybrid quantum mechanics/molecular mechanics (QM/MM) models allow one to address chemical phenomena in complex molecular environments. However, they are tedious to construct and usually require significant manual preprocessing and expertise. As a result, these models may not be easily transferable to new application areas, and their many parameters are not easy to adjust to reference data, which are typically scarce. It has therefore been difficult to devise automated procedures of controllable accuracy, which keeps this type of modeling far from being standardized or black-box. Although diverse best-practice protocols have been set up for the construction of individual components of a QM/MM model (e.g., the MM potential, the type of embedding, the choice of the QM region), no automated procedures are available for all steps of the QM/MM model construction. Here, we review the state of the art of QM/MM modeling with a focus on automation. We elaborate on the MM model parametrization, on atom-economical physically-motivated QM region selection, and on embedding schemes that incorporate mutual polarization as critical components of the QM/MM model. In view of the broad scope of the field, we mostly restrict the discussion to methodologies that build de novo models based on first-principles data, on uncertainty quantification, and on error mitigation with a high potential for automation. Ultimately, it is desirable to be able to set up reliable QM/MM models in a fast and efficient automated way without being constrained by specific chemical or technical limitations.

... In DeepPot-SE, the expression for the atomic contribution $E_i$ is a neural network consisting of three hidden layers. The input layer is the molecular descriptor $\mathcal{D}_i$, which is determined by the "environment matrix" $\tilde{\mathcal{R}}_i$, the "embedding" matrix $\mathcal{G}_i$, and a reduced-dimension embedding matrix $\mathcal{G}^{<}_i$ [64]. ...
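
Following the published DeepPot-SE construction as we understand it, the descriptor can be written $\mathcal{D}_i = \mathcal{G}_i^{T}\tilde{\mathcal{R}}_i\tilde{\mathcal{R}}_i^{T}\mathcal{G}^{<}_i$, where $\mathcal{G}^{<}_i$ keeps the first columns of $\mathcal{G}_i$. A NumPy sketch, assuming a 1/N² normalization, a quintic smooth cutoff, and an untrained random embedding network `toy_embed` (all hypothetical choices), shows why the result is invariant to rotations and to permutations of neighbors.

```python
import numpy as np


def smooth_inv_r(r, r_s, r_c):
    """1/r switched smoothly to zero between r_s and r_c (quintic polynomial)."""
    u = np.clip((r - r_s) / (r_c - r_s), 0.0, 1.0)
    return (1.0 / r) * (u**3 * (-6.0 * u**2 + 15.0 * u - 10.0) + 1.0)


def se_descriptor(center, neighbors, embed, m2, r_s=0.5, r_c=6.0):
    """DeepPot-SE-style descriptor G1^T R R^T G2 / N^2 for one atom.

    center: (3,) position; neighbors: (N, 3) positions;
    embed: maps the (N,) array of s(r) values to an (N, M1) embedding matrix G1;
    G2 keeps the first m2 columns of G1 (the reduced embedding matrix).
    """
    rel = neighbors - center
    r = np.linalg.norm(rel, axis=1)
    s = smooth_inv_r(r, r_s, r_c)
    env = np.column_stack([s, s[:, None] * rel / r[:, None]])  # environment matrix, (N, 4)
    g1 = embed(s)
    g2 = g1[:, :m2]
    n = len(neighbors)
    # R R^T depends only on s values and dot products of unit vectors (rotation invariant);
    # G1^T R sums over neighbors (permutation invariant).
    return g1.T @ env @ env.T @ g2 / n**2


# Hypothetical embedding network with fixed random weights (untrained stand-in).
_rng = np.random.default_rng(0)
_W1, _b1 = _rng.normal(size=(1, 8)), _rng.normal(size=8)
_W2, _b2 = _rng.normal(size=(8, 16)), _rng.normal(size=16)


def toy_embed(s):
    hidden = np.tanh(s[:, None] @ _W1 + _b1)
    return np.tanh(hidden @ _W2 + _b2)  # (N, 16), so M1 = 16
```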

Recently, artificial neural network-based methods for the construction of potential energy surfaces and molecular dynamics (MD) simulations based on them have been increasingly used in the field of theoretical chemistry. The neural network potentials (NNP) strike a good balance between accuracy and computational efficiency relative to quantum chemical calculations and MD simulations based on classical force fields. Thus, NNP is becoming a powerful tool for studying the structure and function of molecules. In this chapter, we introduce the basic theory of NNP. The construction steps and the usage of NNP are also introduced in detail with the MD simulation of methane combustion as an example. We hope that this chapter can help those readers who are new but interested in entering this field.

MTR1 is an in vitro-selected alkyl transferase ribozyme that transfers an alkyl group from O⁶-alkylguanine to N1 of the target adenine in the RNA substrate (A63). The structure of the ribozyme suggested a mechanism in which a cytosine (C10) acts as a general acid to protonate O⁶-alkylguanine N1. Here, we have analyzed the role of the C10 general acid and the A63 nucleophile by atomic mutagenesis and computation. C10 was substituted by n1c and n1c, c5n variants. The n1c variant has an elevated pKa (11.4 as the free nucleotide) and leads to a 10⁴-fold lower activity that is pH-independent. Addition of the second c5n substitution with a lower pKa restored both the rate and pH dependence of alkyl transfer. Quantum mechanical calculations indicate that protonation of O⁶-alkylguanine lowers the barrier to alkyl transfer and that there is a significantly elevated barrier to proton transfer for the n1c single substitution. The calculated pKa values are in good agreement with the apparent values from measured rates. Increasing the pKa of the nucleophile by A63 n7c substitution led to a 6-fold higher rate. The increased reactivity of the nucleophile corresponds to a βnuc of ∼0.5, indicating significant C–N bond formation in the transition state. Taken together, these results are consistent with a two-step mechanism comprising protonation of the O⁶-alkylguanine followed by alkyl transfer.
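
The β_nuc value quoted above is a Brønsted coefficient, the slope of log k against pKa. A minimal sketch estimating it from a single pair of variants (real analyses typically fit several points); the function name is hypothetical and the numbers in the note below are illustrative, not the MTR1 data.

```python
import math


def bronsted_beta(rate_ratio, delta_pka):
    """Brønsted coefficient estimated from one pair of variants.

    beta = delta log10(k) / delta pKa, where rate_ratio = k_variant / k_reference
    and delta_pka = pKa_variant - pKa_reference.
    """
    return math.log10(rate_ratio) / delta_pka
```

For instance, a hypothetical 10-fold rate increase accompanying a 2-unit pKa increase gives `bronsted_beta(10.0, 2.0)` = 0.5.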

Nitrate Ester Plasticized Polyether (NEPE) propellants have attracted widespread attention due to their high energy density and excellent low-temperature mechanical properties. However, little is known about the thermal decomposition process...

In the past two decades, machine learning potentials (MLPs) have driven significant developments in chemical, biological, and material sciences. The construction and training of MLPs enable fast and accurate simulations and analysis of thermodynamic and kinetic properties. This review focuses on the application of MLPs to reaction systems with consideration of bond breaking and formation. We review the development of MLP models, primarily with neural network and kernel-based algorithms, and recent applications of reactive MLPs (RMLPs) to systems at different scales. We show how RMLPs are constructed, how they speed up the calculation of reactive dynamics, and how they facilitate the study of reaction trajectories, reaction rates, free energy calculations, and many other calculations. Different data sampling strategies applied in building RMLPs are also discussed with a focus on how to collect structures for rare events and how to further improve their performance with active learning.

Machine learning (ML) has emerged as an efficient and accurate surrogate model for high-level electronic structure theory. Its application has been limited to closed chemical systems, without considering external potentials from the surrounding environment. To address this limitation and incorporate the influence of external potentials, polarization effects, and long-range interactions between a chemical system and its environment, the first two terms of the Taylor expansion of an electrostatic operator have been used as extra input to the existing ML model to represent the electrostatic environments. However, higher-order electrostatic interactions are often essential to account for external potentials from the environment. Existing models based only on invariant features cannot capture significant distribution patterns of the external potentials. Here, we propose a novel ML model that includes higher-order terms of the Taylor expansion of an electrostatic operator and uses, as its base model, an equivariant model that can generate high-order tensors covariant with rotations. We can therefore use the multipole-expansion equation to derive a useful representation that accounts for polarization and intermolecular interactions. Moreover, to deal with long-range interactions, we follow the same strategy adopted to derive long-range interactions between a target system and its environment media. With these modifications, our model achieves higher prediction accuracy and transferability among various environment media.
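
The Taylor (multipole) expansion of the interaction between a charge distribution and an external potential is what supplies the extra ML inputs described above. A NumPy sketch, assuming point charges and a non-traceless second moment (function names hypothetical), compares the second-order expansion with the exact sum; for a potential that is at most quadratic, the two agree exactly. It illustrates the expansion only, not the equivariant model.

```python
import numpy as np


def exact_energy(charges, positions, phi):
    """Interaction energy sum_i q_i * phi(r_i) of point charges with an external potential."""
    return sum(q * phi(r) for q, r in zip(charges, positions))


def taylor_energy(charges, positions, phi0, grad_phi, hess_phi):
    """Second-order Taylor (multipole) estimate of the same energy about the origin.

    U ~ Q*phi0 + mu . grad(phi) + 1/2 sum_ab M_ab d_a d_b phi, where
    Q = sum q_i, mu = sum q_i r_i, M_ab = sum q_i r_ia r_ib (non-traceless).
    Note mu . grad(phi) = -mu . E, the familiar dipole term.
    """
    q = np.asarray(charges, dtype=float)
    r = np.asarray(positions, dtype=float)
    total_charge = q.sum()
    dipole = q @ r
    second_moment = np.einsum("i,ia,ib->ab", q, r, r)
    return (total_charge * phi0 + dipole @ grad_phi
            + 0.5 * np.sum(second_moment * hess_phi))


# Usage: a quadratic potential whose Hessian is traceless (it obeys Laplace's equation).
c0 = 0.3
g = np.array([0.1, -0.2, 0.05])
h = np.array([[0.4, 0.1, 0.0], [0.1, -0.3, 0.2], [0.0, 0.2, -0.1]])
phi = lambda r: c0 + g @ r + 0.5 * r @ h @ r
charges = [0.4, -0.7, 0.3]
positions = np.array([[0.1, 0.0, -0.2], [0.5, 0.3, 0.1], [-0.4, 0.2, 0.6]])
```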

Understanding enzyme mechanisms is essential for unraveling the complex molecular machinery of life. In this review, we survey the field of computational enzymology, highlighting key principles governing enzyme mechanisms and discussing ongoing challenges and promising advances. Over the years, computer simulations have become indispensable in the study of enzyme mechanisms, with the integration of experimental and computational exploration now established as a holistic approach to gain deep insights into enzymatic catalysis. Numerous studies have demonstrated the power of computer simulations in characterizing reaction pathways, transition states, substrate selectivity, product distribution, and dynamic conformational changes for various enzymes. Nevertheless, significant challenges remain in investigating the mechanisms of complex multistep reactions, large-scale conformational changes, and allosteric regulation. Beyond mechanistic studies, computational enzyme modeling has emerged as an essential tool for computer-aided enzyme design and the rational discovery of covalent drugs for targeted therapies. Overall, enzyme design/engineering and covalent drug development can greatly benefit from our understanding of the detailed mechanisms of enzymes, such as protein dynamics, entropy contributions, and allostery, as revealed by computational studies. Such a convergence of different research approaches is expected to continue, creating synergies in enzyme research. This review, by outlining the ever-expanding field of enzyme research, aims to provide guidance for future research directions and facilitate new developments in this important and evolving field.

In the last several years, there has been a surge in the development of machine learning potential (MLP) models for describing molecular systems. We are interested in a particular area of this field — the training of system‐specific MLPs for reactive systems — with the goal of using these MLPs to accelerate free energy simulations of chemical and enzyme reactions. To help new members in our labs become familiar with the basic techniques, we have put together a self‐guided Colab tutorial ( https://cc-ats.github.io/mlp_tutorial/ ), which we expect to be also useful to other young researchers in the community. Our tutorial begins with the introduction of simple feedforward neural network (FNN) and kernel‐based (using Gaussian process regression, GPR) models by fitting the two‐dimensional Müller‐Brown potential. Subsequently, two simple descriptors are presented for extracting features of molecular systems: symmetry functions (including the ANI variant) and embedding neural networks (such as DeepPot‐SE). Lastly, these features will be fed into FNN and GPR models to reproduce the energies and forces for the molecular configurations in a Claisen rearrangement reaction.
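
Symmetry functions turn atomic positions into rotation- and permutation-invariant features. A minimal sketch of the Behler–Parrinello radial function G2 with a cosine cutoff; as we understand it, the ANI variant uses similar radial terms with its own parameters plus modified angular terms, which are not shown. The function names and parameter grids are hypothetical.

```python
import math


def cosine_cutoff(r, r_c):
    """Behler-Parrinello cosine cutoff: 1 at r = 0, smoothly 0 at and beyond r_c."""
    if r >= r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_c) + 1.0)


def radial_g2(i, coords, eta, r_s, r_c):
    """Radial symmetry function of atom i: sum_j exp(-eta (r_ij - r_s)^2) f_c(r_ij)."""
    total = 0.0
    for j, r_j in enumerate(coords):
        if j != i:
            r_ij = math.dist(coords[i], r_j)
            total += math.exp(-eta * (r_ij - r_s) ** 2) * cosine_cutoff(r_ij, r_c)
    return total


def radial_features(i, coords, etas, shifts, r_c=6.0):
    """Feature vector for atom i from a grid of (eta, r_s) parameters."""
    return [radial_g2(i, coords, eta, r_s, r_c) for eta in etas for r_s in shifts]
```

Because only interatomic distances enter, the features are unchanged by rotating the molecule or relabeling the neighbors; these vectors are what the FNN or GPR models then consume.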

Machine learning (ML) continues to revolutionize computational chemistry by accelerating predictions and simulations through training on experimental or accurate but expensive quantum mechanical (QM) calculations. Photodynamics simulations require hundreds of trajectories coupled with multiconfigurational QM calculations of excited-state potential energy surfaces, which contribute to the prohibitive computational cost for long timescales and complex organic molecules. ML accelerates photodynamics simulations by combining nonadiabatic photodynamics simulations with an ML model trained with high-fidelity QM calculations of energies, forces, and non-adiabatic couplings. This approach has provided time-dependent molecular structural information for understanding photochemical reaction mechanisms of organic reactions in vacuum and complex environments (i.e., explicit solvation). This review focuses on the fundamentals of QM calculations and ML techniques. We then discuss the strategies to balance adequate training data and the computational cost of generating these training data. Finally, we demonstrate the power of applying these ML-photodynamics simulations to understand the origin of reactivities and selectivities of organic photochemical reactions, such as cis–trans isomerization, [2 + 2]-cycloaddition, 4π-electrocyclic ring-closing, and hydrogen roaming mechanisms.

The inherent discontinuity and unique dimensional attributes of nanomaterial surfaces and interfaces endow them with various exceptional properties. These properties, however, also introduce difficulties for both experimental and computational studies. The advent of machine learning interatomic potentials (MLIPs) addresses some of the limitations associated with empirical force fields, offering a valuable route to accurate simulations of nanomaterial surfaces and interfaces. Central to this approach is the idea of capturing the relationship between system configuration and potential energy, leveraging the ability of machine learning (ML) to approximate high‐dimensional functions precisely. This review offers an in‐depth examination of MLIP principles and their implementation and elaborates on their applications to nanomaterial surface and interface systems. The prevailing challenges faced by this powerful methodology are also discussed.

Free energy differences (ΔF) are essential to quantitative characterization and understanding of chemical and biological processes. Their direct estimation with an accurate quantum mechanical potential is of great interest and yet impractical due to high computational cost and incompatibility with typical alchemical free energy protocols. One promising solution is the multilevel free energy simulation in which the estimate of ΔF at an inexpensive low level of theory is combined with the correction toward a higher level of theory. The poor configurational overlap generally expected between the two levels of theory, however, presents a major challenge. We overcome this challenge by using a deep neural network model and enhanced sampling simulations. An adversarial autoencoder is used to identify a low-dimensional (latent) space that compactly represents the degrees of freedom that encode the distinct distributions at the two levels of theory. Enhanced sampling in this latent space is then used to drive the sampling of configurations that predominantly contribute to the free energy correction. Results for both gas phase and condensed phase systems demonstrate that this data-driven approach offers high accuracy and efficiency with great potential for scalability to complex systems.
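
The multilevel correction rests on a standard identity. At a fixed state, the free energy difference between the low and high levels can be estimated by exponential averaging of E_high − E_low over configurations sampled at the low level (Zwanzig free energy perturbation). A thermodynamic cycle then converts the low-level ΔF into a high-level one. The sketch below shows only this textbook estimator and assumes energies in the same units as kT. It omits the autoencoder and enhanced sampling, which target exactly the poor-overlap regime where this plain average becomes unreliable. The function names are hypothetical.

```python
import math


def fep_correction(delta_e, kT):
    """Zwanzig estimate of F_high - F_low at one state.

    delta_e: samples of E_high - E_low on configurations drawn from the
             low-level ensemble. Returns -kT * ln < exp(-dE / kT) >_low.
    """
    if not delta_e:
        raise ValueError("need at least one sample")
    # Log-sum-exp keeps the exponential average numerically stable.
    scaled = [-de / kT for de in delta_e]
    m = max(scaled)
    log_mean = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return -kT * log_mean


def multilevel_free_energy(dF_low, delta_e_A, delta_e_B, kT):
    """Thermodynamic cycle: dF_high(A->B) = dF_low(A->B) + corr(B) - corr(A)."""
    return dF_low + fep_correction(delta_e_B, kT) - fep_correction(delta_e_A, kT)
```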

In silico investigations of enzymatic reactions and chemical reactions in condensed phases often suffer from formidable computational costs due to the large number of degrees of freedom and the enormous volume of phase space that must be sampled. Usually, accuracy is sacrificed for efficiency, either by employing less reliable Hamiltonians or by shortening the sampling time. Reference-potential methods (RPMs) offer an alternative route to high simulation accuracy without much loss of efficiency. In this Perspective, we summarize the idea of RPMs and showcase some recent applications. Most importantly, we discuss the pitfalls of these methods and present remedies for them.

We present a comparative study that evaluates the performance of a machine learning potential (ANI-2x), a conventional force field (GAFF), and an optimally tuned GAFF-like force field in modeling a set of 10 γ-fluorohydrins whose conformer stability is determined by a complex interplay of intra- and intermolecular interactions. To benchmark the performance of each molecular model, we evaluated their energetic, geometric, and sampling accuracies relative to quantum-mechanical data. This benchmark involved conformational analysis in both the gas phase and chloroform solution. We also assessed the performance of these molecular models in estimating nuclear spin-spin coupling constants by comparing their predictions to experimental data available in chloroform. The results show that ANI-2x tends to predict stronger-than-expected hydrogen bonding, overstabilizes global minima, and has problems related to an inadequate description of dispersion interactions. Furthermore, while ANI-2x is a viable model for the gas phase, conventional force fields still play an important role, especially in condensed-phase simulations. Overall, this study highlights the strengths and weaknesses of each model, providing guidelines for the use and future development of force fields and machine learning potentials.

Multiscale modeling has a long history of use in structural biology, as computational biologists strive to overcome the time- and length-scale limits of atomistic molecular dynamics. Contemporary machine learning techniques, such as deep learning, have promoted advances in virtually every field of science and engineering and are revitalizing the traditional notions of multiscale modeling. Deep learning has found success in various approaches for distilling information from fine-scale models, such as building surrogate models and guiding the development of coarse-grained potentials. However, perhaps its most powerful use in multiscale modeling is in defining latent spaces that enable efficient exploration of conformational space. This confluence of machine learning and multiscale simulation with modern high-performance computing promises a new era of discovery and innovation in structural biology.

Advances in machine learned interatomic potentials (MLIPs), such as those using neural networks, have resulted in short-range models that can infer interaction energies with near ab initio accuracy at a computational cost that is orders of magnitude lower. For many-atom systems, including macromolecules, biomolecules, and condensed matter, model accuracy can depend on the description of both short- and long-range physical interactions. The latter can be difficult to incorporate into an MLIP framework. Recent research has produced numerous models that account for nonlocal electrostatic and dispersion interactions, greatly widening the range of applications that MLIPs can address. In light of this, we present a Perspective focused on key methodologies and models for systems in which nonlocal physics and chemistry are crucial for describing system properties. The strategies covered include MLIPs augmented with dispersion corrections, electrostatics calculated with charges predicted from atomic environment descriptors, the use of self-consistency and message-passing iterations to propagate nonlocal system information, and charges obtained via charge equilibration schemes. We aim to provide a focused discussion to support the development of machine learning-based interatomic potentials for systems where nearsighted terms alone are insufficient.
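
Charge equilibration can be written as a constrained minimization. The charges minimize E(q) = Σ_i χ_i q_i + ½ Σ_i J_i q_i² + Σ_{i<j} q_i q_j / r_ij subject to Σ_i q_i = Q. With a Lagrange multiplier this reduces to a single linear solve. The sketch below makes several assumptions. It uses bare Coulomb interactions in units where the Coulomb constant is 1, whereas practical schemes usually screen them with Gaussian charge densities. The χ and J values are hypothetical. In ML variants, the electronegativities are typically predicted from each atom's environment descriptor rather than fixed per element.

```python
import math


def solve_linear(M, rhs):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(rhs)
    A = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n + 1):
                A[r][k] -= f * A[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (A[i][n] - sum(A[i][k] * x[k] for k in range(i + 1, n))) / A[i][i]
    return x


def qeq_charges(positions, chi, hardness, total_charge=0.0):
    """Solve dE/dq_i = lambda for all i together with sum(q) = total_charge."""
    n = len(positions)
    M = [[0.0] * (n + 1) for _ in range(n + 1)]
    rhs = [0.0] * (n + 1)
    for i in range(n):
        M[i][i] = hardness[i]
        for j in range(n):
            if i != j:
                M[i][j] = 1.0 / math.dist(positions[i], positions[j])
        M[i][n] = -1.0  # column for the Lagrange multiplier (chemical potential)
        M[n][i] = 1.0   # charge-conservation row
        rhs[i] = -chi[i]
    rhs[n] = total_charge
    return solve_linear(M, rhs)[:n]
```

As expected, the atom with the larger electronegativity ends up with the negative charge.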

Quantum mechanics/molecular mechanics (QM/MM) hybrid models allow one to address chemical phenomena in complex molecular environments. Whereas this modeling approach can cope with large systems at moderate computational cost, the models are often tedious to construct and require manual preprocessing and expertise. As a result, transferability to new application areas can be limited, and the many parameters are not easy to adjust to reference data, which are typically scarce. Therefore, it is desirable to devise automated procedures of controllable accuracy that enable such modeling in a standardized, black‐box manner. Although diverse best‐practice protocols have been established for the construction of individual components of a QM/MM model (e.g., the MM potential, the type of embedding, the choice of the QM region), automated procedures that reconcile all steps of QM/MM model construction are still rare. Here, we review the state of the art of QM/MM modeling with a focus on automation. We elaborate on MM model parametrization, on atom‐economical, physically motivated QM region selection, and on embedding schemes that incorporate mutual polarization as critical components of the QM/MM model. In view of the broad scope of the field, we mostly restrict the discussion to methodologies with a high potential for automation that build de novo models from first‐principles data, quantify uncertainty, and mitigate errors. Ultimately, it is desirable to be able to set up reliable QM/MM models quickly and efficiently in an automated way, without being constrained by specific chemical or technical limitations.

A powerful tool for studying the mechanisms of reactions in solution or in enzymes is ab initio quantum mechanical/molecular mechanical (QM/MM) molecular dynamics (MD) simulation. However, its computational cost is very high because an explicit electronic structure calculation is needed at every time step. A neural network (NN) method can accelerate QM/MM-MD simulations, but accurately describing the QM/MM electrostatic coupling with an NN in the electrostatic embedding (EE) scheme has long been a problem. In this work, we developed a new method to accelerate QM/MM calculations in the mechanical embedding (ME) scheme. The potentials and partial point charges of the QM atoms are first learned in vacuo by the embedded atom neural network (EANN) approach. MD simulations are then performed on this EANN/MM potential energy surface (PES) to obtain free energy (FE) profiles for reactions, with the QM/MM electrostatic coupling treated in the ME scheme. Finally, a weighted thermodynamic perturbation (wTP) corrects the FE profiles from the ME scheme to the EE scheme. For two reactions in water and one in methanol, our simulations reproduced the B3LYP/MM free energy profiles within 0.5 kcal/mol with a 30- to 60-fold speed-up. The results show that combining an EANN potential in the ME scheme with the wTP correction is an efficient and reliable strategy for simulating chemical reactions in liquids. Another advantage of our method is that the QM PES is independent of the MM subsystem, so it can be applied to various MM environments, as demonstrated by an SN2 reaction studied separately in water and methanol with the same EANN PES. The free energy profiles are in excellent agreement with the results obtained from B3LYP/MM-MD simulations. In the future, this method will be applied to reactions in enzymes and their variants.
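
The final step above perturbs a free energy profile from one Hamiltonian (ME) to another (EE). The sketch below uses plain, unweighted per-bin exponential averaging along a reaction coordinate s, F_EE(s) ≈ F_ME(s) − kT ln⟨exp[−(U_EE − U_ME)/kT]⟩_ME,s. It is not the authors' weighted variant (wTP), whose weights the abstract does not specify. It also assumes unbiased ME samples; biased umbrella windows would need to be reweighted first. The function and argument names are hypothetical.

```python
import math
from collections import defaultdict


def perturb_profile(samples, f_me, bin_width, kT):
    """Per-bin thermodynamic perturbation of a free energy profile.

    samples: iterable of (s, dU) from ME-scheme sampling, dU = U_EE - U_ME.
    f_me:    dict bin_index -> F_ME for that bin (same units as kT).
    Returns dict bin_index -> corrected F, shifted so that its minimum is 0.
    """
    groups = defaultdict(list)
    for s, du in samples:
        groups[math.floor(s / bin_width)].append(-du / kT)
    corrected = {}
    for b, expo in groups.items():
        if b not in f_me:
            continue  # no reference profile value for this bin
        m = max(expo)  # log-sum-exp for a stable exponential average
        log_avg = m + math.log(sum(math.exp(e - m) for e in expo) / len(expo))
        corrected[b] = f_me[b] - kT * log_avg
    ref = min(corrected.values())
    return {b: f - ref for b, f in corrected.items()}
```

A constant ΔU only shifts the whole profile, so only the s-dependence of the ME-to-EE difference changes barrier heights.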

Recent advances in data science are impacting the development of classical force fields. Here we review some ideas and techniques from data science that have been used in force field development, including database construction, atom typing, and machine learning potentials. We highlight how new tools such as active learning and automatic differentiation are facilitating the generation of target data and direct fitting to macroscopic observables. Philosophical changes in how force field models should be built and used are also discussed. It is encouraging that more accurate biomolecular force fields can be developed with the aid of data science techniques.

In combined quantum mechanical and molecular mechanical (QM/MM) free energy simulations, how to combine the accuracy of ab initio (AI) methods with the speed of semiempirical (SE) methods for a cost-effective QM treatment remains a long-standing challenge. In this work, we present a machine-learning-facilitated method for obtaining AI/MM-quality free energy profiles through efficient SE/MM simulations. In particular, we use Gaussian process regression (GPR) to learn the energy and force corrections needed for SE/MM to match AI/MM results during molecular dynamics simulations. Force matching is enabled in our model by including energy derivatives in the observational targets through the extended-kernel formalism. We demonstrate the effectiveness of this method on the solution-phase SN2 Menshutkin reaction using AM1/MM and B3LYP/6-31+G(d,p)/MM as the base and target levels, respectively. Trained on only 80 configurations sampled along the minimum free energy path (MFEP), the resulting GPR model reduces the average energy error of AM1/MM from 18.2 to 5.8 kcal mol⁻¹ for the 4000-sample testing set and decreases the average force error on the QM atoms from 14.6 to 3.7 kcal mol⁻¹ Å⁻¹. Free energy sampling with the GPR corrections applied (AM1-GPR/MM) produces a free energy barrier of 14.4 kcal mol⁻¹ and a reaction free energy of -34.1 kcal mol⁻¹, in closer agreement with the AI/MM benchmarks and experimental results.
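
The Δ-learning idea can be illustrated in one dimension. A GP is trained on the difference between a high-level and a low-level energy, and the predicted correction is then added to the low-level energy at new points. This minimal sketch uses hypothetical toy functions, a fixed RBF kernel, and energy targets only. It does not implement the extended kernel with derivative (force) observations described in the abstract, and a real model would act on molecular descriptors rather than a scalar x.

```python
import math


def rbf(x1, x2, length=0.5, amp=1.0):
    """Squared-exponential (RBF) kernel with fixed, hypothetical hyperparameters."""
    return amp * math.exp(-0.5 * (x1 - x2) ** 2 / length ** 2)


def cholesky(K):
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def cho_solve(L, b):
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class DeltaGPR:
    """GP posterior mean of the correction dE(x) = E_high(x) - E_low(x)."""

    def __init__(self, xs, e_low, e_high, noise=1e-6):
        self.xs = list(xs)
        targets = [h - l for h, l in zip(e_high, e_low)]
        K = [[rbf(p, q) + (noise if i == j else 0.0) for j, q in enumerate(self.xs)]
             for i, p in enumerate(self.xs)]
        self.alpha = cho_solve(cholesky(K), targets)

    def correction(self, x):
        return sum(a * rbf(x, xi) for a, xi in zip(self.alpha, self.xs))


# Usage with toy surfaces: E_low = x^2, E_high = x^2 + 0.3 sin(2x).
xs = [-1.5 + 0.25 * i for i in range(13)]
model = DeltaGPR(xs, [x * x for x in xs], [x * x + 0.3 * math.sin(2 * x) for x in xs])
e_corrected = 0.1 ** 2 + model.correction(0.1)  # "low + Δ" estimate at x = 0.1
```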

Predictive molecular simulations require fast, accurate, and reactive interatomic potentials. Machine learning offers a promising approach to constructing such potentials by fitting energies and forces to high-level quantum-mechanical data, but doing so typically requires considerable human intervention and large volumes of data. Here we show that, by leveraging hierarchical and active learning, accurate Gaussian Approximation Potential (GAP) models can be developed autonomously for diverse chemical systems, requiring only hundreds to a few thousand energy and gradient evaluations on a reference potential-energy surface. The approach uses separate intra- and intermolecular fits and employs a prospective error metric to assess the accuracy of the potentials. We demonstrate applications to a range of molecular systems relevant to computational organic chemistry, from bulk solvents, a solvated metal ion, and a metallocage to chemical reactivity, including a bifurcating Diels-Alder reaction in the gas phase and non-equilibrium dynamics (a model SN2 reaction) in explicit solvent. The method provides a route to routinely generating machine-learned force fields for reactive molecular systems.

Quantum mechanics/molecular mechanics (QM/MM) molecular dynamics (MD) simulations have been developed to simulate molecular systems in which an explicit description of changes in the electronic structure is necessary. However, QM/MM MD simulations are computationally expensive compared to fully classical simulations, as all valence electrons are treated explicitly and a self-consistent field (SCF) procedure is required. Recently, approaches have been proposed to replace the QM description with machine-learned (ML) models. However, condensed-phase systems pose a challenge for these approaches due to long-range interactions. Here, we establish a workflow that incorporates the MM environment as an element type in a high-dimensional neural network potential (HDNNP). The fitted HDNNP describes the potential-energy surface of the QM particles with an electrostatic embedding scheme. Thus, the MM particles feel a force from the polarized QM particles. To achieve chemical accuracy, we find that even simple systems require models with strong gradient regularization, a large number of data points, and a substantial number of parameters. To address this issue, we extend our approach to a Δ-learning scheme, where the ML model learns the difference between a reference method (density functional theory (DFT)) and a cheaper semiempirical method (density functional tight binding (DFTB)). We show that such a scheme reaches the accuracy of the DFT reference method while requiring significantly fewer parameters. Furthermore, the Δ-learning scheme is capable of correctly incorporating long-range interactions within a cutoff of 1.4 nm. It is validated by performing MD simulations of retinoic acid in water and of the interaction between S-adenosylmethionine and cytosine in water. The presented results indicate that Δ-learning is a promising approach for (QM)ML/MM MD simulations of condensed-phase systems.

Combustion is a complex chemical process that involves thousands of chemical reactions and generates hundreds of molecular species and radicals. In this work, a neural network-based molecular dynamics (MD) simulation is carried out to simulate the benchmark combustion of methane. During the MD simulation, detailed reaction processes leading to the formation of specific molecular species, including various intermediate radicals and the products, are revealed and characterized. In total, 798 different chemical reactions were recorded and some new chemical reaction pathways were discovered. We believe that the present work heralds a new era in which neural network-based reactive MD simulation can be practically applied to important complex reaction systems at the ab initio level. Such simulations provide an atomic-level understanding of chemical reaction processes and enable the discovery of new reaction pathways in a level of detail beyond what laboratory experiments can accomplish.

DFTB+ is a versatile, community-developed, open-source software package offering fast and efficient methods for carrying out atomistic quantum mechanical simulations. By implementing various methods approximating density functional theory (DFT), such as density functional based tight binding (DFTB) and the extended tight binding method, it enables simulations of large systems over long timescales with reasonable accuracy, while being considerably faster for typical simulations than the respective ab initio methods. Based on the DFTB framework, it additionally offers approximate versions of various DFT extensions, including hybrid functionals, a time-dependent formalism for treating excited systems, electron transport using non-equilibrium Green’s functions, and many more. DFTB+ can be used as a user-friendly standalone application, embedded into other software packages as a library, or run as a calculation server accessed by socket communication. We give an overview of the recently developed capabilities of the DFTB+ code, demonstrate them with a few use-case examples, discuss the strengths and weaknesses of the various features, and discuss ongoing developments and possible future perspectives.

TiO2 is a widely used photocatalyst in science and technology, and its interface with water is important in fields ranging from geochemistry to biomedicine. Yet it is still unclear whether water adsorbs in molecular or dissociated form on TiO2, even for well-defined crystalline surfaces. To address this issue, we simulated the TiO2–water interface using molecular dynamics with an ab initio-based deep neural network potential. Our simulations show a dynamical equilibrium between molecular and dissociative adsorption of water on TiO2. Water dissociates through a solvent-assisted concerted proton transfer to form a pair of short-lived hydroxyl groups on the TiO2 surface. Molecular adsorption of water is ΔF = 7.5 ± 0.9 kJ/mol lower in free energy than dissociative adsorption, giving rise to an equilibrium water dissociation fraction of 6 ± 0.5% at room temperature. Because surface hydroxyl groups are relevant to the surface chemistry of TiO2, our model might be key to understanding phenomena ranging from surface functionalization to photocatalytic mechanisms.

The Varkud satellite ribozyme catalyses site-specific RNA cleavage and ligation, and serves as an important model system to understand RNA catalysis. Here, we combine stereospecific phosphorothioate substitution, precision nucleobase mutation and linear free-energy relationship measurements with molecular dynamics, molecular solvation theory and ab initio quantum mechanical/molecular mechanical free-energy simulations to gain insight into the catalysis. Through this confluence of theory and experiment, we unify the existing body of structural and functional data to unveil the catalytic mechanism in unprecedented detail, including the degree of proton transfer in the transition state. Further, we provide evidence for a critical Mg2+ in the active site that interacts with the scissile phosphate and anchors the general base guanine in position for nucleophile activation. This novel role for Mg2+ adds to the diversity of known catalytic RNA strategies and unifies functional features observed in the Varkud satellite, hairpin and hammerhead ribozyme classes.

Atomic neural networks (ANNs) constitute a class of machine learning methods for predicting potential energy surfaces and physicochemical properties of molecules and materials. Despite many successes, developing interpretable ANN architectures and implementing existing ones efficiently are still challenging. This calls for reliable, general-purpose, open-source codes. Here, we present a Python library named PiNN as a solution toward this goal. In PiNN, we designed a new interpretable and high-performing graph convolutional neural network variant, PiNet, and also implemented the established Behler-Parrinello high-dimensional neural network. These implementations were tested using datasets of isolated small molecules, crystalline materials, liquid water, and an aqueous alkaline electrolyte. PiNN comes with a visualizer called PiNNBoard to extract chemical insight "learned" by ANNs, provides analytical stress tensor calculations, and interfaces to both the Atomic Simulation Environment and a development version of the Amsterdam Modeling Suite. Moreover, PiNN is highly modular, which makes it useful not only as a standalone package but also as a toolchain for developing and implementing novel ANNs. The code is distributed under a permissive BSD license and is freely accessible at https://github.com/Teoroo-CMC/PiNN/ with full documentation and tutorials.
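
PiNN implements the Behler-Parrinello network, whose inputs are atom-centered symmetry functions. The sketch below is for illustration only and is not PiNN's API. It computes the radial G2 function with the cosine cutoff in its commonly used form, G2_i = Σ_j exp[−η(r_ij − R_s)²] f_c(r_ij). The η, R_s, and R_c values are hypothetical.

```python
import math


def cutoff(r, rc):
    """Cosine cutoff: 0.5 * (cos(pi r / rc) + 1) inside rc, zero beyond it."""
    return 0.5 * (math.cos(math.pi * r / rc) + 1.0) if r < rc else 0.0


def g2(positions, i, eta, rs, rc):
    """Radial symmetry function G2 of atom i for one (eta, R_s) pair."""
    total = 0.0
    for j, pos in enumerate(positions):
        if j == i:
            continue
        r = math.dist(positions[i], pos)
        total += math.exp(-eta * (r - rs) ** 2) * cutoff(r, rc)
    return total


def radial_descriptor(positions, i, etas, rs=0.0, rc=6.0):
    """Stack G2 values for several widths into a fixed-length feature vector."""
    return [g2(positions, i, eta, rs, rc) for eta in etas]
```

Because G2 depends only on interatomic distances, the descriptor is unchanged by rotations and translations and by relabeling the neighbors. It therefore has the same length for any local environment.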

We develop an L-platform/L-scaffold framework we hypothesize may serve as a blueprint to facilitate site-specific RNA-cleaving nucleic acid enzyme design. Building on the L-platform motif originally described by Suslov and coworkers, we identify new critical scaffolding elements required to anchor a conserved general base guanine ("L-anchor") and bind functionally important metal ions at the active site ("L-pocket"). Molecular simulations, together with a broad range of experimental structural and functional data, connect the L-platform/L-scaffold elements to necessary and sufficient conditions for catalytic activity. We demonstrate that the L-platform/L-scaffold framework is common to 5 of the 9 currently known naturally occurring ribozyme classes (Twr, HPr, VSr, HHr, Psr), and intriguingly from a design perspective, the framework also appears in an artificially engineered DNAzyme (8-17dz). The flexibility of the L-platform/L-scaffold framework is illustrated on these systems, highlighting modularity and trends in the variety of known general acid moieties that are supported. These trends give rise to two distinct catalytic paradigms, building on the classifications proposed by Wilson and coworkers and named for the implicated general base and acid. The "G+A" paradigm (Twr, HPr, VSr) exclusively utilizes nucleobase residues for chemistry, and the "G+M+" paradigm (HHr, 8-17dz, Psr) involves structuring of the "L-pocket" metal ion binding site for recruitment of a divalent metal ion that plays an active role in the chemical steps of the reaction. Finally, the modularity of the L-platform/L-scaffold framework is illustrated in the VS ribozyme where the "L-pocket" assumes the functional role of the "L-anchor" element, highlighting a distinct mechanism, but one that is functionally linked with the hammerhead ribozyme.

We perform molecular dynamics simulations, based on recent crystallographic data, on the 8-17 DNAzyme at four states along the reaction pathway to determine the dynamical ensemble for the active state and transition state mimic in solution. A striking finding is the diverse roles played by Na+ and Pb2+ ions in the electrostatically strained active site that impact all four fundamental catalytic strategies, and share commonality with some features recently inferred for naturally occurring hammerhead and pistol ribozymes. The active site Pb2+ ion helps to stabilize in-line nucleophilic attack, provides direct electrostatic transition state stabilization, and facilitates leaving group departure. A conserved guanine residue is positioned to act as the general base, and is assisted by a bridging Na+ ion that tunes the pKa and facilitates in-line fitness. The present work provides insight into how DNA molecules are able to solve the RNA-cleavage problem, and establishes functional relationships between the mechanism of these engineered DNA enzymes with their naturally evolved RNA counterparts. This adds valuable information to our growing body of knowledge on general mechanisms of phosphoryl transfer reactions catalyzed by RNA, proteins and DNA.

The pistol ribozyme (Psr) is among the most recently discovered RNA enzymes and has been the subject of experiments aimed at elucidating the mechanism. Recent biochemical studies have revealed exciting clues about catalytic interactions in the active site not apparent from available crystallographic data. The present work unifies the interpretation of the existing body of structural and functional data on Psr by providing a dynamical model for the catalytically active state in solution from molecular simulation. Our results suggest that a catalytic Mg2+ ion makes inner-sphere contact with G33:N7 and outer-sphere coordination to the pro-RP of the scissile phosphate, promoting electrostatic stabilization of the dianionic transition state and neutralization of the developing charge of the leaving group through a metal-coordinated water molecule that is made more acidic by a hydrogen bond donated from the 2'OH of P32. This model is consistent with experimental activity-pH and mutagenesis data, including sensitivity to G33(7cG) and phosphorothioate substitution/metal ion rescue. The model suggests several experimentally testable predictions, including the response of cleavage activity to mutations at G42 and P32 positions in the ribozyme, and thio substitutions of the substrate in the presence of different divalent metal ions. Further, the model identifies striking similarities of Psr to the hammerhead ribozyme (HHr), including similar global fold, organization of secondary structure around an active site three-way junction, catalytic metal ion binding mode, and guanine general base. However, the specific binding mode and role of the Mg2+ ion, as well as a conserved 2'-OH in the active site, are interrelated but subtly different between the ribozymes.

Computational modeling of chemical and biological systems at atomic resolution is a crucial tool in the chemist's toolset. The use of computer simulations requires a balance between cost and accuracy: quantum-mechanical methods provide high accuracy but are computationally expensive and scale poorly to large systems, while classical force fields are cheap and scalable, but lack transferability to new systems. Machine learning can be used to achieve the best of both approaches. Here we train a general-purpose neural network potential (ANI-1ccx) that approaches CCSD(T)/CBS accuracy on benchmarks for reaction thermochemistry, isomerization, and drug-like molecular torsions. This is achieved by training a network to DFT data then using transfer learning techniques to retrain on a dataset of gold standard QM calculations (CCSD(T)/CBS) that optimally spans chemical space. The resulting potential is broadly applicable to materials science, biology, and chemistry, and billions of times faster than CCSD(T)/CBS calculations.

In recent years, machine learning (ML) methods have become increasingly popular in computational chemistry. After being trained on appropriate ab initio reference data, these methods can accurately predict the properties of chemical systems, circumventing the need to explicitly solve the electronic Schrödinger equation. Because of their computational efficiency and scalability to large datasets, deep neural networks (DNNs) are a particularly promising ML algorithm for chemical applications. This work introduces PhysNet, a DNN architecture designed for predicting energies, forces and dipole moments of chemical systems. PhysNet achieves state-of-the-art performance on the QM9, MD17 and ISO17 benchmarks. Further, two new datasets are generated in order to probe the performance of ML models for describing chemical reactions, long-range interactions, and condensed phase systems. It is shown that explicitly including electrostatics in energy predictions is crucial for a qualitatively correct description of the asymptotic regions of a potential energy surface (PES). PhysNet models trained on a systematically constructed set of small peptide fragments (at most eight heavy atoms) are able to generalize to considerably larger proteins like deca-alanine (Ala10): the optimized geometry of helical Ala10 predicted by PhysNet is virtually identical to ab initio results (RMSD = 0.21 Å). By running unbiased molecular dynamics (MD) simulations of Ala10 on the PhysNet-PES in the gas phase, it is found that instead of a helical structure, Ala10 folds into a "wreath-shaped" configuration, which is more stable than the helical form by 0.46 kcal mol⁻¹ according to the reference ab initio calculations.

An active learning procedure called deep potential generator (DP-GEN) is proposed for the construction of accurate and transferable machine learning-based models of the potential energy surface (PES) for the molecular modeling of materials. This procedure consists of three main components: exploration, generation of accurate reference data, and training. Application to the sample systems of Al, Mg, and Al-Mg alloys demonstrates that DP-GEN can produce uniformly accurate PES models with a minimal number of reference data.
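
The exploration step needs a criterion for deciding which sampled configurations deserve new reference calculations. As we understand the DP-GEN scheme, an ensemble of models trained on the same data with different random initializations predicts forces for each explored frame. The frame's deviation is ε = max_i sqrt(⟨‖F_i − ⟨F_i⟩‖²⟩), taken over atoms i with averages over models. Frames with σ_lo ≤ ε < σ_hi are sent for labeling. Frames below σ_lo are treated as already well described. Frames above σ_hi are treated as likely unphysical. The sketch below assumes this criterion; the trust levels and data layout are hypothetical.

```python
import math


def max_force_deviation(force_sets):
    """force_sets[m][i] = (fx, fy, fz): force on atom i predicted by model m.

    Returns the maximum over atoms of the ensemble standard deviation of the
    force vector, sqrt(< |F_i - <F_i>|^2 >)."""
    n_models = len(force_sets)
    n_atoms = len(force_sets[0])
    worst = 0.0
    for i in range(n_atoms):
        mean = [sum(fs[i][k] for fs in force_sets) / n_models for k in range(3)]
        var = sum(
            sum((fs[i][k] - mean[k]) ** 2 for k in range(3)) for fs in force_sets
        ) / n_models
        worst = max(worst, math.sqrt(var))
    return worst


def select_candidates(frames, trust_lo, trust_hi):
    """frames: list of (frame_id, force_sets). Returns the ids to label."""
    picked = []
    for frame_id, force_sets in frames:
        eps = max_force_deviation(force_sets)
        if trust_lo <= eps < trust_hi:
            picked.append(frame_id)
    return picked
```

The lower bound keeps redundant frames out of the training set. The upper bound avoids spending reference calculations on configurations the models have driven far off the physical surface.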

In this Viewpoint, we discuss the current progress in applications of machine learning (ML) and artificial intelligence (AI) to meet the challenges of computational drug discovery. We identify several areas where existing methods have the potential to accelerate pharmaceutical research and disrupt more traditional approaches.

Deep learning has led to a paradigm shift in artificial intelligence, including web, text, and image search, speech recognition, as well as bioinformatics, with growing impact in chemical physics. Machine learning, in general, and deep learning, in particular, are ideally suited for representing quantum-mechanical interactions, enabling us to model nonlinear potential-energy surfaces or to enhance the exploration of chemical compound space. Here we present the deep learning architecture SchNet that is specifically designed to model atomistic systems by making use of continuous-filter convolutional layers. We demonstrate the capabilities of SchNet by accurately predicting a range of properties across chemical space for molecules and materials, where our model learns chemically plausible embeddings of atom types across the periodic table. Finally, we employ SchNet to predict potential-energy surfaces and energy-conserving force fields for molecular dynamics simulations of small molecules and perform an exemplary study on the quantum-mechanical properties of C20-fullerene that would have been infeasible with regular ab initio molecular dynamics.
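
A continuous-filter convolution updates each atom's features as x_i' = Σ_{j≠i} x_j ∘ W(r_ij). The filter W is generated by a small network acting on a Gaussian radial-basis expansion of the interatomic distance. The sketch below is simplified relative to the published SchNet architecture. It uses a single filter layer with hypothetical weights and a hard distance cutoff, and it leaves out the interaction blocks and atom-wise layers. Because it depends only on distances, its output is invariant to translations and permutes along with the atoms.

```python
import math


def ssp(x):
    """Shifted softplus ln(0.5 * e^x + 0.5), written in an overflow-safe form."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x))) - math.log(2.0)


def rbf_expand(d, centers, gamma=10.0):
    """Expand a distance in Gaussian radial basis functions."""
    return [math.exp(-gamma * (d - mu) ** 2) for mu in centers]


def cfconv(features, positions, W, centers, rc=5.0):
    """One continuous-filter convolution.

    features[i]: list of F floats for atom i.
    W: F x K weights of a one-layer filter network (K = len(centers)).
    Returns x_i' = sum_{j != i, r_ij < rc} x_j * filter(r_ij), elementwise.
    """
    n, F = len(features), len(features[0])
    out = []
    for i in range(n):
        acc = [0.0] * F
        for j in range(n):
            if i == j:
                continue
            d = math.dist(positions[i], positions[j])
            if d >= rc:
                continue  # hard cutoff, for brevity only
            e = rbf_expand(d, centers)
            filt = [ssp(sum(W[f][k] * e[k] for k in range(len(e)))) for f in range(F)]
            for f in range(F):
                acc[f] += features[j][f] * filt[f]
        out.append(acc)
    return out
```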

We introduce a representation of any atom in any chemical environment for the automatized generation of universal kernel ridge regression-based quantum machine learning (QML) models of electronic properties, trained throughout chemical compound space. The representation is based on Gaussian distribution functions, scaled by power laws and explicitly accounting for structural as well as elemental degrees of freedom. The elemental components help us to lower the QML model’s learning curve, and, through interpolation across the periodic table, even enable “alchemical extrapolation” to covalent bonding between elements not part of training. This point is demonstrated for the prediction of covalent binding in single, double, and triple bonds among main-group elements as well as for atomization energies in organic molecules. We present numerical evidence that resulting QML energy models, after training on a few thousand random training instances, reach chemical accuracy for out-of-sample compounds. Compound datasets studied include thousands of structurally and compositionally diverse organic molecules, non-covalently bonded protein side-chains, (H2O)40-clusters, and crystalline solids. Learning curves for QML models also indicate competitive predictive power for various other electronic ground state properties of organic molecules, calculated with hybrid density functional theory, including polarizability, heat-capacity, HOMO-LUMO eigenvalues and gap, zero point vibrational energy, dipole moment, and highest vibrational fundamental frequency.

We use HIP-NN, a neural network architecture that excels at predicting molecular energies, to predict atomic charges. The charge predictions are accurate over a wide range of molecules (both small and large) and for a diverse set of charge assignment schemes. To demonstrate the power of charge prediction on non-equilibrium geometries, we use HIP-NN to generate IR spectra from dynamical trajectories on a variety of molecules. The results are in good agreement with reference IR spectra produced by traditional theoretical methods. Critically, for this application, HIP-NN charge predictions are about 10⁴ times faster than direct DFT charge calculations. Thus, ML provides a pathway to greatly increase the range of feasible simulations while retaining quantum-level accuracy. In summary, our results provide further evidence that machine learning can replicate high-level quantum calculations at a tiny fraction of the computational cost.
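One common route from predicted charges to an IR spectrum is sketched below, assuming per-frame atomic charges are already available (here they are simply passed in; no HIP-NN interface is used). The dipole is built as μ = Σᵢ qᵢ rᵢ, and the power spectrum of its time derivative, which up to normalization corresponds to the Fourier transform of the dipole-derivative autocorrelation, gives the line positions. Quantum correction factors and line-shape prefactors are omitted, and the function names are hypothetical.

```python
import cmath
import math

def dipole(charges, positions):
    """Molecular dipole from atomic partial charges: mu = sum_i q_i * r_i."""
    return [sum(q * r[k] for q, r in zip(charges, positions)) for k in range(3)]

def ir_power_spectrum(dipoles, dt):
    """Power spectrum of the dipole time derivative, summed over x, y and z.

    Returns (angular frequencies, intensities) for the non-negative DFT bins.
    """
    # Forward-difference estimate of d(mu)/dt along the trajectory.
    deriv = [[(b[k] - a[k]) / dt for k in range(3)]
             for a, b in zip(dipoles, dipoles[1:])]
    n = len(deriv)
    freqs, intensities = [], []
    for m in range(n // 2 + 1):
        total = 0.0
        for k in range(3):
            # Plain O(n^2) discrete Fourier transform, adequate for a short sketch.
            c = sum(d[k] * cmath.exp(-2j * math.pi * m * t / n)
                    for t, d in enumerate(deriv))
            total += abs(c) ** 2
        freqs.append(2.0 * math.pi * m / (n * dt))
        intensities.append(total)
    return freqs, intensities

# Toy diatomic with fixed charges +0.5/-0.5 vibrating along x at angular frequency w0.
w0 = 2.0 * math.pi * 5 / 64
traj = [dipole([0.5, -0.5], [(0.0, 0.0, 0.0), (1.0 + 0.1 * math.sin(w0 * t), 0.0, 0.0)])
        for t in range(65)]
freqs, intensities = ir_power_spectrum(traj, dt=1.0)
```

Because the charges are re-evaluated at every frame of a trajectory, the cost of the charge model dominates; that is where replacing per-frame DFT charge calculations with a fast learned model pays off.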

Recent developments in many-body potential energy representation via deep learning have brought new hopes to addressing the accuracy-versus-efficiency dilemma in molecular simulations. Here we describe DeePMD-kit, a package written in Python/C++ that has been designed to minimize the effort required to build deep-learning-based representations of potential energies and force fields and to perform molecular dynamics. Potential applications of DeePMD-kit span from finite molecules to extended systems and from metallic systems to chemically bonded systems. DeePMD-kit is interfaced with TensorFlow, one of the most popular deep learning frameworks, making the training process highly automatic and efficient. On the other end, DeePMD-kit is interfaced with high-performance classical molecular dynamics and quantum (path-integral) molecular dynamics packages, namely LAMMPS and i-PI, respectively. Thus, upon training, the potential energy and force field models can be used to perform efficient molecular simulations for different purposes. As an example of the many potential applications of the package, we use DeePMD-kit to learn the interatomic potential energy and forces of a water model using data obtained from density functional theory. We demonstrate that the resulting molecular dynamics model accurately reproduces the structural information contained in the original model.
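The atomic energy decomposition E = Σᵢ Eᵢ that Deep Potential models rely on can be illustrated with a toy stand-in. In this sketch a hand-written quadratic function of one smoothly cut-off descriptor replaces the atomic neural network; the switching form resembles the smooth 1/r weighting used in DeepPot-SE descriptors, but the coefficients, cutoffs, and names here are hypothetical and are not DeePMD-kit code.

```python
import math

def smooth_weight(r, rcs, rc):
    """1/r inside rcs, damped smoothly to zero between rcs and rc, zero beyond rc."""
    if r >= rc:
        return 0.0
    if r < rcs:
        return 1.0 / r
    return (0.5 * math.cos(math.pi * (r - rcs) / (rc - rcs)) + 0.5) / r

def atomic_energy(i, positions, rcs, rc, a=-0.5, b=0.1):
    """Hypothetical per-atom energy E_i of atom i's local environment.

    A quadratic in a single descriptor stands in for the atomic network;
    a and b are illustrative coefficients only.
    """
    d = sum(smooth_weight(math.dist(positions[i], p), rcs, rc)
            for j, p in enumerate(positions) if j != i)
    return a * d + b * d * d

def total_energy(positions, rcs=0.5, rc=6.0):
    """E = sum_i E_i; each term sees only neighbors within rc, so terms parallelize."""
    return sum(atomic_energy(i, positions, rcs, rc) for i in range(len(positions)))

print(total_energy([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]))
```

Since each Eᵢ depends only on neighbors within the cutoff and on distances, the total energy is size-extensive and invariant to permutations, translations, and rotations, which is what makes the decomposition amenable to domain-decomposed parallel MD.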

This paper investigates the problem of adaptive fault-tolerant control for a class of nonlinear parametric strict-feedback systems with multiple unknown control directions. Multiple sensor faults are first considered such that all real state variables are unavailable. Then, a constructive design method for the problem is set up by exploiting a parameter separation and regrouping technique. To circumvent the main obstacle caused by the coupling effects of multiple unknown control directions and sensor faults, a region-dependent segmentation analysis method is proposed. It is proven that the closed-loop system is globally exponentially stable. Simulation results are presented to illustrate the effectiveness of the proposed scheme.

Molecular dynamics (MD) simulations employing ab initio quantum mechanical and molecular mechanical (ai-QM/MM) potentials are considered to be the state of the art, but the high computational cost associated with the ai-QM calculations remains a theoretical challenge for their routine application. Here, we present a modified protocol of the multiple time step (MTS) method for accelerating ai-QM/MM MD simulations of condensed-phase reactions. Within a previous MTS protocol [Nam J. Chem. Theory Comput. 2014, 10, 4175], reference forces are evaluated using a low-level (semiempirical QM/MM) Hamiltonian and employed at inner time steps to propagate the nuclear motions. Correction forces, which arise from the force differences between high-level (ai-QM/MM) and low-level Hamiltonians, are applied at outer time steps, where the MTS algorithm allows the time-reversible integration of the correction forces. To increase the outer step size, which is bound by the highest-frequency component in the correction forces, the semiempirical QM Hamiltonian is recalibrated in this work to minimize the magnitude of the correction forces. The remaining high-frequency modes, which are mainly bond stretches involving hydrogen atoms, are then removed from the correction forces. When combined with a Langevin or SIN(R) thermostat, the modified MTS-QM/MM scheme remains robust for the chorismate mutase system with outer time steps of up to 8 fs (Langevin) or 10 fs (SIN(R)) and 1 fs inner time steps. This yields a more than 5-fold speedup over standard ai-QM/MM simulations without sacrificing accuracy in the predicted free energy profile of the reaction.

We redevelop the variational free energy profile (vFEP) method using a cardinal B-spline basis to extend the method for analyzing free energy surfaces (FESs) involving three or more reaction coordinates. We also implemented software for evaluating high-dimensional profiles based on the multistate Bennett acceptance ratio (MBAR) method which constructs an unbiased probability density from global reweighting of the observed samples. The MBAR method takes advantage of a fast algorithm for solving the unbinned weighted histogram (UWHAM)/MBAR equations which replaces the solution of simultaneous equations with a nonlinear optimization of a convex function. We make use of cardinal B-splines and multiquadric radial basis functions to obtain smooth, differentiable MBAR profiles in arbitrary high dimensions. The cardinal B-spline vFEP and MBAR methods are compared using three example systems that examine 1D, 2D, and 3D profiles. Both methods are found to be useful and produce nearly indistinguishable results. The vFEP method is found to be 150 times faster than MBAR when applied to periodic 2D profiles, but the MBAR method is 4.5 times faster than vFEP when evaluating unbounded 3D profiles. In agreement with previous comparisons, we find the vFEP method produces superior FESs when the overlap between umbrella window simulations decreases. Finally, the associative reaction mechanism of hammerhead ribozyme is characterized using 3D, 4D, and 6D profiles, and the higher-dimensional profiles are found to have smaller reaction barriers by as much as 1.5 kcal/mol. The methods presented here have been implemented into the FE-ToolKit software package along with new methods for network-wide free energy analysis in drug discovery.
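The MBAR equations can be illustrated with the classic self-consistent iteration, sketched below under the assumption that every sample has been re-evaluated in every state (`u_kn`, reduced potentials) and every state contributes at least one sample. This is not the fast convex-optimization solver mentioned above, and it does not construct B-spline or radial-basis profiles; it only shows the fixed-point equations that such solvers target.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def mbar_free_energies(u_kn, N_k, tol=1e-10, max_iter=10000):
    """Solve the MBAR equations by self-consistent iteration.

    u_kn[k][n]: reduced potential of pooled sample n evaluated in state k.
    N_k[k]:     number of samples drawn from state k (assumed > 0).
    Returns dimensionless free energies f_k, shifted so that f_0 = 0.
    """
    K = len(u_kn)
    n_samples = len(u_kn[0])
    f = [0.0] * K
    for _ in range(max_iter):
        # log of the mixture density denominator sum_k N_k exp(f_k - u_kn)
        log_denom = [logsumexp([math.log(N_k[k]) + f[k] - u_kn[k][n] for k in range(K)])
                     for n in range(n_samples)]
        new_f = [-logsumexp([-u_kn[i][n] - log_denom[n] for n in range(n_samples)])
                 for i in range(K)]
        new_f = [fi - new_f[0] for fi in new_f]
        if max(abs(a - b) for a, b in zip(new_f, f)) < tol:
            return new_f
        f = new_f
    return f

# Two states whose reduced potentials differ by a constant 0.7 for every sample.
u0 = [0.1, 0.5, 1.3, 2.0]
f = mbar_free_energies([u0, [u + 0.7 for u in u0]], [2, 2])
```

Because every sample contributes to every state through the pooled denominator, MBAR reweights globally rather than window by window, which is what allows unbiased densities to be assembled in several dimensions at once.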

On the surface
The uptake and hydrolysis of N₂O₅ from the atmosphere by aqueous aerosols was long thought to occur by solvation and subsequent hydrolysis in the bulk of the aerosol. However, this mechanistic hypothesis was unverifiable because of the fast reaction kinetics. Galib et al. used molecular simulations to show instead that the mechanism is the inverse: Interfacial hydrolysis is followed by solvation into the interior. Their reactive uptake model is consistent with some existing experimental observations.
Science, this issue p. 921

We explore the role of long-range interactions in atomistic machine-learning models by analyzing the effects on fitting accuracy, isolated cluster properties, and bulk thermodynamic properties. Such models have become increasingly popular in molecular simulations given their ability to learn highly complex and multi-dimensional interactions within a local environment; however, many of them fundamentally lack a description of explicit long-range interactions. In order to provide a well-defined benchmark system with precisely known pairwise interactions, we chose as the reference model a flexible version of the Extended Simple Point Charge (SPC/E) water model. Our analysis shows that while local representations are sufficient for predictions of the condensed liquid phase, the short-range nature of machine-learning models falls short in representing cluster and vapor phase properties. These findings provide an improved understanding of the role of long-range interactions in machine learning models and the regimes where they are necessary.

In a previous work [Pan et al., Molecules 23, 2500 (2018)], a charge projection scheme was reported, where outer molecular mechanical (MM) charges [>10 Å from the quantum mechanical (QM) region] were projected onto the electrostatic potential (ESP) grid of the QM region to accurately and efficiently capture long-range electrostatics in ab initio QM/MM calculations. Here, a further simplification to the model is proposed, where the outer MM charges are projected onto inner MM atom positions (instead of ESP grid positions). This enables a representation of the long-range MM electrostatic potential via augmentary charges (AC) on inner MM atoms. Combined with the long-range electrostatic correction function from Cisneros et al. [J. Chem. Phys. 143, 044103 (2015)] to smoothly switch between inner and outer MM regions, this new QM/MM-AC electrostatic model yields accurate and continuous ab initio QM/MM electrostatic energies with a 10 Å cutoff between inner and outer MM regions. This model enables efficient QM/MM cluster calculations with a large number of MM atoms as well as QM/MM calculations with periodic boundary conditions.
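The projection idea can be sketched as a least-squares fit: choose charges on the inner MM sites so that, at a set of points in the QM region, their Coulomb potential matches the potential of the outer MM charges. This is a minimal illustration assuming unit-free Coulomb potentials, no total-charge constraint, and no switching function; the published QM/MM-AC model may be formulated differently, and all names and the toy geometry are hypothetical.

```python
import math

def coulomb_potential(point, sites, charges):
    """Electrostatic potential at point from point charges (units with 1/(4 pi eps0) = 1)."""
    return sum(q / math.dist(point, s) for s, q in zip(sites, charges))

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_augmentary_charges(grid, inner_sites, outer_sites, outer_charges):
    """Least-squares charges on inner sites reproducing the outer-charge potential on grid.

    Solves the normal equations (A^T A) q = A^T V with A[g][a] = 1 / |g - r_a|.
    """
    A = [[1.0 / math.dist(g, s) for s in inner_sites] for g in grid]
    V = [coulomb_potential(g, outer_sites, outer_charges) for g in grid]
    n = len(inner_sites)
    AtA = [[sum(row[a] * row[b] for row in A) for b in range(n)] for a in range(n)]
    AtV = [sum(row[a] * v for row, v in zip(A, V)) for a in range(n)]
    return solve(AtA, AtV)

grid = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
        (0.0, 0.0, 1.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
inner = [(5.0, 0.0, 0.0), (0.0, 5.0, 0.0), (0.0, 0.0, 5.0)]
q_aug = fit_augmentary_charges(grid, inner, [(0.0, 5.0, 0.0)], [0.4])
```

Once fitted, the QM calculation only needs the inner MM atoms and their augmented charges, so the number of external point charges seen by the QM code stays small regardless of how many outer MM atoms there are.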

QM/MM simulations have become an indispensable tool in many chemical and biochemical investigations. Considering the tremendous degree of success, including recognition by a 2013 Nobel Prize in Chemistry, are there still "burning challenges" in QM/MM methods, especially for biomolecular systems? In this short Perspective, we discuss several issues that we believe greatly impact the robustness and quantitative applicability of QM/MM simulations to many, if not all, biomolecules. We highlight these issues with observations and relevant advances from recent studies in our group and others in the field. Despite such limited scope, we hope the discussions are of general interest and will stimulate additional developments that help push the field forward in meaningful directions.

We present the GPU version of DeePMD-kit, which, upon training a deep neural network model using ab initio data, can drive extremely large-scale molecular dynamics (MD) simulation with ab initio accuracy. Our tests show that for a water system of 12,582,912 atoms, the GPU version can be 7 times faster than the CPU version under the same power consumption. The code can scale up to the entire Summit supercomputer. For a copper system of 113,246,208 atoms, the code can perform one nanosecond of MD simulation per day, reaching a peak performance of 86 PFLOPS (43% of the peak). Such unprecedented ability to perform MD simulation with ab initio accuracy opens up the possibility of studying many important issues in materials and molecules, such as heterogeneous catalysis, electrochemical cells, irradiation damage, crack propagation, and biochemical reactions.
Program summary
Program Title: DeePMD-kit
CPC Library link to program files: https://doi.org/10.17632/phyn4kgsfx.1
Developer’s repository link: https://doi.org/10.5281/zenodo.3961106
Licensing provisions: LGPL
Programming language: C++/Python/CUDA
Journal reference of previous version: Comput. Phys. Commun. 228 (2018), 178–184.
Does the new version supersede the previous version?: Yes.
Reasons for the new version: Parallelize and optimize the DeePMD-kit for modern high performance computers.
Summary of revisions: The optimized DeePMD-kit is capable of performing molecular dynamics simulations of 100 million atoms with ab initio accuracy, achieving 86 PFLOPS in double precision.
Nature of problem: Modeling the many-body atomic interactions by deep neural network models. Running molecular dynamics simulations with the models.
Solution method: The Deep Potential for Molecular Dynamics (DeePMD) method is implemented based on the deep learning framework TensorFlow. Standard and customized TensorFlow operators are optimized for GPU. Massively parallel molecular dynamics simulations with DeePMD models on high performance computers are supported in the new version.

We propose a general machine learning-based framework for building an accurate and widely applicable energy functional within the framework of generalized Kohn-Sham density functional theory. To this end, we develop a way of training self-consistent models that are capable of taking large datasets from different systems and different kinds of labels. We demonstrate that the functional that results from this training procedure gives chemically accurate predictions on energy, force, dipole, and electron density for a large class of molecules. It can be continuously improved as more data become available.

Reactive molecular dynamics (MD) simulation is a powerful tool to study the reaction mechanism of complex chemical systems. Central to the method is the potential energy surface (PES) that can describe the breaking and formation of chemical bonds. The development of both accurate and efficient PES has attracted significant effort over the past two decades. A recently developed deep potential (DP) model promises to bring ab initio accuracy to large-scale reactive MD simulations. However, for complex chemical reaction processes like pyrolysis, it remains challenging to generate reliable DP models with an optimal training data set. In this work, a data set construction scheme for such a purpose was established. The employment of a concurrent learning algorithm allows us to maximize the exploration of the chemical space while minimizing the redundancy of the data set. This greatly reduces the cost of computational resources required for ab initio calculations. Based on this method, we constructed a data set for the pyrolysis of n-dodecane, which contains 35 496 structures. The reactive MD simulation with the DP model trained based on this data set revealed the pyrolysis mechanism of n-dodecane in detail, and the simulation results are in good agreement with the experimental measurements. In addition, this data set shows excellent transferability to different long-chain alkanes. These results demonstrate the advantages of the proposed method for constructing training data sets for similar systems.
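A common selection rule in concurrent-learning workflows for Deep Potential models is to train a small ensemble of models, run exploratory MD with one of them, and send to ab initio labeling only those frames where the ensemble's force predictions disagree moderately. The sketch below assumes that rule, with the deviation taken as the largest per-atom standard deviation of the predicted force vector; the thresholds and names are hypothetical, and the exact criterion used in this work may differ.

```python
import math

def max_force_deviation(ensemble_forces):
    """Largest per-atom standard deviation of the force vector across an ensemble.

    ensemble_forces[m][i] is the 3-component force on atom i predicted by model m.
    """
    n_models = len(ensemble_forces)
    n_atoms = len(ensemble_forces[0])
    worst = 0.0
    for i in range(n_atoms):
        mean = [sum(f[i][k] for f in ensemble_forces) / n_models for k in range(3)]
        var = sum(sum((f[i][k] - mean[k]) ** 2 for k in range(3))
                  for f in ensemble_forces) / n_models
        worst = max(worst, math.sqrt(var))
    return worst

def select_candidates(frames, lo, hi):
    """Split frame indices into (accurate, candidates, failed) by model deviation.

    Below lo the models agree, so the frame adds little; at or above hi the
    configuration is likely unphysical; frames in between are labeled next.
    """
    accurate, candidates, failed = [], [], []
    for idx, ensemble_forces in enumerate(frames):
        dev = max_force_deviation(ensemble_forces)
        if dev < lo:
            accurate.append(idx)
        elif dev < hi:
            candidates.append(idx)
        else:
            failed.append(idx)
    return accurate, candidates, failed
```

Labeling only the middle band is what keeps the data set small: configurations the ensemble already agrees on are skipped, which reduces the redundancy and the number of expensive ab initio calculations.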

Machine learned reactive force fields based on polynomial expansions have been shown to be highly effective for describing simulations involving reactive materials. Nevertheless, the highly flexible nature of these models can give rise to a large number of candidate parameters for complicated systems. In these cases, reliable parameterization requires a well-formed training set, which can be difficult to achieve through standard iterative fitting methods. Here, we present an active learning approach based on cluster analysis and inspired by Shannon information theory to enable semi-automated generation of informative training sets and robust machine learned force fields. The use of this tool is demonstrated for development of a model based on linear combinations of Chebyshev polynomials explicitly describing up to four-body interactions, for a chemically and structurally diverse system of C/O under extreme conditions. We show that this flexible training database management approach enables development of models exhibiting excellent agreement with Kohn-Sham density functional theory in terms of structure, dynamics, and speciation.
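The key property of polynomial-expansion force fields, that the energy is linear in its coefficients, can be sketched for a two-body term. Here the distance is mapped linearly onto [-1, 1] and expanded in Chebyshev polynomials, and the coefficients are obtained by ordinary least squares. The linear distance map is a simplifying assumption; the published models may use a different distance transformation and include smooth cutoff functions and higher-body terms, all omitted here, and the function names are hypothetical.

```python
import math

def chebyshev_basis(s, order):
    """T_0(s) ... T_order(s) via the recurrence T_{n+1} = 2 s T_n - T_{n-1}."""
    t = [1.0, s]
    for _ in range(order - 1):
        t.append(2.0 * s * t[-1] - t[-2])
    return t[: order + 1]

def scaled_distance(r, r_min, r_max):
    """Map r in [r_min, r_max] linearly onto [-1, 1] (simplifying assumption)."""
    return 2.0 * (r - r_min) / (r_max - r_min) - 1.0

def pair_energy(r, coeffs, r_min, r_max):
    """E_2(r) = sum_n c_n T_n(s(r))."""
    s = scaled_distance(r, r_min, r_max)
    return sum(c * t for c, t in zip(coeffs, chebyshev_basis(s, len(coeffs) - 1)))

def fit_pair_coefficients(distances, energies, order, r_min, r_max):
    """Linear least squares for the coefficients via the normal equations."""
    rows = [chebyshev_basis(scaled_distance(r, r_min, r_max), order) for r in distances]
    n = order + 1
    A = [[sum(row[a] * row[b] for row in rows) for b in range(n)] for a in range(n)]
    y = [sum(row[a] * e for row, e in zip(rows, energies)) for a in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        y[col], y[piv] = y[piv], y[col]
        for k in range(n):
            if k != col:
                f = A[k][col] / A[col][col]
                A[k] = [a - f * b for a, b in zip(A[k], A[col])]
                y[k] -= f * y[col]
    return [y[i] / A[i][i] for i in range(n)]

distances = [1.0 + 0.25 * k for k in range(17)]
energies = [pair_energy(r, [1.0, 0.0, 0.5], 1.0, 5.0) for r in distances]
coeffs = fit_pair_coefficients(distances, energies, 2, 1.0, 5.0)
```

Because the fit is linear, every candidate parameter enters the same design matrix, which is why the number of candidate parameters, and hence the need for a well-formed training set, grows quickly as higher-body terms are added.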

Predicting protein-ligand binding affinities and the associated thermodynamics of biomolecular recognition is a primary objective of structure-based drug design. Alchemical free energy simulations offer a highly accurate and computationally efficient route to achieving this goal. While the AMBER molecular dynamics package has successfully been used for alchemical free energy simulations in academic research groups for decades, widespread impact in industrial drug discovery settings has been minimal because of the previous limitations within the AMBER alchemical code, coupled with challenges in system setup and postprocessing workflows. Through a close academia-industry collaboration we have addressed many of the previous limitations with an aim to improve accuracy, efficiency, and robustness of alchemical binding free energy simulations in industrial drug discovery applications. Here, we highlight some of the recent advances in AMBER20 with a focus on alchemical binding free energy (BFE) calculations, which are less computationally intensive than alternative binding free energy methods where full binding/unbinding paths are explored. In addition to scientific and technical advances in AMBER20, we also describe the essential practical aspects associated with running relative alchemical BFE calculations, along with recommendations for best practices, highlighting the importance not only of the alchemical simulation code but also the auxiliary functionalities and expertise required to obtain accurate and reliable results. This work is intended to provide a contemporary overview of the scientific, technical, and practical issues associated with running relative BFE simulations in AMBER20, with a focus on real-world drug discovery applications.
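The thermodynamic cycle behind a relative binding free energy can be sketched with thermodynamic integration: the same alchemical transformation A → B is run in the complex and in solvent, and ΔΔG_bind = ΔG_complex − ΔG_solvent. This sketch assumes ⟨∂U/∂λ⟩ has already been averaged at each λ window and integrates it with the trapezoidal rule; it does not use any Amber tool, and the function names and toy data are hypothetical.

```python
def trapezoid(xs, ys):
    """Trapezoidal integral of ys over the grid xs."""
    return sum(0.5 * (x1 - x0) * (y0 + y1)
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))

def relative_binding_free_energy(lambdas, dudl_complex, dudl_solvent):
    """ddG_bind(A -> B) = dG_complex(A -> B) - dG_solvent(A -> B), each by TI."""
    return trapezoid(lambdas, dudl_complex) - trapezoid(lambdas, dudl_solvent)

lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
dudl_complex = [2.0 + 4.0 * lam for lam in lambdas]   # toy <dU/dlambda> in the complex
dudl_solvent = [1.0 + 2.0 * lam for lam in lambdas]   # toy <dU/dlambda> in solvent
ddg = relative_binding_free_energy(lambdas, dudl_complex, dudl_solvent)
```

Closing the cycle this way avoids simulating the physical binding and unbinding path, which is why relative alchemical calculations are less computationally intensive than path-based binding free energy methods.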

The emergence of machine learning methods in quantum chemistry provides new methods to revisit an old problem: Can the predictive accuracy of electronic structure calculations be decoupled from their numerical bottlenecks? Previous attempts to answer this question have, among other methods, given rise to semi-empirical quantum chemistry in minimal basis representation. We present an adaptation of the recently proposed SchNet for Orbitals (SchNOrb) deep convolutional neural network model [K. T. Schütt et al., Nat. Commun. 10, 5024 (2019)] for electronic wave functions in an optimized quasi-atomic minimal basis representation. For five organic molecules ranging from 5 to 13 heavy atoms, the model accurately predicts molecular orbital energies and wave functions and provides access to derived properties for chemical bonding analysis. Particularly for larger molecules, the model outperforms the original atomic-orbital-based SchNOrb method in terms of accuracy and scaling. We conclude by discussing the future potential of this approach in quantum chemical workflows.

Intermolecular interactions are critical to many chemical phenomena, but their accurate computation using ab initio methods is often limited by computational cost. The recent emergence of machine learning (ML) potentials may be a promising alternative. Useful ML models should not only estimate accurate interaction energies but also predict smooth and asymptotically correct potential energy surfaces. However, existing ML models are not guaranteed to obey these constraints. Indeed, systemic deficiencies are apparent in the predictions of our previous hydrogen-bond model as well as the popular ANI-1X model, which we attribute to the use of an atomic energy partition. As a solution, we propose an alternative atomic-pairwise framework specifically for intermolecular ML potentials, and we introduce AP-Net—a neural network model for interaction energies. The AP-Net model is developed using this physically motivated atomic-pairwise paradigm and also exploits the interpretability of symmetry adapted perturbation theory (SAPT). We show that in contrast to other models, AP-Net produces smooth, physically meaningful intermolecular potentials exhibiting correct asymptotic behavior. Initially trained on only a limited number of mostly hydrogen-bonded dimers, AP-Net makes accurate predictions across the chemically diverse S66x8 dataset, demonstrating significant transferability. On a test set including experimental hydrogen-bonded dimers, AP-Net predicts total interaction energies with a mean absolute error of 0.37 kcal mol⁻¹, reducing errors by a factor of 2–5 across SAPT components from previous neural network potentials. The pairwise interaction energies of the model are physically interpretable, and an investigation of predicted electrostatic energies suggests that the model “learns” the physics of hydrogen-bonded interactions.

Combining multiple levels of theory in free energy simulations to balance computational accuracy and efficiency is a promising approach for studying processes in the condensed phase. While the basic idea has been proposed and explored for quite some time, it remains challenging to achieve convergence for such multi-level free energy simulations as it requires a favorable distribution overlap between different levels of theory. Previous efforts focused on improving the distribution overlap by either altering the low-level of theory for the specific system of interest or ignoring certain degrees of freedom. Here, we propose an alternative strategy that first identifies the degrees of freedom that lead to gaps in the distributions of different levels of theory and then treats them separately with either constraints or restraints or by introducing an intermediate model that better connects the low and high levels of theory. As a result, the conversion from the low level to the high level model is done in a staged fashion that ensures a favorable distribution overlap along the way. Free energy components associated with different steps are mostly evaluated explicitly, and thus, the final result can be meaningfully compared to the rigorous free energy difference between the two levels of theory with limited and well-defined approximations. The additional free energy component calculations involve simulations at the low level of theory and therefore do not incur high computational costs. The approach is illustrated with two simple but non-trivial solution examples, and factors that dictate the reliability of the result are discussed.
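The overlap problem can be made concrete with the simplest low-to-high-level estimator, one-step free energy perturbation (Zwanzig): ΔA = −kT ln⟨exp(−ΔU/kT)⟩_low, where ΔU = U_high − U_low is evaluated on low-level samples. The sketch below also reports a Kish effective-sample-size fraction as a rough overlap diagnostic. This illustrates the single-step estimator only, not the staged strategy proposed above; the function names and the kT value are assumptions.

```python
import math

def zwanzig(delta_u, kT):
    """Delta A = -kT ln < exp(-dU/kT) >_low from energy gaps sampled at the low level."""
    x = [-du / kT for du in delta_u]
    m = max(x)
    log_mean = m + math.log(sum(math.exp(v - m) for v in x) / len(x))
    return -kT * log_mean

def effective_sample_fraction(delta_u, kT):
    """Kish effective sample size of the reweighting, as a fraction of all samples."""
    x = [-du / kT for du in delta_u]
    m = max(x)
    w = [math.exp(v - m) for v in x]
    return sum(w) ** 2 / (sum(wi * wi for wi in w) * len(w))

kT = 0.596  # roughly kT near 300 K in kcal/mol (assumed units)
print(zwanzig([1.5] * 10, kT))
print(effective_sample_fraction([0.0, 10.0], kT))
```

When the energy gap fluctuates by many kT, the exponential average is dominated by a few samples and the effective sample fraction collapses, which is the poor distribution overlap that motivates treating the offending degrees of freedom separately.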