Article

Multireference Generalization of the Weighted Thermodynamic Perturbation Method


Abstract

We describe the generalized weighted thermodynamic perturbation (gwTP) method for estimating the free energy surface of an expensive "high-level" potential energy function from the umbrella sampling performed with multiple inexpensive "low-level" reference potentials. The gwTP method is a generalization of the weighted thermodynamic perturbation (wTP) method developed by Li and co-workers [J. Chem. Theory Comput. 2018, 14, 5583-5596] that uses a single "low-level" reference potential. The gwTP method offers new possibilities in model design whereby the sampling generated from several low-level potentials may be combined (e.g., specific reaction parameter models that might have variable accuracy at different stages of a multistep reaction). The gwTP method is especially well suited for use with machine learning potentials (MLPs) that are trained against computationally expensive ab initio quantum mechanical/molecular mechanical (QM/MM) energies and forces using active learning procedures that naturally produce multiple distinct neural network potentials. Simulations can be performed with greater sampling using the fast MLPs and then corrected to the ab initio level using gwTP. The capabilities of the gwTP method are demonstrated by creating reference potentials based on the MNDO/d and DFTB2/MIO semiempirical models supplemented with the "range-corrected deep potential" (DPRc). The DPRc parameters are trained to ab initio QM/MM data, and the potentials are used to calculate the free energy surface of stepwise mechanisms for nonenzymatic RNA 2'-O-transesterification model reactions. The extended sampling made possible by the reference potentials allows one to identify unequilibrated portions of the simulations that are not always evident from the short time scale commonly used with ab initio QM/MM potentials. 
We show that the reference potential approach can yield more accurate ab initio free energy predictions than the wTP method or what can be reasonably afforded from explicit ab initio QM/MM sampling.
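As a point of orientation, the single-reference building block that wTP and gwTP extend can be sketched with the textbook exponential-averaging (Zwanzig) estimator, ΔA = −kT ln⟨exp(−β(U_high − U_low))⟩_low. This is not the gwTP algorithm itself: the function name, the kcal/mol units, and the 298 K default are assumptions made for illustration only.

```python
import math

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def tp_free_energy(u_low, u_high, temperature=298.0):
    """Free energy difference A_high - A_low from low-level samples.

    u_low, u_high: energies (kcal/mol) of the SAME configurations, which
    were sampled with the low-level potential (hypothetical inputs).
    """
    beta = 1.0 / (KB_KCAL_PER_MOL_K * temperature)
    # Exponents of the Boltzmann factors of the energy gap.
    x = [-beta * (hi - lo) for lo, hi in zip(u_low, u_high)]
    # Log-sum-exp shift keeps the exponentials from overflowing.
    x_max = max(x)
    log_mean = x_max + math.log(sum(math.exp(v - x_max) for v in x) / len(x))
    return -log_mean / beta
```

If the two potentials differ only by a constant offset, the estimator returns that offset exactly, which is a convenient sanity check before applying it to real trajectories.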


... Second, when the perturbation from the reference to the target potential is large, converging the free energy correction might be slow and even less efficient than the highly expensive direct sampling with QM. Many solutions have been proposed to attenuate the problem (8)(9)(10)(36)(37)(38)(39)(40)(41)(42)(43)(44)(45)(46)(47)(48)(49). These methods either attempt to build a cheaper model of the target potential using machine learning (39)(40)(41)(43) or (re)parameterize the reference potential to reduce its distance from the target (9,48,49), possibly using multiple reference potentials (36)(37)(38). ...
... However, even with these improvements, when reference and target are too different, it can still be impossible to converge free energy estimates without sampling from the target and/or intermediate potentials using equilibrium (8,10,42,43) or nonequilibrium protocols (44,45). ...
... The latter adds to the parallelism already offered by modern molecular simulation engines, and it represents a fundamental advantage of the method given the diffusion of massively parallel computing architectures (25). Finally, multimap TFEP can be used in conjunction with methods such as force-matching (9, 49) and machine learning (reference and/or target) potentials (38)(39)(40)(41) to reduce the distance between the two distributions and obtain a cheaper model of the QM level of theory. These solutions could significantly accelerate convergence in case the NN map was unable during training to "discover" (and thus correct for) metastable states that are not predicted by the FF but are relevant to the QM. ...
Article
Accurate predictions of ligand binding affinities would greatly accelerate the first stages of drug discovery campaigns. However, using highly accurate interatomic potentials based on quantum mechanics (QM) in free energy methods has been so far largely unfeasible due to their prohibitive computational cost. Here, we present an efficient method to compute QM free energies from simulations using cheap reference potentials, such as force fields (FFs). This task has traditionally been out of reach due to the slow convergence of computing the correction from the FF to the QM potential. To overcome this bottleneck, we generalize targeted free energy methods to employ multiple maps—implemented with normalizing flow neural networks (NNs)—that maximize the overlap between the distributions. Critically, the method requires neither a separate expensive training phase for the NNs nor samples from the QM potential. We further propose a one-epoch learning policy to efficiently avoid overfitting, and we combine our approach with enhanced sampling strategies to overcome the pervasive problem of poor convergence due to slow degrees of freedom. On the drug-like molecules in the HiPen dataset, the method accelerates the calculation of the free energy difference of switching from an FF to a DFTB3 potential by three orders of magnitude compared to standard free energy perturbation and by a factor of eight compared to previously published nonequilibrium calculations. Our results suggest that our method, in combination with efficient QM/MM calculations, may be used in lead optimization campaigns in drug discovery and to study protein-ligand molecular recognition processes.
... 50 The analysis can be performed with the variational free energy profile (vFEP) method, 63,64 MBAR, 57 the weighted thermodynamic perturbation (wTP) method, 65 and the generalized weighted thermodynamic perturbation (gwTP) method. 66 The wTP and gwTP methods estimate the free energy surface of an expensive target level of theory from the sampling performed with inexpensive reference potentials. 66 The estimation of ab initio QM/MM free energy surfaces in condensed-phase environments has become more practical in the latest version of AmberTools with the combined introduction of the GPU-accelerated QUICK software 67 and the ndfes analysis program. ...
Article
AmberTools is a free and open-source collection of programs used to set up, run, and analyze molecular simulations. The newer features contained within AmberTools23 are briefly described in this Application note.
... The trained DPRc model with a 6 Å range correction was applied to simulate RNA 2′-O-transphosphorylation reactions in solution over long time scales 75 and to obtain better free energy estimates with the help of the generalized weighted thermodynamic perturbation (gwTP) method. 100 Very recently, Zeng et al. 72 have trained a Δ-MLP correction model called Quantum Deep Potential Interaction (QDπ) for drug-like molecules, including tautomeric forms and protonation states, which was found to be superior to other semiempirical methods and pure MLP models. 89 The third important application is large-scale reactive MD simulations over a nanosecond time scale, which enable the construction of interwoven reaction networks for complex reactive systems 101 instead of focusing on studying a single reaction. ...
... In all models, we set $r_s$ to 0.5 Å, $M_<$ to 16, and $L_a$ to 2, if applicable. We used (25, 50, 100) neurons for two-body embedding networks $N_{e,2}$, (2, 4, 8) neurons for three-body embedding networks $N_{e,3}$, and (240, 240, 240, 1) neurons for fitting networks $F_0$. In the full-information part (se_e2_a) of the hybrid descriptor with two-body embedding full-information and radius-information DeepPot-SE (se_e2_a+se_e2_r) and the two-body embedding part (se_e2_a) of the hybrid descriptor with two-body full-information and three-body DeepPot-SE (se_e2_a+se_e3), we set $r_c$ to 4 Å. ...
Article
DeePMD-kit is a powerful open-source software package that facilitates molecular dynamics simulations using machine learning potentials known as Deep Potential (DP) models. This package, which was released in 2017, has been widely used in the fields of physics, chemistry, biology, and material science for studying atomistic systems. The current version of DeePMD-kit offers numerous advanced features, such as DeepPot-SE, attention-based and hybrid descriptors, the ability to fit tensorial properties, type embedding, model deviation, DP-range correction, DP long range, graphics processing unit support for customized operators, model compression, non-von Neumann molecular dynamics, and improved usability, including documentation, compiled binary packages, graphical user interfaces, and application programming interfaces. This article presents an overview of the current major version of the DeePMD-kit package, highlighting its features and technical details. Additionally, this article presents a comprehensive procedure for conducting molecular dynamics as a representative application, benchmarks the accuracy and efficiency of different models, and discusses ongoing developments.
... An additional concern when applying a QM/MM+∆MLP model is that it may not reliably represent the target ab initio QM/MM model outside the scope of the training data ensemble. One possible mechanism to perform a consistency check is to use a weighted thermodynamic perturbation approach [43] or generalized multireference variant [44]. These methods potentially can correct the approximate QM/MM+∆MLP free energy surface to the target ab initio QM/MM level, provided that there is sufficient phase space overlap as indicated by an analysis of the reweighting entropies. ...
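The reweighting-entropy check mentioned above can be illustrated with a short sketch. It assumes one common definition, S = −(1/ln N) Σ w_i ln w_i with normalized weights w_i ∝ exp(−β(U_high − U_low)); whether the cited works use exactly this normalization is an assumption, and the function name, units, and temperature default are hypothetical.

```python
import math

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def reweighting_entropy(u_low, u_high, temperature=298.0):
    """Normalized entropy of the reweighting factors, between 0 and 1.

    Values near 1 mean all samples contribute about equally (good overlap);
    values near 0 mean a few samples dominate the reweighted average.
    Requires at least two samples, since it divides by ln(N).
    """
    beta = 1.0 / (KB_KCAL_PER_MOL_K * temperature)
    x = [-beta * (hi - lo) for lo, hi in zip(u_low, u_high)]
    x_max = max(x)  # shift exponents so the largest weight is exp(0)
    raw = [math.exp(v - x_max) for v in x]
    total = sum(raw)
    weights = [r / total for r in raw]
    # Skip zero weights: w * ln(w) tends to 0 as w tends to 0.
    entropy = -sum(w * math.log(w) for w in weights if w > 0.0)
    return entropy / math.log(len(weights))
```

A constant energy gap gives uniform weights and an entropy of 1, while a single configuration with a much smaller gap than the rest drives the value toward 0, signalling insufficient phase space overlap.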
Article
Rare tautomeric forms of nucleobases can lead to Watson–Crick-like (WC-like) mispairs in DNA, but the process of proton transfer is fast and difficult to detect experimentally. NMR studies show evidence for the existence of short-time WC-like guanine–thymine (G-T) mispairs; however, the mechanism of proton transfer and the degree to which nuclear quantum effects play a role are unclear. We use a B-DNA helix exhibiting a wGT mispair as a model system to study tautomerization reactions. We perform ab initio (PBE0/6-31G*) quantum mechanical/molecular mechanical (QM/MM) simulations to examine the free energy surface for tautomerization. We demonstrate that while the ab initio QM/MM simulations are accurate, considerable sampling is required to achieve high precision in the free energy barriers. To address this problem, we develop a QM/MM machine learning potential correction (QM/MM-ΔMLP) that is able to improve the computational efficiency, greatly extend the accessible time scales of the simulations, and enable practical application of path integral molecular dynamics to examine nuclear quantum effects. We find that the inclusion of nuclear quantum effects has only a modest effect on the mechanistic pathway but leads to a considerable lowering of the free energy barrier for the GT*⇌G*T equilibrium. Our results enable a rationalization of observed experimental data and the prediction of populations of rare tautomeric forms of nucleobases and rates of their interconversion in B-DNA.
... To validate the convergence of the free energy surface, it was necessary to sample each window for 100 ps using a 1 fs time step and repeat each simulation three times with different random number seeds. 87 ... The PIMD simulations for the calculation of ΔG_RS and ΔG_TS were run for 10 ps with a 0.25 fs time step and six ring polymer beads using the PIGLET thermostat. 49 This protocol corresponds to 0.5 × 10⁶ energy evaluations to estimate η_PIMD, which is less than 1% of the computational effort used to converge the classical free energy surface. ...
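The quoted cost can be checked with simple arithmetic. The sketch below assumes that the total counts two PIMD simulations (one for the reactant state and one for the transition state) and that each bead requires one energy evaluation per step; both are assumptions, and the helper name is hypothetical.

```python
def energy_evaluations(length_ps, timestep_fs, beads=1, repeats=1):
    """Count energy evaluations for a simulation protocol."""
    steps = round(length_ps * 1000.0 / timestep_fs)  # ps -> fs, then steps
    return steps * beads * repeats


# PIMD: 10 ps at 0.25 fs with six beads, assumed to run once each for the
# reactant state and the transition state.
pimd_total = energy_evaluations(10, 0.25, beads=6, repeats=2)

# Classical umbrella sampling: 100 ps at 1 fs, three seeds, per window.
classical_per_window = energy_evaluations(100, 1.0, repeats=3)

print(pimd_total)            # 480000
print(classical_per_window)  # 300000
```

Under these assumptions the PIMD total is 480,000 evaluations, consistent with the stated figure of roughly 0.5 × 10⁶, while each classical umbrella window alone costs 300,000 evaluations.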
Article
We use the modified Bigeleisen–Mayer equation to compute kinetic isotope effect values for non-enzymatic phosphoryl transfer reactions from classical and path integral molecular dynamics umbrella sampling. The modified form of the Bigeleisen–Mayer equation consists of a ratio of imaginary mode vibrational frequencies and a contribution arising from the isotopic substitution’s effect on the activation free energy, which can be computed from path integral simulation. In the present study, we describe a practical method for estimating the frequency ratio correction directly from umbrella sampling in a manner that does not require normal mode analysis of many geometry optimized structures. Instead, the method relates the frequency ratio to the change in the mass weighted coordinate representation of the minimum free energy path at the transition state induced by isotopic substitution. The method is applied to the calculation of 16/18O and 32/34S primary kinetic isotope effect values for six non-enzymatic phosphoryl transfer reactions. We demonstrate that the results are consistent with the analysis of geometry optimized transition state ensembles using the traditional Bigeleisen–Mayer equation. The method thus presents a new practical tool to enable facile calculation of kinetic isotope effect values for complex chemical reactions in the condensed phase.
Article
We report the development and testing of new integrated cyberinfrastructure for performing free energy simulations with generalized hybrid quantum mechanical/molecular mechanical (QM/MM) and machine learning potentials (MLPs) in Amber. The Sander molecular dynamics program has been extended to leverage fast, density-functional tight-binding models implemented in the DFTB+ and xTB packages, and an interface to the DeePMD-kit software enables the use of MLPs. The software is integrated through application program interfaces that circumvent the need to perform “system calls” and enable the incorporation of long-range Ewald electrostatics into the external software’s self-consistent field procedure. The infrastructure provides access to QM/MM models that may serve as the foundation for QM/MM–ΔMLP potentials, which supplement the semiempirical QM/MM model with a MLP correction trained to reproduce ab initio QM/MM energies and forces. Efficient optimization of minimum free energy pathways is enabled through a new surface-accelerated finite-temperature string method implemented in the FE-ToolKit package. Furthermore, we interfaced Sander with the i-PI software by implementing the socket communication protocol used in the i-PI client–server model. The new interface with i-PI allows for the treatment of nuclear quantum effects with semiempirical QM/MM–ΔMLP models. The modular interoperable software is demonstrated on proton transfer reactions in guanine-thymine mispairs in a B-form deoxyribonucleic acid helix. The current work represents a considerable advance in the development of modular software for performing free energy simulations of chemical reactions that are important in a wide range of applications.
Article
Understanding enzyme mechanisms is essential for unraveling the complex molecular machinery of life. In this review, we survey the field of computational enzymology, highlighting key principles governing enzyme mechanisms and discussing ongoing challenges and promising advances. Over the years, computer simulations have become indispensable in the study of enzyme mechanisms, with the integration of experimental and computational exploration now established as a holistic approach to gain deep insights into enzymatic catalysis. Numerous studies have demonstrated the power of computer simulations in characterizing reaction pathways, transition states, substrate selectivity, product distribution, and dynamic conformational changes for various enzymes. Nevertheless, significant challenges remain in investigating the mechanisms of complex multistep reactions, large-scale conformational changes, and allosteric regulation. Beyond mechanistic studies, computational enzyme modeling has emerged as an essential tool for computer-aided enzyme design and the rational discovery of covalent drugs for targeted therapies. Overall, enzyme design/engineering and covalent drug development can greatly benefit from our understanding of the detailed mechanisms of enzymes, such as protein dynamics, entropy contributions, and allostery, as revealed by computational studies. Such a convergence of different research approaches is expected to continue, creating synergies in enzyme research. This review, by outlining the ever-expanding field of enzyme research, aims to provide guidance for future research directions and facilitate new developments in this important and evolving field.
Article
In silico investigations of enzymatic reactions and chemical reactions in condensed phases often suffer from formidable computational costs due to the large number of degrees of freedom and the enormous volume of important phase space. Usually, accuracy must be traded for efficiency by lowering the reliability of the Hamiltonians employed or reducing the sampling time. Reference-potential methods (RPMs) offer an alternative approach to reaching high simulation accuracy without much loss of efficiency. In this Perspective, we summarize the idea of RPMs and showcase some recent applications. Most importantly, the pitfalls of these methods are also discussed, and remedies to these pitfalls are presented.
Article
We employ machine learning to derive tight-binding parametrizations for the electronic structure of defects. We test several machine learning methods that map the atomic and electronic structure of a defect onto a sparse tight-binding parameterization. Since multilayer perceptrons (i.e., feed-forward neural networks) perform best, we adopt them for our further investigations. We demonstrate the accuracy of our parameterizations for a range of important electronic structure properties such as band structure, local density of states, transport and level spacing simulations for two common defects in single-layer graphene. Our machine learning approach achieves results comparable to maximally localized Wannier functions (i.e., DFT accuracy) without prior knowledge about the electronic structure of the defects while also allowing for a reduced interaction range which substantially reduces calculation time. It is general and can be applied to a wide range of other materials, enabling accurate large-scale simulations of material properties in the presence of different defects.
Article
Modeling complex energy materials such as solid-state electrolytes (SSEs) realistically at the atomistic level strains the capabilities of state-of-the-art theoretical approaches. On one hand, the system sizes and simulation time scales required are prohibitive for first-principles methods such as density functional theory. On the other hand, parameterizations for empirical potentials are often not available, and these potentials may ultimately lack the desired predictive accuracy. Fortunately, modern machine learning (ML) potentials are increasingly able to bridge this gap, promising first-principles accuracy at a much reduced computational cost. However, the local nature of these ML potentials typically means that long-range contributions arising, for example, from electrostatic interactions are neglected, even though such interactions can be large in polar materials such as electrolytes. Herein, we investigate the effect that the locality assumption of ML potentials has on lithium mobility and defect formation energies in the SSE Li₇P₃S₁₁. We find that neglecting long-range electrostatics is unproblematic for the description of lithium transport in the isotropic bulk. In contrast, (field-dependent) defect formation energies are only adequately captured by a hybrid potential combining ML and a physical model of electrostatic interactions. Broader implications for ML-based modeling of energy materials are discussed.
Article
Quantum mechanics/molecular mechanics (QM/MM) molecular dynamics (MD) simulations have been developed to simulate molecular systems, where an explicit description of changes in the electronic structure is necessary. However, QM/MM MD simulations are computationally expensive compared to fully classical simulations as all valence electrons are treated explicitly and a self-consistent field (SCF) procedure is required. Recently, approaches have been proposed to replace the QM description with machine-learned (ML) models. However, condensed-phase systems pose a challenge for these approaches due to long-range interactions. Here, we establish a workflow that incorporates the MM environment as an element type in a high-dimensional neural network potential (HDNNP). The fitted HDNNP describes the potential-energy surface of the QM particles with an electrostatic embedding scheme. Thus, the MM particles feel a force from the polarized QM particles. To achieve chemical accuracy, we find that even simple systems require models with a strong gradient regularization, a large number of data points, and a substantial number of parameters. To address this issue, we extend our approach to a Δ-learning scheme, where the ML model learns the difference between a reference method (density functional theory (DFT)) and a cheaper semiempirical method (density functional tight binding (DFTB)). We show that such a scheme reaches the accuracy of the DFT reference method while requiring significantly fewer parameters. Furthermore, the Δ-learning scheme is capable of correctly incorporating long-range interactions within a cutoff of 1.4 nm. It is validated by performing MD simulations of retinoic acid in water and the interaction between S-adenosylmethioninate and cytosine in water. The presented results indicate that Δ-learning is a promising approach for (QM)ML/MM MD simulations of condensed-phase systems.
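The Δ-learning idea described above, fitting only the difference between an expensive and a cheap method and adding it back at prediction time, can be shown with a deliberately simple toy. A straight-line least-squares fit on a single hypothetical descriptor stands in for the neural network used in the work; every name and number here is an assumption for illustration.

```python
def fit_delta_model(descriptor, e_low, e_high):
    """Fit a line to the residual e_high - e_low and return a predictor.

    descriptor: one scalar feature per configuration (hypothetical).
    e_low, e_high: cheap and expensive energies of the same configurations.
    """
    delta = [hi - lo for lo, hi in zip(e_low, e_high)]
    n = len(descriptor)
    mean_x = sum(descriptor) / n
    mean_d = sum(delta) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptor)
    sxd = sum((x - mean_x) * (d - mean_d) for x, d in zip(descriptor, delta))
    slope = sxd / sxx
    intercept = mean_d - slope * mean_x

    def predict(x_new, e_low_new):
        # Corrected energy = cheap energy + learned correction.
        return e_low_new + slope * x_new + intercept

    return predict


xs = [0.0, 1.0, 2.0, 3.0]
cheap = [x ** 2 for x in xs]                              # toy low-level energies
expensive = [c + 0.5 * x + 1.0 for c, x in zip(cheap, xs)]  # toy high-level energies
predict = fit_delta_model(xs, cheap, expensive)
```

The correction is smoother than the full energy surface (linear here, while the cheap energies are quadratic), which is the usual argument for why a Δ-model can need fewer parameters than a model of the total energy.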
Article
The tight-binding (TB) method is an ideal candidate for determining electronic and transport properties for a large-scale system. It describes the system as real-space Hamiltonian matrices expressed in terms of a manageable number of parameters, leading to substantially lower computational costs than ab initio methods. Since the whole system is defined by the parameterization scheme, the choice of the TB parameters decides the reliability of the TB calculations. The typical empirical TB method takes the TB parameters directly from existing parameter sets, which rarely reproduce the desired electronic structures quantitatively without specific optimization. It is thus not suitable for quantitative studies such as transport property calculations. The ab initio TB method derives the TB parameters from ab initio results through a transformation of basis functions, which achieves much higher numerical accuracy. However, it assumes prior knowledge of the basis and may introduce truncation error. Here, a machine learning method for TB Hamiltonian parameterization is proposed, within which a neural network (NN) is introduced with its neurons acting as the TB matrix elements. This method can construct an empirical TB model that reproduces the given ab initio energy bands with predefined accuracy, which provides a fast and convenient way for TB model construction and gives insights into machine learning applications in physical problems.
Article
Machine learning potentials have become an important tool for atomistic simulations in many fields, from chemistry via molecular biology to materials science. Most of the established methods, however, rely on local properties and are thus unable to take global changes in the electronic structure into account, which result from long-range charge transfer or different charge states. In this work we overcome this limitation by introducing a fourth-generation high-dimensional neural network potential that combines a charge equilibration scheme employing environment-dependent atomic electronegativities with accurate atomic energies. The method, which is able to correctly describe global charge distributions in arbitrary systems, yields much improved energies and substantially extends the applicability of modern machine learning potentials. This is demonstrated for a series of systems representing typical scenarios in chemistry and materials science that are incorrectly described by current methods, while the fourth-generation neural network potential is in excellent agreement with electronic structure calculations.
Article
Correlated quantum-chemical methods for condensed matter systems, such as the random phase approximation (RPA), hold the promise of reaching a level of accuracy much higher than that of conventional density functional theory approaches. However, the high computational cost of such methods hinders their broad applicability, in particular for finite-temperature molecular dynamics simulations. We propose a method that couples machine learning techniques with thermodynamic perturbation theory to estimate finite-temperature properties using correlated approximations. We apply this approach to compute the enthalpies of adsorption in zeolites and show that reliable estimates can be obtained by training a machine learning model with as few as 10 RPA energies. This approach paves the way to the broader use of computationally expensive quantum-chemical methods to predict the finite-temperature properties of condensed matter systems.
Article
Computational modeling of chemical and biological systems at atomic resolution is a crucial tool in the chemist's toolset. The use of computer simulations requires a balance between cost and accuracy: quantum-mechanical methods provide high accuracy but are computationally expensive and scale poorly to large systems, while classical force fields are cheap and scalable, but lack transferability to new systems. Machine learning can be used to achieve the best of both approaches. Here we train a general-purpose neural network potential (ANI-1ccx) that approaches CCSD(T)/CBS accuracy on benchmarks for reaction thermochemistry, isomerization, and drug-like molecular torsions. This is achieved by training a network to DFT data then using transfer learning techniques to retrain on a dataset of gold standard QM calculations (CCSD(T)/CBS) that optimally spans chemical space. The resulting potential is broadly applicable to materials science, biology, and chemistry, and billions of times faster than CCSD(T)/CBS calculations.
Article
Recent developments in many-body potential energy representation via deep learning have brought new hopes to addressing the accuracy-versus-efficiency dilemma in molecular simulations. Here we describe DeePMD-kit, a package written in Python/C++ that has been designed to minimize the effort required to build deep-learning-based representations of potential energies and force fields and to perform molecular dynamics. Potential applications of DeePMD-kit span from finite molecules to extended systems and from metallic systems to chemically bonded systems. DeePMD-kit is interfaced with TensorFlow, one of the most popular deep learning frameworks, making the training process highly automatic and efficient. On the other end, DeePMD-kit is interfaced with high-performance classical molecular dynamics and quantum (path-integral) molecular dynamics packages, i.e., LAMMPS and i-PI, respectively. Thus, upon training, the potential energy and force field models can be used to perform efficient molecular simulations for different purposes. As an example of the many potential applications of the package, we use DeePMD-kit to learn the interatomic potential energy and forces of a water model using data obtained from density functional theory. We demonstrate that the resulting molecular dynamics model accurately reproduces the structural information contained in the original model.
Article
Deep learning is revolutionizing many areas of science and technology, especially image, text and speech recognition. In this paper, we demonstrate how a deep neural network (NN) trained on quantum mechanical (QM) DFT calculations can learn an accurate and fully transferable potential for organic molecules. We introduce ANAKIN-ME (Accurate NeurAl networK engINe for Molecular Energies) or ANI in short. ANI is a new method and procedure for training neural network potentials that utilizes a highly modified version of the Behler and Parrinello symmetry functions to build single-atom atomic environment vectors as a molecular representation. We utilize ANI to build a potential called ANI-1, which was trained on a subset of the GDB databases with up to 8 heavy atoms to predict total energies for organic molecules containing four atom types: H, C, N, and O. To obtain an accelerated but physically relevant sampling of molecular potential surfaces, we also propose a Normal Mode Sampling (NMS) method for generating molecular configurations. Through a series of case studies, we show that ANI-1 is chemically accurate compared to reference DFT calculations on much larger molecular systems (up to 54 atoms) than those included in the training data set, with root mean square errors as low as 0.56 kcal/mol.
Article
Machine-learning-based interatomic potential energy surface (PES) models are revolutionizing the field of molecular modeling. However, although much faster than electronic structure schemes, these models rely on costly deep neural network evaluations to predict the energy and atomic forces, which makes them slower than typical empirical force fields. Herein, we report a model compression scheme for boosting the performance of the Deep Potential (DP) model, a deep-learning-based PES model. This scheme, which we call DP Compress, is an efficient postprocessing step after the training of DP models (DP Train). DP Compress combines several DP-specific compression techniques, which typically speed up DP-based molecular dynamics simulations by an order of magnitude and reduce memory consumption by an order of magnitude. We demonstrate that DP Compress is sufficiently accurate by testing a variety of physical properties of Cu, H2O, and Al-Cu-Mg systems. DP Compress applies to both CPU and GPU machines and is publicly available online.
Article
Accurate thermochemistry is essential in many chemical disciplines, such as astro-, atmospheric, or combustion chemistry. These areas often involve fleetingly existent intermediates whose thermochemistry is difficult to assess. Whenever direct calorimetric experiments are infeasible, accurate computational estimates of relative molecular energies are required. However, high-level computations, often using coupled cluster theory, are generally resource-intensive. To expedite the process using machine learning techniques, we generated a database of energies for small organic molecules at the CCSD(T)/cc-pVDZ, CCSD(T)/aug-cc-pVDZ, and CCSD(T)/cc-pVTZ levels of theory. Leveraging the power of deep learning by employing graph neural networks, we are able to predict the effect of perturbatively included triples (T), that is, the difference between CCSD and CCSD(T) energies, with a mean absolute error of 0.25, 0.25, and 0.28 kcal mol⁻¹ (R² of 0.998, 0.997, and 0.998) with the cc-pVDZ, aug-cc-pVDZ, and cc-pVTZ basis sets, respectively. Our models were further validated by application to three validation sets taken from the S22 Database as well as to a selection of known theoretically challenging cases.
Article
A machine-learning based approach for evaluating potential energies for quantum mechanical studies of properties of the ground and excited vibrational states of small molecules is developed. This approach uses the molecular-orbital-based machine learning (MOB-ML) method to generate electronic energies with the accuracy of CCSD(T) calculations at the same cost as a Hartree-Fock calculation. To further reduce the computational cost of the potential energy evaluations without sacrificing the CCSD(T) level accuracy, GPU-accelerated Neural Network Potential Energy Surfaces (NN-PES) are trained to geometries and energies that are collected from small-scale Diffusion Monte Carlo (DMC) simulations, which are run using energies evaluated using the MOB-ML model. The combined NN+(MOB-ML) approach is used in variational calculations of the ground and low-lying vibrational excited states of water and in DMC calculations of the ground states of water, CH5+, and its deuterated analogues. For both of these molecules, comparisons are made to the results obtained using potentials that were fit to much larger sets of electronic energies than were required to train the MOB-ML models. The NN+(MOB-ML) approach is also used to obtain a potential surface for C2H5+, which is a carbocation with a nonclassical equilibrium structure for which there is currently no available potential surface. This potential is used to explore the CH stretching vibrations, focusing on those of the bridging hydrogen atom. For both CH5+ and C2H5+ the MOB-ML model is trained using geometries that were sampled from an AIMD trajectory, which was run at 350 K. By comparison, the structures sampled in the ground state calculations can have energies that are as much as ten times larger than those used to train the MOB-ML model. For water a higher temperature AIMD trajectory is needed to obtain accurate results due to the smaller thermal energy. 
A second MOB-ML model for C2H5+ was developed with additional higher energy structures in the training set. The two models are found to provide nearly identical descriptions of the ground state of C2H5+.
Article
We present a fast, accurate, and robust approach for determination of free energy profiles and kinetic isotope effects for RNA 2'-O-transphosphorylation reactions with inclusion of nuclear quantum effects. We apply a deep potential range correction (DPRc) for combined quantum mechanical/molecular mechanical (QM/MM) simulations of reactions in the condensed phase. The method uses the second-order density-functional tight-binding method (DFTB2) as a fast, approximate base QM model. The DPRc model modifies the DFTB2 QM interactions and applies short-range corrections to the QM/MM interactions to reproduce ab initio DFT (PBE0/6-31G*) QM/MM energies and forces. The DPRc thus enables both QM and QM/MM interactions to be tuned to high accuracy, and the QM/MM corrections are designed to smoothly vanish at a specified cutoff boundary (6 Å in the present work). The computational speed-up afforded by the QM/MM+DPRc model enables free energy profiles to be calculated that include rigorous long-range QM/MM interactions under periodic boundary conditions and nuclear quantum effects through a path integral approach using a new interface between the AMBER and i-PI software. The approach is demonstrated through the calculation of free energy profiles of a native RNA cleavage model reaction and reactions involving thio-substitutions, which are important experimental probes of the mechanism. The DFTB2+DPRc QM/MM free energy surfaces agree very closely with the PBE0/6-31G* QM/MM results and are vastly superior to the DFTB2 QM/MM surfaces with and without weighted thermodynamic perturbation corrections. 18O and 34S primary kinetic isotope effects are compared, and the influence of nuclear quantum effects on the free energy profiles is examined.
Article
Molecular dynamics (MD) simulations employing ab initio quantum mechanical and molecular mechanical (ai-QM/MM) potentials are considered to be the state of the art, but the high computational cost associated with the ai-QM calculations remains a theoretical challenge for their routine application. Here, we present a modified protocol of the multiple time step (MTS) method for accelerating ai-QM/MM MD simulations of condensed-phase reactions. Within a previous MTS protocol [Nam J. Chem. Theory Comput. 2014, 10, 4175], reference forces are evaluated using a low-level (semiempirical QM/MM) Hamiltonian and employed at inner time steps to propagate the nuclear motions. Correction forces, which arise from the force differences between high-level (ai-QM/MM) and low-level Hamiltonians, are applied at outer time steps, where the MTS algorithm allows the time-reversible integration of the correction forces. To increase the outer step size, which is bound by the highest-frequency component in the correction forces, the semiempirical QM Hamiltonian is recalibrated in this work to minimize the magnitude of the correction forces. The remaining high-frequency modes, which are mainly bond stretches involving hydrogen atoms, are then removed from the correction forces. When combined with a Langevin or SIN(R) thermostat, the modified MTS-QM/MM scheme remains robust with an up to 8 (with Langevin) or 10 fs (with SIN(R)) outer time step (with 1 fs inner time steps) for the chorismate mutase system. This leads to an over 5-fold speedup over standard ai-QM/MM simulations, without sacrificing the accuracy in the predicted free energy profile of the reaction.
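The multiple time step idea in the abstract above can be illustrated with a minimal one-dimensional sketch. It assumes a generic reversible (r-RESPA style) splitting: the cheap low-level force is integrated with velocity Verlet at the inner step, and the correction force (high level minus low level) is applied as half kicks at the outer step. The function name `mts_step` and its arguments are hypothetical and are not taken from the cited work, which also uses thermostats and force filtering that are omitted here.

```python
def mts_step(x, v, mass, f_low, f_corr, dt_inner, n_inner):
    """Advance one outer step of a reversible multiple time step integrator.

    f_low  : callable returning the cheap reference force at position x
    f_corr : callable returning the correction force (high minus low level)
    """
    dt_outer = dt_inner * n_inner
    # Half kick from the correction force at the start of the outer step.
    v += 0.5 * dt_outer * f_corr(x) / mass
    # Inner velocity Verlet steps driven only by the low-level force.
    for _ in range(n_inner):
        v += 0.5 * dt_inner * f_low(x) / mass
        x += dt_inner * v
        v += 0.5 * dt_inner * f_low(x) / mass
    # Closing half kick from the correction force.
    v += 0.5 * dt_outer * f_corr(x) / mass
    return x, v
```

Because the update is symmetric, reversing the velocity and taking another step returns the system to its starting point up to rounding error, which is the time-reversibility property the abstract refers to.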
Article
Δ-machine learning, or the hierarchical construction scheme, is a highly cost-effective method, as only a small number of high-level ab initio energies are required to improve a potential energy surface (PES) fit to a large number of low-level points. However, there is no efficient and systematic way to select as few points as possible from the low-level data set. We here propose a permutation-invariant-polynomial neural-network (PIP-NN)-based Δ-machine learning approach to construct full-dimensional accurate PESs of complicated reactions efficiently. Particularly, the high flexibility of the NN is exploited to efficiently sample points from the low-level data set. This approach is applied to the challenging case of a HO2 self-reaction with a large configuration space. Only 14% of the DFT data set is used to successfully bring a newly fitted DFT PES to the UCCSD(T)-F12a/AVTZ quality. Then, the quasiclassical trajectory (QCT) calculations are performed to study its dynamics, particularly the mode specificity.
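As a toy illustration of the Δ-machine learning idea described above, the sketch below fits a correction to the difference between a few high-level energies and a cheap low-level model, then adds that correction back to the low-level prediction. A one-dimensional least-squares line stands in for the neural network, and all names (`fit_line`, `make_delta_model`) are hypothetical; this is not the PIP-NN procedure of the cited work.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def make_delta_model(xs, e_low_fn, e_high_values):
    """Return a model E_high(x) ~ E_low(x) + fitted correction(x).

    xs            : geometries (here a single scalar coordinate each)
    e_low_fn      : callable giving the cheap low-level energy
    e_high_values : expensive high-level energies at the same geometries
    """
    # Train only on the difference between the two levels of theory.
    deltas = [e_high - e_low_fn(x) for x, e_high in zip(xs, e_high_values)]
    slope, intercept = fit_line(xs, deltas)
    return lambda x: e_low_fn(x) + slope * x + intercept
```

The correction is usually smoother than the total energy, which is why a small number of high-level points can be sufficient.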
Article
A great need exists for computationally efficient quantum simulation approaches that can achieve an accuracy similar to high-level theories at a fraction of the computational cost. In this regard, we have leveraged a machine-learned interaction potential based on Chebyshev polynomials to improve density functional tight binding (DFTB) models for organic materials. The benefit of our approach is two-fold: (1) many-body interactions can be corrected for in a systematic and rapidly tunable process, and (2) high-level quantum accuracy for a broad range of compounds can be achieved with ∼0.3% of data required for one advanced deep learning potential. Our model exhibits both transferability and extensibility through comparison to quantum chemical results for organic clusters, solid carbon phases, and molecular crystal phase stability rankings. Our efforts thus allow for high-throughput physical and chemical predictions with up to coupled-cluster accuracy for systems that are computationally intractable with standard approaches.
Article
Ion selectivity in protein binding sites is of great significance to biological functions. Although additive force fields have been successfully applied to various protein-related studies, it is difficult for them to capture the subtle metal-protein interactions needed to predict ion selectivity, because of the pronounced polarization and charge transfer between the metals and the surrounding residues. Quantum mechanics-based methods are well-suited for dealing with these systems, but they are too costly to apply in a direct manner. In this work, the reference-potential method (RPM) was used to measure the selectivity for calcium and magnesium cations in the binding pocket of parvalbumin B protein by calculating the free energy change associated with this substitution reaction at an ab initio quantum mechanics/molecular mechanics (QM/MM) level. The alchemical transformations were performed at the molecular mechanics level, and the relative binding free energy was then corrected to the QM/MM level via thermodynamic perturbation. In this way, the free energy change at the QM/MM level for the substitution reaction was obtained without running the QM/MM simulations, thus remarkably enhancing the efficiency. In the reweighting process, we found that the selection of the QM region greatly affects the accuracy of the QM/MM method. In particular, the charge transfer effect on the free energy change of a reaction cannot be neglected.
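The thermodynamic perturbation correction mentioned above is commonly written with Zwanzig's relation, ΔA = -kT ln⟨exp(-βΔU)⟩, where ΔU is the high-level minus low-level energy evaluated on configurations sampled at the low level. The following is a minimal sketch of that estimator, assuming energies in kcal/mol; the function name and default temperature are illustrative choices, not details from the cited work.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def zwanzig_correction(delta_u, temperature=300.0):
    """Low-to-high free energy correction from energy gaps (kcal/mol).

    delta_u holds U_high - U_low for configurations sampled with the
    low-level potential. A log-sum-exp shift avoids overflow when some
    gaps are large and negative.
    """
    beta = 1.0 / (KB_KCAL * temperature)
    exponents = [-beta * du for du in delta_u]
    shift = max(exponents)
    mean_exp = sum(math.exp(e - shift) for e in exponents) / len(exponents)
    return -(shift + math.log(mean_exp)) / beta
```

The exponential average is dominated by the few samples with the lowest ΔU, so the estimate converges slowly when the two potentials overlap poorly.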
Article
We investigate the feasibility of improving the semi-empirical density functional based tight-binding method through a general and transferable many-body repulsive potential for pure silicon using a common machine-learning framework. Atomic environments using atom centered symmetry functions fed into flexible neural-networks allow us to overcome the limited pair potentials used until now with the ability to train simultaneously on a large variety of systems. We achieve an improvement on bulk systems with good performance on energetic, vibrational, and structural properties. Contrarily, there are difficulties for clusters due to surface effects. To deepen the discussion, we also put these results into perspective with two fully machine-learned numerical potentials for silicon from the literature. This allows us to identify both the transferability of such approaches together with the impact of narrowing the role of machine-learning models to reproduce only a part of the total energy.
Article
Semiempirical methods like density functional tight-binding (DFTB) allow extensive phase space sampling, making it possible to generate free energy surfaces of complex reactions in condensed-phase environments. Such a high efficiency often comes at the cost of reduced accuracy, which may be improved by developing a specific reaction parametrization (SRP) for the particular molecular system. Thiol-disulfide exchange is a nucleophilic substitution reaction that occurs in a large class of proteins. Its proper description requires a high-level ab initio method, while DFT-GGA and hybrid functionals were shown to be inadequate, and so is DFTB due to its DFT-GGA descent. We develop an SRP for thiol-disulfide exchange based on an artificial neural network (ANN) implementation in the DFTB+ software and compare its performance to that of a standard SRP approach applied to DFTB. As an application, we use both new DFTB-SRPs as components of a QM/MM scheme to investigate thiol-disulfide exchange in two molecular complexes: a solvated model system and a blood protein. Demonstrating the strengths of the methodology, highly accurate free energy surfaces are generated at a low cost, as the augmentation of DFTB with an ANN only adds a small computational overhead.
Article
Atomic vibrations can inform about materials properties from hole transport in organic semiconductors to correlated disorder in metal-organic frameworks. Currently, there are several methods for predicting these vibrations using simulations, but the accuracy-efficiency tradeoffs have not been examined in depth. In this study, rubrene is used as a model system to predict atomic vibrational properties using six different simulation methods: density functional theory, density functional tight binding, density functional tight binding with a Chebyshev polynomial-based correction, a trained machine learning model, a pretrained machine learning model called ANI-1, and a classical forcefield model. The accuracy of each method is evaluated by comparison to the experimental inelastic neutron scattering spectrum. All methods discussed here show some accuracy across a wide energy region, though the Chebyshev-corrected tight-binding method showed the optimal combination of high accuracy with low expense. We then offer broad simulation guidelines to yield efficient, accurate results for inelastic neutron scattering spectrum prediction.
Article
We present an approach that extends the theory of targeted free energy perturbation (TFEP) to calculate free energy differences and free energy surfaces at an accurate quantum mechanical level of theory from a cheaper reference potential. The convergence is accelerated by a mapping function that increases the overlap between the target and the reference distributions. Building on recent work, we show that this map can be learned with a normalizing flow neural network, without requiring simulations with the expensive target potential but only a small number of single-point calculations, and, crucially, avoiding the systematic error that was found previously. We validate the method by numerically evaluating the free energy difference in a system with a double-well potential and by describing the free energy landscape of a simple chemical reaction in the gas phase.
Article
We redevelop the variational free energy profile (vFEP) method using a cardinal B-spline basis to extend the method for analyzing free energy surfaces (FESs) involving three or more reaction coordinates. We also implemented software for evaluating high-dimensional profiles based on the multistate Bennett acceptance ratio (MBAR) method which constructs an unbiased probability density from global reweighting of the observed samples. The MBAR method takes advantage of a fast algorithm for solving the unbinned weighted histogram (UWHAM)/MBAR equations which replaces the solution of simultaneous equations with a nonlinear optimization of a convex function. We make use of cardinal B-splines and multiquadric radial basis functions to obtain smooth, differentiable MBAR profiles in arbitrary high dimensions. The cardinal B-spline vFEP and MBAR methods are compared using three example systems that examine 1D, 2D, and 3D profiles. Both methods are found to be useful and produce nearly indistinguishable results. The vFEP method is found to be 150 times faster than MBAR when applied to periodic 2D profiles, but the MBAR method is 4.5 times faster than vFEP when evaluating unbounded 3D profiles. In agreement with previous comparisons, we find the vFEP method produces superior FESs when the overlap between umbrella window simulations decreases. Finally, the associative reaction mechanism of hammerhead ribozyme is characterized using 3D, 4D, and 6D profiles, and the higher-dimensional profiles are found to have smaller reaction barriers by as much as 1.5 kcal/mol. The methods presented here have been implemented into the FE-ToolKit software package along with new methods for network-wide free energy analysis in drug discovery.
Article
Although quantum mechanical/molecular mechanics (QM/MM) methods are now routinely applied to the studies of chemical reactions in condensed phases and enzymatic reactions, they may experience technical difficulties when the reactive region is varying over time. For instance, when the solvent molecules are directly participating in the reaction, the exchange of water molecules between the QM and MM regions may occur on a time scale comparable to the reaction time. To cope with this situation, several adaptive QM/MM schemes have been proposed. However, these methods either add significantly to the computational cost or introduce artificial restraints to the system. In this work, we developed a novel adaptive QM/MM scheme and applied it to the study of a nucleophilic addition reaction. In this scheme, the configuration sampling was performed with a small QM region (without solvent molecules), and the thermodynamic properties under another potential energy function with a larger QM region (with a certain number of solvent molecules and/or different levels of QM theory) are computed via extrapolation using the reference-potential method. Our simulation results show that this adaptive QM/MM scheme is numerically stable, at least for the case studied in this work. Furthermore, this method also offers an inexpensive way to examine the convergence of the QM/MM calculation with respect to the size of the QM region.
Article
First-principles prediction of nuclear magnetic resonance chemical shifts plays an increasingly important role in the interpretation of experimental spectra, but the required density functional theory (DFT) calculations can be computationally expensive. Promising machine learning models for predicting chemical shieldings in general organic molecules have been developed previously, though the accuracy of those models remains below that of DFT. The present study demonstrates how much higher accuracy chemical shieldings can be obtained via the Δ-machine learning approach, with the result that the errors introduced by the machine learning model are only one-half to one-third the errors expected for DFT chemical shifts relative to experiment. Specifically, an ensemble of neural networks is trained to correct PBE0/6-31G chemical shieldings up to the target level of PBE0/6-311+G(2d,p). It can predict 1H, 13C, 15N, and 17O chemical shieldings with root-mean-square errors of 0.11, 0.70, 1.69, and 2.47 ppm, respectively. At the same time, the Δ-machine learning approach is 1-2 orders of magnitude faster than the target large-basis calculations. It is also demonstrated that the machine learning model predicts experimental solution-phase NMR chemical shifts in drug molecules with only modestly worse accuracy than the target DFT model. Finally, the ability to estimate the uncertainty in the predicted shieldings based on variations within the ensemble of neural network models is also assessed.
Article
Calculations of the free energy profile, also known as the potential of mean force (PMF), along a chosen collective variable (CV) are now routinely applied in the studies of chemical processes, such as enzymatic reactions and chemical reactions in condensed phases. However, if the ab initio QM/MM level of accuracy is required for the PMF, it can be formidably demanding even with the most advanced enhanced sampling methods, such as umbrella sampling. To ameliorate this difficulty, we recently developed a novel method for the computation of the free energy profile based on the reference-potential method, in which a low-level reference Hamiltonian is employed for phase space sampling and the free energy profile can be corrected to the level of interest (the target Hamiltonian) by energy reweighting in a nonparametric way. However, when the reference Hamiltonian is very different from the target Hamiltonian, the calculated ensemble averages, including the PMF, often suffer from numerical instability, which mainly comes from the overestimation of the density-of-states (DoS) in the low-energy region. Stochastic samplings of these low-energy configurations are rare events, and some low-energy conformations may get oversampled in simulations of a finite length. In this work, an assumption of Gaussian distribution is applied to the DoS in each CV bin, and the weight of each configuration is rescaled according to the accumulated DoS. The results show that this smoothing process can remarkably reduce the ruggedness of the PMF and increase the reliability of the reference-potential method.
Article
We combine density-functional tight-binding (DFTB) with deep tensor neural networks (DTNN) to maximize the strengths of both approaches in predicting structural, energetic, and vibrational molecular properties. The DTNN is used to construct a non-linear model for the localized many-body interatomic repulsive energy, which so far has been treated in an atom-pairwise manner in DFTB. Substantially improving upon standard DFTB and DTNN, the resulting DFTB-NNrep model yields accurate predictions of atomization and isomerization energies, equilibrium geometries, vibrational frequencies and dihedral rotation profiles for a large variety of organic molecules compared to the hybrid DFT-PBE0 functional. Our results highlight the high potential of combining semi-empirical electronic-structure methods with physically-motivated machine learning approaches for predicting localized many-body interactions. We conclude by discussing future advancements of the DFTB-NNrep approach that could enable chemically accurate electronic-structure calculations for systems with tens of thousands of atoms.
Article
Combining multiple levels of theory in free energy simulations to balance computational accuracy and efficiency is a promising approach for studying processes in the condensed phase. While the basic idea has been proposed and explored for quite some time, it remains challenging to achieve convergence for such multi-level free energy simulations as it requires a favorable distribution overlap between different levels of theory. Previous efforts focused on improving the distribution overlap by either altering the low-level of theory for the specific system of interest or ignoring certain degrees of freedom. Here, we propose an alternative strategy that first identifies the degrees of freedom that lead to gaps in the distributions of different levels of theory and then treats them separately with either constraints or restraints or by introducing an intermediate model that better connects the low and high levels of theory. As a result, the conversion from the low level to the high level model is done in a staged fashion that ensures a favorable distribution overlap along the way. Free energy components associated with different steps are mostly evaluated explicitly, and thus, the final result can be meaningfully compared to the rigorous free energy difference between the two levels of theory with limited and well-defined approximations. The additional free energy component calculations involve simulations at the low level of theory and therefore do not incur high computational costs. The approach is illustrated with two simple but non-trivial solution examples, and factors that dictate the reliability of the result are discussed.
Article
While free energies are fundamental thermodynamic quantities to characterize chemical reactions, their calculation based on ab initio theory is usually limited by the high computational cost. This is particularly true if multiple levels of theory have to be tested to establish their relative accuracy, if highly expensive quantum mechanical approximations are of interest, and also if several different temperatures have to be considered. We present an ab initio approach that effectively couples perturbation theory and machine learning to make ab initio free energy calculations more affordable. Starting from results based on a certain production ab initio theory, perturbation theory is applied to obtain free energies. The large number of single point calculations required by a brute force application of this approach are here significantly decreased by applying machine learning techniques. Importantly, the training of the machine learning model requires only a small amount of data and does not need to be performed again when the temperature is decreased. The accuracy and efficiency of this method is demonstrated by computing the free energy of activation of the proton exchange reaction in the zeolite chabazite. Starting from an ab initio calculation based on a semilocal approximation of density functional theory, free energies based on significantly more expensive non-local van der Waals and hybrid functionals are obtained with only a few tens of additional single point calculations. In this way this work paves the route to quick free energy calculations using different levels of theory or approximations that would be too computationally expensive to be directly employed in molecular dynamics or Monte Carlo simulations.
Article
We present hierarchical machine learning (hML) of highly accurate potential energy surfaces (PESs). Our scheme is based on adding predictions of multiple Δ-machine learning models trained on energies and energy corrections calculated with a hierarchy of quantum chemical methods. Our (semi-)automatic procedure determines the optimal training set size and composition of each constituent machine learning model, simultaneously minimizing the computational effort necessary to achieve the required accuracy of the hML PES. Machine learning models are built using kernel ridge regression, and training points are selected with structure-based sampling. As an illustrative example, hML is applied to a high-level ab initio CH3Cl PES and is shown to significantly reduce the computational cost of generating the PES by a factor of 100 while retaining similar levels of accuracy (errors of ∼1 cm⁻¹).
Article
The Density-Functional Tight Binding (DFTB) method is a popular semiempirical approximation to Density Functional Theory (DFT). In many cases, DFTB can provide comparable accuracy to DFT at a fraction of the cost, enabling simulations on length- and time-scales that are unfeasible with first principles DFT. At the same time (and in contrast to empirical interatomic potentials and force-fields), DFTB still offers direct access to electronic properties such as the band-structure. These advantages come at the cost of introducing empirical parameters to the method, leading to a reduced transferability compared to true first-principle approaches. Consequently, it would be very useful if the parameter-sets could be routinely adjusted for a given project. While fairly robust and transferable parameterization workflows exist for the electronic structure part of DFTB, the so-called repulsive potential V_rep poses a major challenge. In this paper we propose a machine-learning (ML) approach to fitting V_rep, using Gaussian Process Regression (GPR) to reconstruct V_rep with DFT-DFTB force residues as training data. The use of GPR circumvents the need for non-linear or global parameter optimization, while at the same time offering arbitrary flexibility in terms of the functional form. We also show that the proposed method can be applied to multiple elements at once, by fitting repulsive potentials for organic molecules containing carbon, hydrogen and oxygen. Overall, the new approach removes focus from the choice of functional form and parameterization procedure, in favour of a data-driven philosophy.
Article
In recent years, promising deep learning based interatomic potential energy surface (PES) models have been proposed that can potentially allow us to perform molecular dynamics simulations for large scale systems with quantum accuracy. However, making these models truly reliable and practically useful is still a very non-trivial task. A key component in this task is the generation of datasets used in model training. In this paper, we introduce the Deep Potential GENerator (DP-GEN), an open-source software platform that implements the recently proposed "on-the-fly" learning procedure (Zhang et al. 2019) and is capable of generating uniformly accurate deep learning based PES models in a way that minimizes human intervention and the computational cost for data generation and model training. DP-GEN automatically and iteratively performs three steps: exploration, labeling, and training. It supports various popular packages for these three steps: LAMMPS for exploration, Quantum Espresso, VASP, CP2K, etc. for labeling, and DeePMD-kit for training. It also allows automatic job submission and result collection on different types of machines, such as high performance clusters and cloud machines, and is adaptive to different job management tools, including Slurm, PBS, and LSF. As a concrete example, we illustrate the details of the process for generating a general-purpose PES model for Cu using DP-GEN. Program summary. Program Title: DP-GEN. Program Files doi: http://dx.doi.org/10.17632/sxybkgc5xc.1. Licensing provisions: LGPL. Programming language: Python. Nature of problem: Generating reliable deep learning based potential energy models with minimal human intervention and computational cost. Solution method: The concurrent learning scheme is implemented. Supports for sampling configuration space with LAMMPS, generating ab initio data with Quantum Espresso, VASP, CP2K and training potential models with DeePMD-kit are provided. 
Supports for different machines including workstations, high performance clusters and cloud machines are provided. Supports for job management tools including Slurm, PBS, LSF are provided.
Article
We use the PBE0/6-31G* density functional method to perform ab initio quantum mechanical/molecular mechanical (QM/MM) molecular dynamics (MD) simulations under periodic boundary conditions with rigorous electrostatics using the ambient potential composite Ewald method in order to test the convergence of MM→QM/MM free energy corrections for the prediction of 17 small-molecule solvation free energies and 8 ligand binding free energies to T4 lysozyme. The "indirect" thermodynamic cycle for calculating free energies is used to explore whether a series of reference potentials improve the statistical quality of the predictions. Specifically, we construct a series of reference potentials that optimizes a molecular mechanical (MM) force field's parameters to reproduce the ab initio QM/MM forces from a QM/MM simulation. The optimizations form a systematic progression of successively expanded parameters that include bond, angle, dihedral and charge parameters. For each reference potential, we calculate benchmark quality reference values for the MM→QM/MM correction by simulating the mixed MM and QM/MM Hamiltonians at 11 intermediate states, each for 200 ps. We then compare forward and reverse application of Zwanzig's relation, thermodynamic integration (TI), and Bennett's acceptance ratio (BAR) methods as a function of reference potential, simulation time, and the number of simulated intermediate states. We find that Zwanzig's equation is inadequate unless a large number of intermediate states are explicitly simulated. The TI and BAR mean signed errors are very small even when only the end-state simulations are considered, and the standard deviation of the TI and BAR errors is decreased by choosing a reference potential that optimizes the bond and angle parameters. We find a robust approach for the data sets of fairly rigid molecules considered here is to use the bond+angle reference potential together with the end-state-only BAR analysis. 
This requires a QM/MM simulation to be performed in order to generate reference data to parameterize the bond+angle reference potential, and then this same simulation serves a dual purpose as the full QM/MM end-state. The convergence of the results with respect to time suggests that computational resources may be used more efficiently by running multiple simulations for no more than 50 ps, rather than running one long simulation.
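The end-state-only BAR analysis named above can be sketched as follows. The code solves the standard BAR self-consistency condition by bisection, using forward gaps (target minus reference energy on reference samples) and reverse gaps (reference minus target energy on target samples). The function names, the bracket of ±100 energy units, and the tolerance are illustrative assumptions, not settings from the cited work.

```python
import math


def fermi(x):
    """Numerically stable 1 / (1 + exp(x))."""
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def bar_free_energy(w_forward, w_reverse, beta, lo=-100.0, hi=100.0, tol=1e-10):
    """Solve the BAR equation for the free energy difference by bisection.

    w_forward : U_target - U_reference on reference-state samples
    w_reverse : U_reference - U_target on target-state samples
    beta      : 1 / kT in units consistent with the energy gaps
    """
    m = math.log(len(w_forward) / len(w_reverse))

    def imbalance(delta_a):
        # This difference increases monotonically with delta_a.
        lhs = sum(fermi(beta * (w - delta_a) + m) for w in w_forward)
        rhs = sum(fermi(beta * (w + delta_a) - m) for w in w_reverse)
        return lhs - rhs

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if imbalance(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Unlike the one-sided exponential average, BAR uses samples from both end states, which is consistent with the abstract's finding that it remains accurate without intermediate states.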
Article
An efficient and accurate reference potential simulation protocol is proposed for producing ab initio quantum mechanical molecular mechanical (AI-QM/MM) quality free energy profiles for chemical reactions in a solvent or macromolecular environment. This protocol involves three stages: (a) using force matching to recalibrate a semi-empirical quantum mechanical (SE-QM) Hamiltonian for the specific reaction under study; (b) employing the recalibrated SE-QM Hamiltonian (in combination with molecular mechanical force fields) as the reference potential to drive umbrella samplings along the reaction pathway; and (c) computing AI-QM/MM energy values for collected configurations from the sampling and performing weighted thermodynamic perturbation to acquire the AI-QM/MM corrected reaction free energy profile. For three model reactions (identity SN2 reaction, Menshutkin reaction, and glycine proton transfer reaction) in aqueous solution and one enzyme reaction (Claisen rearrangement in chorismate mutase), our simulations using recalibrated PM3 SE-QM Hamiltonians reproduced AI-QM/MM free energy profiles (at the B3LYP/6-31G* level of theory) to within 1 kcal/mol with a 20- to 45-fold reduction in computer time.
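Stage (c) of the protocol above can be illustrated with a simplified sketch. It assumes that unbiased low-level statistical weights for each stored configuration are already available (for example from an MBAR or WHAM analysis of the umbrella windows), multiplies each weight by exp(-βΔU), and histograms the result along the reaction coordinate. The function name, the binning scheme, and the kcal/mol units are assumptions made for illustration; this is not the authors' implementation.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def reweighted_profile(cv, low_weights, delta_u, edges, temperature=300.0):
    """High-level free energy profile from low-level umbrella sampling.

    cv          : reaction coordinate value of each configuration
    low_weights : unbiased low-level statistical weight of each configuration
    delta_u     : U_high - U_low for each configuration (kcal/mol)
    edges       : bin edges along the reaction coordinate
    """
    beta = 1.0 / (KB_KCAL * temperature)
    shift = min(delta_u)  # constant shift keeps the exponentials bounded
    sums = [0.0] * (len(edges) - 1)
    for x, w, du in zip(cv, low_weights, delta_u):
        for b in range(len(sums)):
            if edges[b] <= x < edges[b + 1]:
                sums[b] += w * math.exp(-beta * (du - shift))
                break
    free = [-math.log(s) / beta if s > 0.0 else math.inf for s in sums]
    lowest = min(free)
    # Report the profile relative to its lowest populated bin.
    return [f - lowest for f in free]
```

Bins that receive no samples are reported as infinite free energy rather than being interpolated.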
Article
Free energy sampling methods allow studying the full dynamics of activated processes. Unfortunately, the affordable accuracy of the potential describing the energy and forces of the system is usually rather low. Here we introduce a new method that by combining metadynamics and free energy perturbation allows calculating accurate quantum chemical free energies for chemical reactions. To prove the effectiveness of this new approach we study the SN2 reaction of CH3F + Cl- → CH3Cl + F- in vacuo and solvated by water. Comparisons are made with harmonic transition-state theory to show how this method could provide accurate equilibrium and rate constants for complex systems.
Article
A predictive understanding of the mechanisms of RNA cleavage is important for the design of emerging technology built from biological and synthetic molecules that have promise for new biochemical and medicinal applications. Over the past 15 years, RNA cleavage reactions involving 2'-O-transphosphorylation have been discussed using a simplified framework introduced by Breaker that consists of four fundamental catalytic strategies (designated α, β, γ, and δ) that contribute to rate enhancement. As more detailed mechanistic data emerge, there is need for the framework to evolve and keep pace. We develop an ontology for discussion of strategies of enzymes that catalyze RNA cleavage via 2'-O-transphosphorylation that stratifies Breaker’s framework into primary (1°), secondary (2°) and tertiary (3°) contributions to enable more precise interpretation of mechanism in the context of structure and bonding. Further, we point out instances where atomic-level changes give rise to changes in more than one catalytic contribution, a phenomenon we refer to as ‘functional blurring’. We hope that this ontology will help clarify our conversations and pave the path forward toward a consensus view of these fundamental and fascinating mechanisms. The insight gained will deepen our understanding of RNA cleavage reactions catalyzed by natural protein and RNA enzymes, as well as aid in the design of new engineered DNA and synthetic enzymes.
Article
The free energy (FE) profile is an essential quantity for the estimation of reaction rates and the validation of reaction mechanisms. For chemical reactions in the condensed phase or enzymatic reactions, the computation of the FE profile at the ab initio (ai) quantum mechanical/molecular mechanical (QM/MM) level is still far too expensive. Although semiempirical (SE) methods can be hundreds or thousands of times faster than the ai methods, the accuracy of SE methods is often unsatisfactory, due to the approximations that have been adopted in these methods. In this work, we propose a new method termed MBAR+wTP, in which the ai QM/MM free energy profile is computed by a weighted thermodynamic perturbation (TP) correction to the SE profile generated by the multistate Bennett acceptance ratio (MBAR) analysis of the trajectories from umbrella sampling (US). The weight factors used in the TP calculations are a byproduct of the MBAR analysis in the post-processing of the US trajectories, which are often discarded after the free energy calculations. The raw ai QM/MM free energy profile is then smoothed using Gaussian process regression, in which the noise of each datum is set to be inversely proportional to the exponential of the reweighting entropy. The results show that this approach can enhance the efficiency of ai FE profile calculations by several orders of magnitude with only a slight loss of accuracy. This method can significantly enhance the applicability of ai QM/MM methods in the studies of chemical reactions in the condensed phase and enzymatic reactions.
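The weighted TP correction described in this abstract can be sketched as follows. The sketch assumes that, for the frames falling in one reaction-coordinate bin, the correction takes the form ΔF = -kT ln Σᵢ wᵢ exp(-ΔUᵢ/kT), where wᵢ are the normalized low-level (MBAR-derived) frame weights and ΔUᵢ is the high-level minus low-level energy. The function name, argument names, and default kT (about 298 K in kcal/mol) are hypothetical and chosen for illustration only.

```python
import numpy as np

def wtp_correction(delta_u, weights, kt=0.5925):
    """Weighted thermodynamic perturbation correction for one bin.

    delta_u: high-level minus low-level energies of the frames in the bin.
    weights: non-negative frame weights (e.g. from an MBAR analysis);
             they are renormalized to sum to 1 inside the function.
    kt:      thermal energy in the same units as delta_u (assumed kcal/mol).
    """
    delta_u = np.asarray(delta_u, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    x = -delta_u / kt
    # Shift by the maximum exponent for numerical stability.
    x_max = x.max()
    return -kt * (x_max + np.log(np.sum(weights * np.exp(x - x_max))))
```

With uniform weights this expression reduces to ordinary exponential averaging; adding the returned value to the low-level free energy of the bin gives the corrected estimate under the stated assumption.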
Article
Current neural networks for predictions of molecular properties use quantum chemistry only as a source of training data. This paper explores models that use quantum chemistry as an integral part of the prediction process. This is done by implementing self-consistent-charge Density-Functional-Tight-Binding (DFTB) theory as a layer for use in deep learning models. The DFTB layer takes, as input, Hamiltonian matrix elements generated from earlier layers and produces, as output, electronic properties from self-consistent field solutions of the corresponding DFTB Hamiltonian. Backpropagation enables efficient training of the model to target electronic properties. Two types of input to the DFTB layer are explored, splines and feed-forward neural networks. Because overfitting can cause models trained on smaller molecules to perform poorly on larger molecules, regularizations are applied that penalize non-monotonic behavior and deviation of the Hamiltonian matrix elements from those of the published DFTB model used to initialize the model. The approach is evaluated on 15,700 hydrocarbons by comparing the root mean square error in energy and dipole moment, on test molecules with 8 heavy atoms, to the error from the initial DFTB model. When trained on molecules with up to 7 heavy atoms, the spline model reduces the test error in energy by 60% and in dipole moments by 42%. The neural network model performs somewhat better, with error reductions of 67% and 59% respectively. Training on molecules with up to 4 heavy atoms reduces performance, with both the spline and neural net models reducing the test error in energy by about 53% and in dipole by about 25%.
Article
The ability to build arguments and explanations based on scientific models is emphasized in current educational standards as a central science practice that students should develop in their science classes. In chemistry, it is expected that students will be able to apply their understanding in the construction of mechanistic explanations using submicroscopic models of matter. The main goal of this contribution is to highlight and characterize a set of fundamental chemical mechanisms that enable professionals in different fields to build rationales about the properties and behaviors of chemical entities across a wide variety of systems and processes. Thus, they represent the types of understandings to which chemistry educators should aspire for their students to develop and transfer to other domains. These fundamental mechanisms define ways of reasoning that students should master and chemistry instructors must give priority to in their instructional and assessment efforts.
Article
We combine the approximate Density-Functional Tight-Binding (DFTB) method with unsupervised machine learning. This makes it possible to improve transferability and accuracy, to make use of large quantum chemical data sets for the parametrization, and to efficiently automate the parametrization process of DFTB. For this purpose, generalized pair-potentials are introduced, where the chemical environment is included during the learning process, leading to more specific effective two-body potentials. We train on energies and forces of equilibrium and non-equilibrium structures of 2100 molecules, and test on ∼130,000 organic molecules containing O, N, C, H and F atoms. Atomization energies of the reference method can be reproduced within an error of ∼2.6 kcal/mol, indicating drastic improvement over standard DFTB.
Article
Parametrization of small organic molecules for classical molecular dynamics simulations is not trivial. The vastness of the chemical space makes approaches using building blocks challenging. The most common approach is therefore an individual parametrization of each compound by deriving partial charges from semi-empirical or ab initio calculations and inheriting the bonded and van der Waals (Lennard-Jones) parameters from a biomolecular force field. The quality of the partial charges generated in this fashion depends on the level of the quantum-chemical calculation as well as on the extraction procedure used. Here, we present a machine learning (ML) based approach for predicting partial charges extracted from density functional theory (DFT) electron densities. The training set was chosen with the goal of providing broad coverage of the known chemical space of drug-like molecules. In addition to the speed of the approach, the partial charges predicted by ML are not dependent on the three-dimensional conformation, in contrast to the ones obtained by fitting to the electrostatic potential (ESP). To assess the quality and compatibility with standard force fields, we performed benchmark calculations for the free energy of hydration and liquid properties such as density and heat of vaporization.
Article
Direct molecular dynamics (MD) simulation with ab initio quantum mechanical and molecular mechanical (QM/MM) methods is very powerful for studying the mechanism of chemical reactions in complex environments but very time consuming. The computational cost of QM/MM calculations during MD simulations can be reduced significantly using semiempirical QM/MM methods with lower accuracy. To achieve higher accuracy at the ab initio QM/MM level, correcting the existing semiempirical QM/MM model is an attractive approach. Recently, we reported a neural network (NN) method, QM/MM-NN, to predict the potential energy difference between semiempirical and ab initio QM/MM approaches. The high-level results can be obtained using a neural network based on semiempirical QM/MM MD simulations, but the lack of direct MD sampling at the ab initio QM/MM level is still a deficiency that limits the applications of QM/MM-NN. In the present paper, we developed a dynamic scheme of QM/MM-NN for direct MD simulations on the NN-predicted potential energy surface to approximate ab initio QM/MM MD. Since some configurations excluded from the database for NN training were encountered during simulations, which may cause difficulties in MD sampling, an adaptive procedure inspired by the selection scheme reported by Behler was employed with some adaptations to update the NN and carry out MD iteratively. We further applied the adaptive QM/MM-NN MD method to the free energy calculation and transition path optimization of chemical reactions in water. The results at the ab initio QM/MM level can be well reproduced using this method after 2-4 iteration cycles. The saving in computational cost is about 2 orders of magnitude. It demonstrates that QM/MM-NN with direct MD simulations has great potential not only for the calculation of thermodynamic properties but also for the characterization of reaction dynamics, which provides a useful tool to study chemical or biochemical systems in solution or enzymes.
Article
There has been a resurgence of interest in free energy methods motivated by the performance enhancements offered by molecular dynamics (MD) software written for specialized hardware, such as graphics processing units (GPUs). In this work, we exploit the properties of a parameter-interpolated thermodynamic integration (PI-TI) method to connect states by their molecular mechanical (MM) parameter values. This pathway is shown to be better behaved for Mg2+ → Ca2+ transformations than traditional linear alchemical pathways (with and without soft-core potentials). The PI-TI method has the practical advantage that no modification of the MD code is required to propagate the dynamics. In the case of AMBER, this enables all the performance benefits of GPU-acceleration to be realized, in addition to unlocking the full spectrum of features available within the MD software, such as Hamiltonian replica exchange (HREM). The TI evaluation can be accomplished efficiently in a post-processing step by reanalyzing the statistically independent trajectory frames in parallel for high throughput. We apply the PI-TI method with HREM on GPUs in AMBER to predict pKa values in double stranded RNA molecules, and make comparison with experiments. Convergence to under 0.25 units for these systems required 100 ns or more of sampling per window, and coupling of windows with HREM. We find that MM charges derived from ab initio QM/MM fragment calculations improve the agreement between calculation and experiment.
Article
Ab initio quantum mechanics/molecular mechanics (QM/MM) molecular dynamics simulation is a useful tool to calculate thermodynamic properties such as potential of mean force for chemical reactions but intensely time consuming. In this paper, we developed a new method using the internal force correction for low-level semiempirical QM/MM molecular dynamics samplings with a predefined reaction coordinate. As a correction term, the internal force was predicted with a machine learning scheme, which provides a sophisticated force field, and added to the atomic forces on the reaction coordinate related atoms at each integration step. We applied this method to two reactions in aqueous solution and reproduced potentials of mean force at the ab initio QM/MM level. The saving in computational cost is about 2 orders of magnitude. The present work reveals great potentials for machine learning in QM/MM simulations to study complex chemical processes.
Article
The partitioning of solute molecules between immiscible solvents with significantly different polarities is of great importance. The polarization between the solute and solvent molecules plays an essential role in determining the solubility of the solute, which makes computational studies utilizing molecular mechanics (MM) rather difficult. In contrast, quantum mechanics (QM) can provide more reliable predictions. In this work, the partition coefficients of the side chain analogs of some amino acids between water and chloroform were computed. The QM solvation free energies were calculated indirectly via a series of MM states using the Multistate Bennett Acceptance Ratio (MBAR), and the MM-to-QM corrections were applied at the two end points using Thermodynamic Perturbation (TP). Previously, it has been shown (Journal of Chemical Theory and Computation, 2016, 12, 499) that this method provides the minimal variance in the results without running QM simulations. However, if there is insufficient overlap in phase space between the MM and QM Hamiltonians, this method fails. In this work, we propose, for the first time, a quantity termed the reweighting entropy that serves as a metric for the reliability of the TP calculations. If the reweighting entropy is below a certain threshold (0.65 for the solvation free energy calculations in this work), this MM-to-QM correction should be avoided and two alternative methods can be employed by either introducing a semi-empirical state or conducting nonequilibrium simulations. However, the results show that the QM methods are not guaranteed to yield better results than the MM methods. Further improvement of the QM methods is imperative, especially the treatment of the van der Waals and the electrostatic interactions between the QM region and the MM region in the first shell.
We also propose a scheme for the calculation of the van der Waals parameters for the solute molecules in nonaqueous solvent, which improves the quality of the computed thermodynamic properties. Furthermore, the force field parameters for the sulfur-containing molecules are also optimized.
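The reweighting entropy named in this abstract can be illustrated with a short sketch. It assumes the commonly used normalized form S = -(1/ln N) Σᵢ wᵢ ln wᵢ, with wᵢ = exp(-ΔUᵢ/kT) / Σⱼ exp(-ΔUⱼ/kT) and ΔU the QM minus MM energy of each of the N frames; the abstract itself does not state the formula, so this definition, the function name, and the default kT (about 298 K in kcal/mol) are assumptions made for illustration.

```python
import numpy as np

def reweighting_entropy(delta_u, kt=0.5925):
    """Normalized entropy of the TP reweighting factors (between 0 and 1).

    delta_u: target minus reference energies for N >= 2 frames.
    kt:      thermal energy in the same units as delta_u (assumed kcal/mol).
    """
    delta_u = np.asarray(delta_u, dtype=float)
    x = -delta_u / kt
    # Normalized Boltzmann weights, shifted for numerical stability.
    w = np.exp(x - x.max())
    w /= w.sum()
    # Terms with w == 0 contribute nothing to the sum (limit of w ln w).
    nz = w[w > 0]
    return float(-(nz * np.log(nz)).sum() / np.log(len(w)))
```

Under this definition, a value near 1 means all frames contribute about equally, while a value near 0 means one or a few frames dominate the exponential average. A result could then be compared against a threshold such as the 0.65 quoted in the abstract.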
Article
We investigate the accuracy and transferability of a recently developed high-dimensional neural network (NN) method for calcium fluoride, fitted to a database of ab initio density functional theory (DFT) calculations based on the Perdew-Burke-Ernzerhof (PBE) exchange correlation functional. We call the method charge equilibration via neural network technique (CENT). Although the fitting database contains only clusters (i.e., nonperiodic structures), the NN scheme accurately describes a variety of bulk properties. In contrast to other available empirical methods the CENT potential has a much simpler functional form, nevertheless it correctly reproduces the PBE energetics of various crystalline phases both at ambient and high pressure. Surface energies and structures as well as dynamical properties derived from phonon calculations are also in good agreement with PBE results. Overall, the difference between the values obtained by the CENT potential and the PBE reference values is less than or equal to the difference between the values of local density approximation (LDA) and Born-Mayer-Huggins (BMH) with those calculated by the PBE exchange correlation functional.