Article

Combined QM/MM, Machine Learning Path Integral Approach to Compute Free Energy Profiles and Kinetic Isotope Effects in RNA Cleavage Reactions


Abstract

We present a fast, accurate, and robust approach for the determination of free energy profiles and kinetic isotope effects for RNA 2'-O-transphosphorylation reactions with inclusion of nuclear quantum effects. We apply a deep potential range correction (DPRc) for combined quantum mechanical/molecular mechanical (QM/MM) simulations of reactions in the condensed phase. The method uses the second-order density-functional tight-binding method (DFTB2) as a fast, approximate base QM model. The DPRc model modifies the DFTB2 QM interactions and applies short-range corrections to the QM/MM interactions to reproduce ab initio DFT (PBE0/6-31G*) QM/MM energies and forces. The DPRc thus enables both QM and QM/MM interactions to be tuned to high accuracy, and the QM/MM corrections are designed to smoothly vanish at a specified cutoff boundary (6 Å in the present work). The computational speed-up afforded by the QM/MM+DPRc model enables free energy profiles to be calculated that include rigorous long-range QM/MM interactions under periodic boundary conditions and nuclear quantum effects through a path integral approach using a new interface between the AMBER and i-PI software. The approach is demonstrated through the calculation of free energy profiles of a native RNA cleavage model reaction and reactions involving thio-substitutions, which are important experimental probes of the mechanism. The DFTB2+DPRc QM/MM free energy surfaces agree very closely with the PBE0/6-31G* QM/MM results and are vastly superior to the DFTB2 QM/MM surfaces with and without weighted thermodynamic perturbation corrections. 18O and 34S primary kinetic isotope effects are compared, and the influence of nuclear quantum effects on the free energy profiles is examined.
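The idea of a correction that "smoothly vanishes at a specified cutoff" can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the DPRc implementation: the quintic switching polynomial, the switch-on distance `r_on`, and the helper names `switch` and `corrected_energy` are hypothetical choices made here; only the 6 Å cutoff comes from the abstract.

```python
def switch(r, r_on, r_cut):
    """Quintic switching function (illustrative choice, not taken from the paper).

    Returns 1 for r <= r_on, 0 for r >= r_cut, and a polynomial in between
    whose value and first derivative are continuous at both ends.
    """
    if r <= r_on:
        return 1.0
    if r >= r_cut:
        return 0.0
    u = (r - r_on) / (r_cut - r_on)  # maps [r_on, r_cut] onto [0, 1]
    return 1.0 + u**3 * (-6.0 * u**2 + 15.0 * u - 10.0)


def corrected_energy(e_base, pair_terms, r_on=5.0, r_cut=6.0):
    """Base energy plus switched corrections.

    e_base     : energy of the fast base model (hypothetical input)
    pair_terms : iterable of (distance in Angstrom, raw correction) pairs
    r_on       : hypothetical distance where the switch starts to decay
    r_cut      : cutoff at which the correction is exactly zero (6 Angstrom
                 in the abstract)
    """
    return e_base + sum(switch(r, r_on, r_cut) * de for r, de in pair_terms)
```

With these assumptions, a correction term at 4 Å is applied in full, a term at 5.5 Å is scaled by one half, and a term beyond 6 Å contributes nothing, so the corrected energy stays continuous as atoms cross the cutoff.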


... An attractive alternative to ab initio QM/MM simulation is the design of quantum mechanical force fields 2,16,17 and machine learning models. [18][19][20][21][22][23] Of particular relevance to the current work is the development of QM/MM-∆MLP models, whereby the energies and forces of a fast, approximate QM model are corrected with a machine-learning potential. 20,[24][25][26][27][28][29][30] These models have the potential to offer the computational efficiency needed to address complex chemical mechanisms that require sampling of high-dimensional free energy surfaces, while providing accuracy comparable to high-level QM methods. ...
... The expressions for the E_i values from the neural network can be found elsewhere. 20,21 The atomic decomposition of the DPRc model is readily amenable to parallel calculation. The DPRc contribution to the energy is activated by setting the idprc=1 option within the Sander input &dprc Fortran namelist. ...
... [101][102][103] The current demonstration examines differences between independently calculated free energy profiles; however, other applications of i-PI can directly evaluate equilibrium and kinetic isotope effects from umbrella sampling using thermodynamic perturbation and frequency factors. 21,104 The main point of this section is to demonstrate that inclusion of nuclear quantum effects ... Table 3 expresses ...
Article
Full-text available
We report the development and testing of new integrated cyberinfrastructure for performing free energy simulations with generalized hybrid quantum mechanical/molecular mechanical (QM/MM) and machine learning potentials (MLPs) in Amber. The Sander molecular dynamics program has been extended to leverage fast, density-functional tight-binding models implemented in the DFTB+ and xTB packages, and an interface to the DeePMD-kit software enables the use of MLPs. The software is integrated through application program interfaces that circumvent the need to perform “system calls” and enable the incorporation of long-range Ewald electrostatics into the external software’s self-consistent field procedure. The infrastructure provides access to QM/MM models that may serve as the foundation for QM/MM–ΔMLP potentials, which supplement the semiempirical QM/MM model with a MLP correction trained to reproduce ab initio QM/MM energies and forces. Efficient optimization of minimum free energy pathways is enabled through a new surface-accelerated finite-temperature string method implemented in the FE-ToolKit package. Furthermore, we interfaced Sander with the i-PI software by implementing the socket communication protocol used in the i-PI client–server model. The new interface with i-PI allows for the treatment of nuclear quantum effects with semiempirical QM/MM–ΔMLP models. The modular interoperable software is demonstrated on proton transfer reactions in guanine-thymine mispairs in a B-form deoxyribonucleic acid helix. The current work represents a considerable advance in the development of modular software for performing free energy simulations of chemical reactions that are important in a wide range of applications.
... These systems include metallic materials, 55 non-metallic inorganic materials, [56][57][58][59][60] water, [61][62][63][64][65][66][67][68][69][70][71] organic systems, 10,72 solutions, 52,73-76 gas-phase systems, [77][78][79][80] macromolecular systems, 81,82 and interfaces. [83][84][85][86][87] Furthermore, the DeePMD-kit is capable of simulating systems containing almost all Periodic Table elements, 51 operating under a wide range of temperature and pressure, 88 and can handle drug-like molecules, 72,89 ions, 73,76 transition states, 75,77 and excited states. 90 As a result, the DeePMD-kit is a powerful and versatile tool that can be used to simulate a wide range of atomistic systems. ...
... This enables the model to be easily integrated as a mid-ranged correction to the potential energy within molecular simulation software that uses non-bonded lists, i.e., for each atom, a list of other atoms within a fixed cut-off distance (typically 8-12 Å). The trained DPRc model with a 6 Å range-correction was applied to simulate RNA 2′-O-transphosphorylation reactions in solution over long timescales 75 and to obtain better free energy estimates with the help of the generalized weighted thermodynamic perturbation (gwTP) method. 100 Very recently, Zeng et al. 72 have trained a Δ-MLP correction model called Quantum Deep Potential Interaction (QDπ) for drug-like molecules, including tautomeric forms and protonation states, which was found to be superior to other semiempirical methods and pure MLP models. ...
... 80 Compared to its initial release, 29 DeePMD-kit has evolved significantly, with the current version (v2.2.1) offering an extensive range of features. These include DeepPot-SE, attention-based, and hybrid descriptors, 10,50,51,53 the ability to fit tensorial properties, 105,106 type embedding, model deviation, 103,107 Deep Potential-Range Correction (DPRc), 52,75 Deep Potential Long Range (DPLR), 53 graphics processing unit (GPU) support for customized operators, 108 model compression, 109 non-von Neumann molecular dynamics (NVNMD), 110 and various usability improvements, such as documentation, compiled binary packages, graphical user interfaces (GUIs), and application programming interfaces (APIs). This article provides an overview of the current major version of the DeePMD-kit, highlighting its features and technical details, presenting a comprehensive procedure for conducting molecular dynamics as a representative application, benchmarking the accuracy and efficiency of different models, and discussing ongoing developments. ...
Article
Full-text available
DeePMD-kit is a powerful open-source software package that facilitates molecular dynamics simulations using machine learning potentials known as Deep Potential (DP) models. This package, which was released in 2017, has been widely used in the fields of physics, chemistry, biology, and material science for studying atomistic systems. The current version of DeePMD-kit offers numerous advanced features, such as DeepPot-SE, attention-based and hybrid descriptors, the ability to fit tensorial properties, type embedding, model deviation, DP-range correction, DP long range, graphics processing unit support for customized operators, model compression, non-von Neumann molecular dynamics, and improved usability, including documentation, compiled binary packages, graphical user interfaces, and application programming interfaces. This article presents an overview of the current major version of the DeePMD-kit package, highlighting its features and technical details. Additionally, this article presents a comprehensive procedure for conducting molecular dynamics as a representative application, benchmarks the accuracy and efficiency of different models, and discusses ongoing developments.
... 12 The expanded ensemble requires one to introduce an averaging procedure to account for the structural variations of both the solute and environment 13 because the KIE estimated from individual samples may span a relatively wide range. 14 Rather than performing an exhaustive set of geometry optimizations for each reactant and transition state structure in the ensemble, one can leverage the configurational averaging provided by centroid path integral molecular dynamics (PIMD) sampling [15][16][17][18][19] and calculate the KIE from a modified form of the BME that introduces anharmonicity and quantum tunneling within Feynman's path integral framework. [20][21][22][23][24] The modified BME has two components: a ratio of imaginary mode vibrational frequencies and a component resulting from the change in activation free energy due to isotopic substitution. ...
... We apply the method to the calculation of FRC and KIE values for a series of six non-enzymatic phosphoryl transfer reactions (Fig. 1) simulated under periodic boundary conditions with explicit solvent. 14 We validate the approach by generating distributions of reference FRC values from normal mode analysis of geometry optimized umbrella sampling configurations. The reference normal mode analysis distributions are compared to the FRC values produced by the new method, and the two approaches are shown to yield consistent results. ...
... When the BME is applied to explicitly modeled condensed-phase environments, one can observe large variations in the predicted KIE values because many plausible reactant and transition state structures can be found. 14,35 For example, when nonenzymatic phosphoryl transfer reactions were optimized from different initial configurations in explicit solvent, the BME led to 2′ and 5′ KIE values distributed over a wide 3% range with a 1% standard deviation. 14 This observation is not a criticism; a range of values is expected, but it does imply that the confidence in any single result is low, so the results of many optimizations should be averaged. ...
Article
We use the modified Bigeleisen–Mayer equation to compute kinetic isotope effect values for non-enzymatic phosphoryl transfer reactions from classical and path integral molecular dynamics umbrella sampling. The modified form of the Bigeleisen–Mayer equation consists of a ratio of imaginary mode vibrational frequencies and a contribution arising from the isotopic substitution’s effect on the activation free energy, which can be computed from path integral simulation. In the present study, we describe a practical method for estimating the frequency ratio correction directly from umbrella sampling in a manner that does not require normal mode analysis of many geometry optimized structures. Instead, the method relates the frequency ratio to the change in the mass weighted coordinate representation of the minimum free energy path at the transition state induced by isotopic substitution. The method is applied to the calculation of 16/18O and 32/34S primary kinetic isotope effect values for six non-enzymatic phosphoryl transfer reactions. We demonstrate that the results are consistent with the analysis of geometry optimized transition state ensembles using the traditional Bigeleisen–Mayer equation. The method thus presents a new practical tool to enable facile calculation of kinetic isotope effect values for complex chemical reactions in the condensed phase.
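The two-component structure described in this abstract can be sketched numerically in Python. This is a hedged illustration, not the authors' code: it assumes the KIE is the product of the imaginary-frequency ratio (light over heavy) and a Boltzmann factor of the isotope-induced change in activation free energy; the function name `kie_modified_bme`, its arguments, and the default temperature are hypothetical.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann (gas) constant in kcal/(mol K)


def kie_modified_bme(freq_ratio, dg_light, dg_heavy, temperature=298.15):
    """Kinetic isotope effect k_light / k_heavy from two ingredients.

    freq_ratio : ratio of imaginary-mode frequencies, light / heavy
    dg_light   : activation free energy of the light isotopologue (kcal/mol)
    dg_heavy   : activation free energy of the heavy isotopologue (kcal/mol)

    Assumed form (illustrative): KIE = freq_ratio * exp(-(dg_light - dg_heavy) / kT).
    """
    beta = 1.0 / (KB_KCAL * temperature)
    return freq_ratio * math.exp(-beta * (dg_light - dg_heavy))
```

Under this assumed form, a heavy-isotope barrier that is higher than the light-isotope barrier gives a normal KIE greater than one, and equal barriers leave only the frequency ratio.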
... Furthermore, DeePMD-kit is capable of simulating systems containing almost all periodic table elements, 42 operating under a wide range of temperature and pressure, 94 and can handle drug-like molecules, 78,95 ions, 79,82 transition states, 81,83 and excited states. 96 As a result, DeePMD-kit is a powerful and versatile tool that can be used to simulate a wide range of atomistic systems. ...
... Compared to its initial release, 19 DeePMD-kit has evolved significantly, with the current version (v2.2.1) offering an extensive range of features. These include DeepPot-SE, attention-based, and hybrid descriptors, 10,41,42,44 the ability to fit tensorial properties, 97,98 type embedding, model deviation, 99,100 Deep Potential-Range Correction (DPRc), 43,81 Deep Potential Long Range (DPLR), 44 graphics processing unit (GPU) support for customized operators, 101 model compression, 102 non-von Neumann molecular dynamics (NVNMD), 103 and various usability improvements such as documentation, compiled binary packages, graphical user interfaces (GUI), and application programming interfaces (API). This article provides an overview of the current major additions to the DeePMD-kit, highlighting its features and technical details, benchmarking the accuracy and efficiency of different models, and discussing ongoing developments. ...
... Deep Potential-Range Correction (DPRc) 43,81 was initially designed to correct the potential energy from a fast, linear-scaling low-level semiempirical QM/MM theory to a high-level ab initio QM/MM theory. It works as a range correction that quantitatively corrects short- and mid-range non-bonded interactions, leveraging the non-bonded lists routinely used in molecular dynamics simulations with molecular mechanical force fields such as AMBER. 108 In this way, long-ranged electrostatic interactions can be modeled efficiently using the particle mesh Ewald method 108 or its extensions for multipolar 109,110 and QM/MM 111,112 potentials. ...
Preprint
Full-text available
DeePMD-kit is a powerful open-source software package that facilitates molecular dynamics simulations using machine learning potentials (MLP) known as Deep Potential (DP) models. This package, which was released in 2017, has been widely used in the fields of physics, chemistry, biology, and material science for studying atomistic systems. The current version of DeePMD-kit offers numerous advanced features such as DeepPot-SE, attention-based and hybrid descriptors, the ability to fit tensorial properties, type embedding, model deviation, Deep Potential - Range Correction (DPRc), Deep Potential Long Range (DPLR), GPU support for customized operators, model compression, non-von Neumann molecular dynamics (NVNMD), and improved usability, including documentation, compiled binary packages, graphical user interfaces (GUI), and application programming interfaces (API). This article presents an overview of the current major version of the DeePMD-kit package, highlighting its features and technical details. Additionally, the article benchmarks the accuracy and efficiency of different models and discusses ongoing developments.
... al., 2021a), and very recently applied to develop DPRc potentials that closely reproduce ab initio QM/MM free energy profiles (Giese et al., 2022). Mechanistic pathways from QM/MM + DPRc simulations provide important insights that can be further validated experimentally (Figs. 6.4 and 6.5). ...
... $\xi_{\mathrm{PT}} = R_{\mathrm{X5'-P}} - R_{\mathrm{X2'-P}}$. FIG. 6.4 Results for DPRc model trained to ab initio DFT (PBE0/6-31G*) QM/MM data (Giese et al., 2022) for non-enzymatic reactions of a native system (all oxygen) and variants with thio substitutions at XP1 and X3′ positions. ...
... MM + DPRc FES and O2′ and O5′ isotope effects (Giese et al., 2022) as a function of the reaction coordinate for the native reaction in Fig. 6.4(d). ...
Chapter
DESCRIPTION Coarse-grained (CG) molecular dynamics simulations of integral membrane proteins have gained wide popularity because they provide a cost-effective but still accurate description of the protein-membrane interactions as a whole and on the role of individual lipidic species. Therefore, they can provide biologically meaningful information at a resolution comparable to those accessible to experimental techniques. However, the simulation of membrane proteins remains a challenging task that requires specific expertise, as external pressures and solvation need to be carefully handled. CG simulations that lump several water molecules into one single supramolecular moiety may present further intricacies due to bulkier solvent representations or model-dependent compressibilities. This chapter provides a detailed protocol for setting up, running, and analyzing CG simulations of membrane proteins using the SIRAH force field for CG simulations within the AMBER package.
... In order to achieve high (ab initio level) accuracy at low computational cost, a new deep potential range correction (DPRc) has been developed that enables short-ranged QM/MM interactions to be tuned for higher accuracy, and the correction smoothly vanishes within a specified cutoff such that it can be easily integrated with molecular mechanical (MM) force fields that utilize a non-bonded list such as those in AMBER. An active learning training procedure has been developed and validated (Zeng ... [Figure caption: Results for DPRc model trained to ab initio DFT (PBE0/6-31G*) QM/MM data (Giese et al., 2022) for non-enzymatic reactions of a native system (all oxygen) and variants with thio substitutions at XP1 and X3′ positions.] [Figure caption: QM/MM + DPRc FES and O2′ and O5′ isotope effects (Giese et al., 2022) as a function of the reaction coordinate for the native reaction in Fig. 6.4(d).] ...
... An active learning training procedure has been developed and validated (Zeng ... [Figure caption: Results for DPRc model trained to ab initio DFT (PBE0/6-31G*) QM/MM data (Giese et al., 2022) for non-enzymatic reactions of a native system (all oxygen) and variants with thio substitutions at XP1 and X3′ positions.] [Figure caption: QM/MM + DPRc FES and O2′ and O5′ isotope effects (Giese et al., 2022) as a function of the reaction coordinate for the native reaction in Fig. 6.4(d).] MD and PIMD QM/MM simulations were performed with SPC/Fw (Wu et al., 2006) and q-SPC/Fw (Paesani et al., 2006) water models, respectively. ...
... et al., 2021a), and very recently applied to develop DPRc potentials that closely reproduce ab initio QM/MM free energy profiles (Giese et al., 2022). Mechanistic pathways from QM/MM + DPRc simulations provide important insights that can be further validated experimentally (Figs. 6.4 and 6.5). ...
Chapter
DESCRIPTION A new direction has emerged in molecular simulations in recent years, where potential energy surfaces (PES) are constructed using machine learning (ML) methods. These ML models, combining the accuracy of quantum mechanical models and the efficiency of empirical atomic potential models, have been demonstrated by many studies to have extensive application prospects. This chapter introduces a recently developed ML model, Deep Potential (DP), and the corresponding package, DeePMD-kit. First, we present the basic theory of the DP method. Then, we show how to train and test a DP model for a gas-phase methane molecule using the DeePMD-kit package. Next, we introduce some recent progress on simulations of biomolecular processes by integrating the DeePMD-kit with the AMBER molecular simulation software suite. Finally, we provide a supplement on points that require further explanation.
... 46 In the present work, we develop a Quantum Deep-learning Potential Interaction (QDπ) model that uses a fast third-order self-consistent density-functional tight-binding (DFTB3/3OB) model 47,48 that is corrected to a quantitatively high level of accuracy through a range-corrected deep-learning potential (DPRc). 49,50 In this way, the QDπ model developed here is in the form of a quantum mechanical/machine learning potential correction (QM/Δ-MLP). 35,49−54 The use of DFTB3 as a robust QM base model has several important advantages. ...
... The QDπ model is trained to be a QM/Δ-MLP; i.e., a nonelectronic DPRc "correction" to the DFTB3/3OB 75 QM model potential energy similar to previous work. 49,50 2.1.1. Broad Data Sets: ANI-1xm and COMP5m. ...
... The next step of future work will involve developing an intermolecular QM/MM interaction potential as a new range-corrected deep-learning potential. 49,50 The full (internal and intermolecular interaction) QDπ model is designed to be a correction to the QM/MM potential energy using DFTB3/3OB and the latest AMBER FF19SB for proteins, 126 OL3/OL15 for nucleic acids, 127−129 OPC model for water, 130,131 and 12−6−4 ion models. 132−134 Once the intermolecular interaction component of the QDπ model has been developed and validated in alchemical free energy simulations, 5 next steps will be to extend the chemical space of drug molecules to include P, S, F, and Cl atoms. ...
Article
We report QDπ-v1.0 for modeling the internal energy of drug molecules containing H, C, N, and O atoms. The QDπ model is in the form of a quantum mechanical/machine learning potential correction (QM/Δ-MLP) that uses a fast third-order self-consistent density-functional tight-binding (DFTB3/3OB) model that is corrected to a quantitatively high level of accuracy through a deep-learning potential (DeepPot-SE). The model has the advantage that it is able to properly treat electrostatic interactions and handle changes in charge/protonation states. The model is trained against reference data computed at the ωB97X/6-31G* level (as in the ANI-1x data set) and compared to several other approximate semiempirical and machine learning potentials (ANI-1x, ANI-2x, DFTB3, MNDO/d, AM1, PM6, GFN1-xTB, and GFN2-xTB). The QDπ model is demonstrated to be accurate for a wide range of intra- and intermolecular interactions (despite its intended use as an internal energy model) and has been shown to perform exceptionally well for relative protonation/deprotonation energies and tautomers. An example application to model reactions involved in RNA strand cleavage catalyzed by protein and nucleic acid enzymes illustrates that QDπ has average errors less than 0.5 kcal/mol, whereas the other models compared have errors over an order of magnitude greater. Taken together, this makes QDπ highly attractive as a potential force field model for drug discovery.
... The mechanisms of closely related non-enzymatic phosphoryl transfer reactions have been explored with linear free energy relationships 80,81 and through the calculation of free energy surfaces. 82,83 These previous works found that the pathway is correlated to the pKa of the leaving group. Leaving groups with a pKa < 11 ("enhanced" leaving groups) proceed through a concerted mechanism containing a single, "early" ($\xi_{\mathrm{PT}}$ < 0) transition state, whereas leaving groups with a pKa > 12 ("poor" leaving groups) proceed through two distinct barriers separated by a minimum. ...
... For the purpose of providing a stringent test case for demonstration, the 4 reference potentials were specifically designed such that none of them accurately reproduces the target FES throughout the entire range of $\xi_{\mathrm{PT}}$ values. The reference potentials use the MNDO/d semiempirical Hamiltonian supplemented with a range-corrected deep potential 71,83 (DPRc) MLP. We trained 4 ad hoc MNDO/d QM/MM+DPRc potentials using different target data to yield significantly different reference potentials. ...
... The native model reaction (Figure 1b) is used as a case study to emphasize the benefits offered by the reference potential approach. This reaction was one of 6 nonenzymatic models used to parametrize the DFTB2/MIO QM/MM+DPRc potentials in ref. 83. In this section, we introduce our notation by reviewing the MBAR approach for calculating free energy surfaces. The description also serves to aid the reader's understanding of the differences between the MBAR, wTP, and gwTP methods. ...
Article
We describe the generalized weighted thermodynamic perturbation (gwTP) method for estimating the free energy surface of an expensive "high-level" potential energy function from the umbrella sampling performed with multiple inexpensive "low-level" reference potentials. The gwTP method is a generalization of the weighted thermodynamic perturbation (wTP) method developed by Li and co-workers [J. Chem. Theory Comput. 2018, 14, 5583-5596] that uses a single "low-level" reference potential. The gwTP method offers new possibilities in model design whereby the sampling generated from several low-level potentials may be combined (e.g., specific reaction parameter models that might have variable accuracy at different stages of a multistep reaction). The gwTP method is especially well suited for use with machine learning potentials (MLPs) that are trained against computationally expensive ab initio quantum mechanical/molecular mechanical (QM/MM) energies and forces using active learning procedures that naturally produce multiple distinct neural network potentials. Simulations can be performed with greater sampling using the fast MLPs and then corrected to the ab initio level using gwTP. The capabilities of the gwTP method are demonstrated by creating reference potentials based on the MNDO/d and DFTB2/MIO semiempirical models supplemented with the "range-corrected deep potential" (DPRc). The DPRc parameters are trained to ab initio QM/MM data, and the potentials are used to calculate the free energy surface of stepwise mechanisms for nonenzymatic RNA 2'-O-transesterification model reactions. The extended sampling made possible by the reference potentials allows one to identify unequilibrated portions of the simulations that are not always evident from the short time scale commonly used with ab initio QM/MM potentials. 
We show that the reference potential approach can yield more accurate ab initio free energy predictions than the wTP method or what can be reasonably afforded from explicit ab initio QM/MM sampling.
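The basic idea of correcting low-level sampling to a high-level potential can be illustrated with plain thermodynamic perturbation (exponential averaging) in Python. This sketch is deliberately simpler than the wTP and gwTP estimators named in the abstract, which work within umbrella sampling and, for gwTP, combine several reference potentials; the function name `tp_free_energy_correction` and its inputs are hypothetical.

```python
import math

KB_KCAL = 0.0019872041  # Boltzmann (gas) constant in kcal/(mol K)


def tp_free_energy_correction(delta_u, temperature=298.15):
    """Free energy change from a low-level to a high-level potential.

    delta_u : list of U_high - U_low (kcal/mol), evaluated on frames that
              were sampled with the low-level potential.

    Implements dA = -kT ln < exp(-beta * delta_u) >_low, using a
    log-sum-exp shift so large energy gaps do not overflow.
    """
    beta = 1.0 / (KB_KCAL * temperature)
    x = [-beta * du for du in delta_u]
    x_max = max(x)
    log_mean = x_max + math.log(sum(math.exp(xi - x_max) for xi in x) / len(x))
    return -log_mean / beta
```

If every frame has the same energy gap, the correction equals that gap. When the gaps vary, the exponential average is dominated by the frames with the smallest gaps, which is why the overlap between the low-level and high-level ensembles controls the reliability of any perturbation-based correction.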
... In order to ascertain the contribution of nuclear quantum effects, path integral simulations were performed with a recently developed software interface [33,34] between Amber [35] and i-PI [36]. PIMD simulations are roughly an order of magnitude more computationally costly than the classical MD simulations and preclude the practical use of the ab initio QM/MM model. ...
... These models can be parameterized to improve their accuracy, for example, using force matching to higher levels [71][72][73]. Machine learning potentials (MLPs) have shown particular promise in enhancing the accuracy and performance of condensed-phase simulations of chemical reactions [33,[74][75][76][77][78]. Of particular relevance to the current work is the development of QM/MM-∆MLP models, whereby the energies and forces of a fast, approximate QM model are corrected with a machine learning potential [76,[79][80][81][82][83][84][85]. ...
Article
Full-text available
Rare tautomeric forms of nucleobases can lead to Watson–Crick-like (WC-like) mispairs in DNA, but the process of proton transfer is fast and difficult to detect experimentally. NMR studies show evidence for the existence of short-time WC-like guanine–thymine (G-T) mispairs; however, the mechanism of proton transfer and the degree to which nuclear quantum effects play a role are unclear. We use a B-DNA helix exhibiting a wGT mispair as a model system to study tautomerization reactions. We perform ab initio (PBE0/6-31G*) quantum mechanical/molecular mechanical (QM/MM) simulations to examine the free energy surface for tautomerization. We demonstrate that while the ab initio QM/MM simulations are accurate, considerable sampling is required to achieve high precision in the free energy barriers. To address this problem, we develop a QM/MM machine learning potential correction (QM/MM-ΔMLP) that is able to improve the computational efficiency, greatly extend the accessible time scales of the simulations, and enable practical application of path integral molecular dynamics to examine nuclear quantum effects. We find that the inclusion of nuclear quantum effects has only a modest effect on the mechanistic pathway but leads to a considerable lowering of the free energy barrier for the GT*⇌G*T equilibrium. Our results enable a rationalization of observed experimental data and the prediction of populations of rare tautomeric forms of nucleobases and rates of their interconversion in B-DNA.
... The first route is to input descriptors of the MM environment to the neural network to learn the polarization effects. A straightforward strategy is to allow the models to incorporate atom-wise features from MM atoms within the cutoff distance of the ML atoms, enabling them to learn semi-local electrostatic effects (36,37). However, this strategy introduces a challenge: the dimension of features dramatically rises with the large number of surrounding MM atoms and makes it difficult to adequately sample the physical space. ...
Preprint
Machine learning force fields offer the ability to simulate biomolecules with quantum mechanical accuracy while significantly reducing computational costs, attracting growing attention in biophysics. Meanwhile, leveraging the efficiency of molecular mechanics in modeling solvent molecules and long-range interactions, a hybrid machine learning/molecular mechanics (ML/MM) model offers a more realistic approach to describing complex biomolecular systems in solution. However, multiscale models with electrostatic embedding require accounting for the polarization of the ML region induced by the MM environment. To address this, we adapt the state-of-the-art NequIP architecture into a polarizable machine learning force field, NepoIP, enabling the modeling of polarization effects based on the external electrostatic potential. We found that nanosecond MD simulations based on NepoIP/MM are stable for the periodic solvated dipeptide system and that the converged sampling shows excellent agreement with the reference QM/MM level. Moreover, we show that a single NepoIP model can be transferable across different MM force fields, as well as across the very different MM environments of water and proteins, laying the foundation for developing a general machine learning biomolecular force field to be used in ML/MM with electrostatic embedding.
... [34][35][36][37][38][39][40] Path-integral-based QM/MM methods have been successfully applied to study proton transfer, 41 hydride transfer, 42 and RNA cleavage reactions. 43 Although a major limitation of path-integral-based methods is the high computational cost, especially if ab initio potential energy surfaces for the QM part are to be used, recent developments have been able to accelerate the calculations and reduce their computational costs. [44][45][46][47][48] Another promising approach is the nuclear-electronic orbital (NEO) method, which employs multicomponent wave functions to simultaneously describe the quantum behavior of both nuclei and electrons. ...
Article
Full-text available
The hybrid quantum mechanics/molecular mechanics (QM/MM) approach, which combines the accuracy of QM methods with the efficiency of MM methods, is widely used in the study of complex systems. However, past QM/MM implementations often neglect or face challenges in addressing nuclear quantum effects, despite their crucial role in many key chemical and biological processes. Recently, our group developed the constrained nuclear-electronic orbital (CNEO) theory, a cost-efficient approach that accurately addresses nuclear quantum effects, especially quantum nuclear delocalization effects. In this work, we integrate CNEO with the QM/MM approach through the electrostatic embedding scheme and apply the resulting CNEO QM/MM to two hydrogen-bonded complexes. We find that both solvation effects and nuclear quantum effects significantly impact hydrogen bond structures and dynamics. Notably, in the glutamic acid–glutamate complex, which mimics a common low barrier hydrogen bond in biological systems, CNEO QM/MM accurately predicts nearly equal proton sharing between the two residues. With an accurate description of both quantum nuclear delocalization effects and environmental effects, CNEO QM/MM is a promising new approach for simulating complex chemical and biological systems.
... MM simulations have been reported to lack accuracy and precision in describing bond formation and breaking, which is essential in drug discovery [186]. Modelling large systems using QM methods has been an enduring scientific challenge, prompting the need to combine MM and QM methods in the QM/MM model [187][188][189][190]. Accordingly, MM and QM calculations are invaluable in studying long-range electrostatic interactions affecting biological macromolecules' electronic structures [190]. ...
Article
Computer-aided drug design and discovery methods have been essential in developing small molecules with therapeutic properties over the last decades. Application of computational resources includes drug target identification, hit discovery, and lead optimization. Accordingly, with tremendous research efforts and the availability of financial support from government agencies across the world and from multinational drug companies, the overall research level in this area will continue to advance. The methodology used in this review paper entailed a thorough examination of the relevant literature on drug design and development using computational resources. Extensive searches using Scopus, International Pharmaceutical Abstracts (OvidSP), WHO Global Health Library, Cochrane, Google Scholar, Web of Science, Science Direct, ProQuest Dissertations & Theses, Worldwide Political Science Abstracts (CSA), and PubMed were carried out. A standardized template was used to ensure that the selected papers met the inclusion criteria and were relevant to the review. Ultimately, there are robust technologies developed to enhance the drug discovery process. Therefore, this review provides insights into computational resources, in silico and ab initio methods and algorithms, not restricted to drug metabolism predictions for drug design, and the practical applications of artificial intelligence (AI) in drug discovery. Computational tools and methods for drug design and development such as molecular dynamics (MD), molecular docking, quantum mechanics (QM), hybrid quantum mechanics/molecular mechanics (QM/MM), and density functional theory (DFT) have been reviewed. Accordingly, the emerging practice of employing these techniques synergistically addresses the fundamental challenges of conventional medicines for complex diseases.
Herein, we discuss ligand-based and structure-based drug discovery, force field models in MD simulations, docking algorithms, and subtractive and additive QM/MM coupling. Nonetheless, as computer-aided drug design (CADD) approaches continue to evolve with significant improvements, the focus areas will be docking and virtual screening, scoring functions, optimization of hits, and assessment of absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties. With the current success, the present computational resources will aid in the future discovery of novel compounds with high therapeutic performance. The ongoing oncology research efforts will also significantly contribute to UN sustainable development goals – good health and well-being, sustainable innovation and industrialization.
... The holy grail of molecular simulations, the development of accurate yet computationally efficient energy functions, could be at hand thanks to the development of machine learning potentials (MLPs) 66−68 and the combination of them with a molecular mechanics description for larger parts of the systems under study (ML/MM approaches). 69,70 A correct description of the electronic density of the reactive subsystems and its electrostatic interactions 70 ...
... One path forward that appears promising is to use machine-learning potentials (MLPs) either as stand-alone alternative models [39][40][41][42][43][44] , or else to augment existing semiempirical QM methods. [45][46][47][48][49][50][51] We will refer to the former class as "pure MLPs" and the latter class as "QM/∆-MLPs". MLPs have emerged as powerful tools to enable fast and accurate chemical models within the scope of their training 39,[41][42][43][44] . ...
Article
Modern semiempirical electronic structure methods have considerable promise in drug discovery as universal "force fields" that can reliably model biological and drug-like molecules. Herein, we compare the performance of several NDDO-based semiempirical (MNDO/d, AM1, PM6 and ODM2), density-functional tight-binding based (DFTB3, GFN1-xTB and GFN2-xTB) models with pure machine learning potentials (ANI-1x and ANI-2x) and hybrid quantum mechanical/machine learning potentials (AIQM1 and QDπ) for a wide range of data computed at a consistent ωB97X/6-31G* level of theory (as in the ANI-1x database). This data includes conformational energies, intermolecular interactions, tautomers, and protonation states. Additional comparisons are made to a set of natural and synthetic nucleic acids from the artificially expanded genetic information system (AEGIS). This dataset has important implications in the design of new biotechnology and therapeutics. Finally, we examine acid/base chemistry relevant for RNA cleavage reactions catalyzed by small nucleolytic ribozymes and ribonucleases. Overall, the recently developed QDπ model performs exceptionally well across all datasets, having especially high accuracy for tautomers and protonation states relevant to drug discovery.
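The "QM/∆-MLP" idea quoted in the excerpt for this entry, in which a machine-learning term augments a fast semiempirical model, can be illustrated with a minimal sketch. The function names and the toy energy expressions below are hypothetical placeholders chosen for illustration; they are not taken from any of the cited models.

```python
from typing import Callable, Sequence

Coords = Sequence[float]
EnergyFn = Callable[[Coords], float]


def delta_corrected(base: EnergyFn, correction: EnergyFn) -> EnergyFn:
    """Compose a corrected model: E(x) = E_base(x) + dE_correction(x)."""
    def total(coords: Coords) -> float:
        return base(coords) + correction(coords)
    return total


def toy_base(coords: Coords) -> float:
    """Stand-in for a cheap base model (hypothetical harmonic energy)."""
    return 0.5 * sum(x * x for x in coords)


def toy_correction(coords: Coords) -> float:
    """Stand-in for a learned correction (hypothetical quartic term)."""
    return 0.1 * sum(x ** 4 for x in coords)


model = delta_corrected(toy_base, toy_correction)
print(model([1.0, 2.0]))
```

In a real workflow the correction would be trained on the difference between high-level reference data and the base model; here it is a fixed analytic function so that the composition itself is easy to follow.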
... The method was recently applied to estimate free energy barriers and kinetic isotope effects in RNA cleavage reactions. 11 An example of a more general approach aimed at predicting a range of response properties is FieldSchNet, proposed by Gastegger et al. 12 In FieldSchNet, the description of the environment (such as the electric field caused by the MM point charges on each QM atom) is incorporated as an additional input in the NN architecture together with a physically motivated transformation (such as a dipole-field interaction tensor) added as an additional layer. The same philosophy but with a different network architecture was employed by Pan et al., 13 with the MM environment being represented by the generated electrostatic potential and field on QM atoms. ...
Article
This work presents a variant of an electrostatic embedding scheme that allows the embedding of arbitrary machine learned potentials trained on molecular systems in vacuo. The scheme is based on physically motivated models of electronic density and polarizability, resulting in a generic model without relying on an exhaustive training set. The scheme only requires in vacuo single point QM calculations to provide training densities and molecular dipolar polarizabilities. As an example, the scheme is applied to create an embedding model for the QM7 data set using Gaussian Process Regression with only 445 reference atomic environments. The model was tested on the SARS-CoV-2 protease complex with PF-00835231, resulting in a predicted embedding energy RMSE of 2 kcal/mol, compared to explicit DFT/MM calculations.
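The excerpt for this entry mentions representing the MM environment by the electrostatic potential and field that MM point charges generate at the QM atoms. As a rough sketch of those two quantities only (plain Coulomb sums in atomic units, with no periodic boundary treatment or cutoff, and with a hypothetical function name), one could write:

```python
import math


def mm_potential_and_field(qm_xyz, mm_xyz, mm_charges):
    """Return [(potential, (Ex, Ey, Ez)), ...] for each QM atom.

    Assumes atomic units (Coulomb constant of 1) and an open, non-periodic system.
    """
    results = []
    for atom in qm_xyz:
        potential = 0.0
        field = [0.0, 0.0, 0.0]
        for pos, q in zip(mm_xyz, mm_charges):
            # Vector pointing from the MM charge to the QM atom
            d = [a - p for a, p in zip(atom, pos)]
            r = math.sqrt(sum(c * c for c in d))
            potential += q / r
            for k in range(3):
                field[k] += q * d[k] / r ** 3
        results.append((potential, tuple(field)))
    return results


# One QM atom 2 bohr from a single unit MM charge at the origin
descriptors = mm_potential_and_field([(2.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)], [1.0])
print(descriptors)
```

Quantities of this kind could then be passed to a network as extra per-atom inputs; how each published architecture actually consumes them is not specified here.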
... Such methods are emerging, but appear to remain somewhat limited so far, for instance, incorporating the MM region directly in an NN potential has been applied to simple reactions, 21 or the QM-MM interaction has been corrected in a short range. 22 Since reliable ΔMLs of the QM-MM interactions for realistic systems, akin to the subject of this work, seem far from straightforward and in need of further developments, this work will use ML for the QM region and describe the QM-MM interactions on the original DFTB/MM level. ...
Article
Glutaredoxins are small enzymes that catalyze the oxidation and reduction of protein disulfide bonds by the thiol-disulfide exchange mechanism. They have either one or two cysteines in their active site, resulting in different catalytic reaction cycles that have been investigated in many experimental studies. However, the exact mechanisms are not yet fully known, and to our knowledge, no theoretical studies have been performed to elucidate the underlying mechanism. In this study, we investigated a proposed mechanism for the reduction of the disulfide bond in the protein HMA4n by a mutated monothiol Homo sapiens glutaredoxin (HsGrx1) and the co-substrate glutathione (GSH). The catalytic cycle involves three successive thiol-disulfide exchanges that occur between the molecules. To estimate the regioselectivity of the different attacks, classical molecular dynamics simulations were performed and the trajectories analyzed regarding the sulfur–sulfur distances and the attack angles between the sulfurs. The free energy profile of each reaction was obtained with hybrid quantum mechanical/molecular mechanical metadynamics simulations. Since this required extensive phase space sampling, the semi-empirical density functional tight-binding (DFTB) method was used to describe the reactive cysteines. For an accurate description, we used specific reaction parameters fitted to B3LYP energies of the thiol-disulfide exchange and a machine learned energy correction that was trained on CCSD(T) energies of thiol-disulfide exchanges. Our calculations show the same regiospecificity as observed in the experiment, and the obtained barrier heights are about 12 and 20 kcal/mol for the different reaction steps, which confirms the proposed pathway.
... The method was recently applied to estimate free energy barriers and kinetic isotope effects in RNA cleavage reactions. 11 An example of a more general approach aimed at predicting a range of response properties is FieldSchNet, proposed by Gastegger et al. 12 In FieldSchNet, the description of the environment (such as the electric field caused by the MM point charges on each QM atom) is incorporated as an additional input in the NN architecture together with a physically motivated transformation (such as a dipole-field interaction tensor) added as an additional layer. The same philosophy, but with a different network architecture, was employed by Pan et al., 13 with the MM environment being represented by the generated electrostatic potential and field on QM atoms. ...
Preprint
This work presents a variant of an electrostatic embedding scheme that allows the embedding of arbitrary machine learned potentials trained on molecular systems in vacuo. The scheme is based on physically motivated models of electronic density and polarizability, resulting in a generic model without relying on an exhaustive training set. The scheme only requires in vacuo single point QM calculations to provide training densities and molecular dipolar polarizabilities. As an example, the scheme is applied to create an embedding model for the QM7 dataset using Gaussian Process Regression with only 445 reference atomic environments. The model was tested on SARS-CoV-2 protease complex with PF-00835231, resulting in predicted embedding energy RMSE of 2 kcal/mol, compared to explicit DFT/MM calculations.
Article
In double‐stranded DNA, rapid deprotonation of the guanine radical cation (G•+) hinders the long‐distance transfer of positive charge (hole). It is important to explore the proton transfer of G•+ for designing other DNA structures with high electrical conductivity. The deprotonation of G•+ is explored in the 1H2O, 2H2O, 3H2O, and 9H2O models by the quantum mechanics (QM) method. The results indicate that the second hydration shell facilitates proton transfer. The QM/molecular mechanics (MM) (ABEEM) method accurately simulates polarization and charge transfer effects through the implementation of the reactive valence‐state electronegativity piecewise functions and setting local charge conservation conditions. The QM/MM(ABEEM) method has been developed to investigate the 9H2O model. The activation energy obtained through molecular dynamics simulations (16.3 ± 0.8 kJ/mol) is consistent with experimental data (15.1 ± 1.5 kJ/mol), demonstrating the accuracy of the QM/MM(ABEEM) method in simulating proton transfer in the DNA system. The deprotonation rate of G•+ in the free base (1.5 × 10⁷ s⁻¹) is faster than that of G•+ within double‐stranded DNA (10⁶–10⁷ s⁻¹), which indicates that the free G base is an avoidable participant when designing hole transfer carrier due to its rapid deprotonation rate. Concurrently, the potential barrier increases monotonically with the proton transfer distance, meaning that long‐range proton transfer corresponds to a high energy barrier. A molecule involved in long‐range proton transfer of G•+ is more suitable for DNA electronic devices. This research provides valuable microscopic insight into deprotonation to support the development of DNA structures with high electrical conductivity.
Article
In this work, an atomistic-scale investigation of the phosphodiester P–O bond cleavage reaction by the enzyme ribonuclease A was carried out by computer simulation techniques. It is shown that during...
Article
Understanding enzyme mechanisms is essential for unraveling the complex molecular machinery of life. In this review, we survey the field of computational enzymology, highlighting key principles governing enzyme mechanisms and discussing ongoing challenges and promising advances. Over the years, computer simulations have become indispensable in the study of enzyme mechanisms, with the integration of experimental and computational exploration now established as a holistic approach to gain deep insights into enzymatic catalysis. Numerous studies have demonstrated the power of computer simulations in characterizing reaction pathways, transition states, substrate selectivity, product distribution, and dynamic conformational changes for various enzymes. Nevertheless, significant challenges remain in investigating the mechanisms of complex multistep reactions, large-scale conformational changes, and allosteric regulation. Beyond mechanistic studies, computational enzyme modeling has emerged as an essential tool for computer-aided enzyme design and the rational discovery of covalent drugs for targeted therapies. Overall, enzyme design/engineering and covalent drug development can greatly benefit from our understanding of the detailed mechanisms of enzymes, such as protein dynamics, entropy contributions, and allostery, as revealed by computational studies. Such a convergence of different research approaches is expected to continue, creating synergies in enzyme research. This review, by outlining the ever-expanding field of enzyme research, aims to provide guidance for future research directions and facilitate new developments in this important and evolving field.
Chapter
We highlight the role played by molecular modeling and simulation to unravel, at an atomistic and even electronic scale, the complex structural dynamic equilibrium assumed by RNA sequences, either cellular or viral. After pointing out the role played by specific RNA structures in regulating key biological functions, or in assuring either viral replication or immune system activation, we will show how computationally efficient multiscale approaches lead to the understanding of the fundamental biological processes, in terms of nucleic acid structural dynamic and their interaction with protein partners, and hence may be successfully used to rationally develop novel therapeutic approaches. By a selection of examples involving cellular and viral RNA, we will show that molecular modeling and simulation is nowadays assuming its role of a virtual microscope complementing and deepening the insight gained by structural and cellular biology.
Article
The inherent discontinuity and unique dimensional attributes of nanomaterial surfaces and interfaces bestow them with various exceptional properties. These properties, however, also introduce difficulties for both experimental and computational studies. The advent of machine learning interatomic potential (MLIP) addresses some of the limitations associated with empirical force fields, presenting a valuable avenue for accurate simulations of these surfaces/interfaces of nanomaterials. Central to this approach is the idea of capturing the relationship between system configuration and potential energy, leveraging the proficiency of machine learning (ML) to precisely approximate high‐dimensional functions. This review offers an in‐depth examination of MLIP principles and their execution and elaborates on their applications in the realm of nanomaterial surface and interface systems. The prevailing challenges faced by this potent methodology are also discussed.
Article
In silico investigations of enzymatic reactions and chemical reactions in condensed phases often suffer from formidable computational costs due to a large number of degrees of freedom and enormous important volume in phase space. Usually, accuracy must be compromised to trade for efficiency by lowering the reliability of the Hamiltonians employed or reducing the sampling time. Reference-potential methods (RPMs) offer an alternative approach to reaching high accuracy of simulation without much loss of efficiency. In this Perspective, we summarize the idea of RPMs and showcase some recent applications. Most importantly, the pitfalls of these methods are also discussed, and remedies to these pitfalls are presented.
Article
We present a comparative study that evaluates the performance of a machine learning potential (ANI-2x), a conventional force field (GAFF), and an optimally tuned GAFF-like force field in the modeling of a set of 10 γ-fluorohydrins that exhibit a complex interplay between intra- and intermolecular interactions in determining conformer stability. To benchmark the performance of each molecular model, we evaluated their energetic, geometric, and sampling accuracies relative to quantum-mechanical data. This benchmark involved conformational analysis both in the gas phase and chloroform solution. We also assessed the performance of the aforementioned molecular models in estimating nuclear spin-spin coupling constants by comparing their predictions to experimental data available in chloroform. The results and discussion presented in this study demonstrate that ANI-2x tends to predict stronger-than-expected hydrogen bonding and overstabilize global minima and shows problems related to inadequate description of dispersion interactions. Furthermore, while ANI-2x is a viable model for modeling in the gas phase, conventional force fields still play an important role, especially for condensed-phase simulations. Overall, this study highlights the strengths and weaknesses of each model, providing guidelines for the use and future development of force fields and machine learning potentials.
Article
Multiscale modeling has a long history of use in structural biology, as computational biologists strive to overcome the time- and length-scale limits of atomistic molecular dynamics. Contemporary machine learning techniques, such as deep learning, have promoted advances in virtually every field of science and engineering and are revitalizing the traditional notions of multiscale modeling. Deep learning has found success in various approaches for distilling information from fine-scale models, such as building surrogate models and guiding the development of coarse-grained potentials. However, perhaps its most powerful use in multiscale modeling is in defining latent spaces that enable efficient exploration of conformational space. This confluence of machine learning and multiscale simulation with modern high-performance computing promises a new era of discovery and innovation in structural biology.
Article
Combustion is a complex chemical system which involves thousands of chemical reactions and generates hundreds of molecular species and radicals during the process. In this work, a neural network-based molecular dynamics (MD) simulation is carried out to simulate the benchmark combustion of methane. During MD simulation, detailed reaction processes leading to the creation of specific molecular species including various intermediate radicals and the products are intimately revealed and characterized. Overall, a total of 798 different chemical reactions were recorded and some new chemical reaction pathways were discovered. We believe that the present work heralds the dawn of a new era in which neural network-based reactive MD simulation can be practically applied to simulating important complex reaction systems at ab initio level, which provides atomic-level understanding of chemical reaction processes as well as discovery of new reaction pathways at an unprecedented level of detail beyond what laboratory experiments could accomplish.
Article
The Varkud satellite ribozyme catalyses site-specific RNA cleavage and ligation, and serves as an important model system to understand RNA catalysis. Here, we combine stereospecific phosphorothioate substitution, precision nucleobase mutation and linear free-energy relationship measurements with molecular dynamics, molecular solvation theory and ab initio quantum mechanical/molecular mechanical free-energy simulations to gain insight into the catalysis. Through this confluence of theory and experiment, we unify the existing body of structural and functional data to unveil the catalytic mechanism in unprecedented detail, including the degree of proton transfer in the transition state. Further, we provide evidence for a critical Mg2+ in the active site that interacts with the scissile phosphate and anchors the general base guanine in position for nucleophile activation. This novel role for Mg2+ adds to the diversity of known catalytic RNA strategies and unifies functional features observed in the Varkud satellite, hairpin and hammerhead ribozyme classes.
Article
Hydrogen abstraction from ethanol by atomic hydrogen in aqueous solution is studied using two theoretical approaches: multi-path variational transition state theory (MP-VTST) and a path integral formalism in combination with free-energy perturbation and umbrella sampling (PI-FEP/UM). The performance of the models is compared to experimental values of H kinetic isotope effects (KIE). Solvation models used in this study ranged from purely implicit, via mixed (microsolvation treated quantum mechanically via density functional theory (DFT)), to fully explicit representation of the solvent, which was incorporated using a combined quantum mechanical-molecular mechanical (QM/MM) potential. The effects of the transition state conformation and the position of microsolvating water molecules interacting with the solute on the KIE are discussed. The KIEs are in good agreement with experiment when MP-VTST is used together with a model that includes microsolvation of the polar part of ethanol by five or six water molecules, emphasizing the importance of explicit solvation in KIE calculations. Both MP-VTST and PI-FEP/UM enable detailed characterization of nuclear quantum effects accompanying the hydrogen atom transfer reaction in aqueous solution.
Article
Recent developments in many-body potential energy representation via deep learning have brought new hopes to addressing the accuracy-versus-efficiency dilemma in molecular simulations. Here we describe DeePMD-kit, a package written in Python/C++ that has been designed to minimize the effort required to build deep learning based representation of potential energy and force field and to perform molecular dynamics. Potential applications of DeePMD-kit span from finite molecules to extended systems and from metallic systems to chemically bonded systems. DeePMD-kit is interfaced with TensorFlow, one of the most popular deep learning frameworks, making the training process highly automatic and efficient. On the other end, DeePMD-kit is interfaced with high-performance classical molecular dynamics and quantum (path-integral) molecular dynamics packages, i.e., LAMMPS and i-PI, respectively. Thus, upon training, the potential energy and force field models can be used to perform efficient molecular simulations for different purposes. As an example of the many potential applications of the package, we use DeePMD-kit to learn the interatomic potential energy and forces of a water model using data obtained from density functional theory. We demonstrate that the resulting molecular dynamics model accurately reproduces the structural information contained in the original model.
Article
TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.
Article
RNA cleavage transesterification is a fundamental reaction in biology that is catalyzed by both protein and RNA enzymes. In this work, a series of RNA transesterification model reactions with a wide range of leaving groups are investigated with density-functional calculations in an aqueous solvation environment in order to study linear free energy relationships (LFERs) and their connection to transition state structure and bonding. Overall, results obtained from the polarizable continuum solvation model with UAKS radii produce the best linear correlations and closest overall agreement with experimental results. Reactions with a poor leaving group are predicted to proceed via a stepwise mechanism with a late transition state that is rate controlling. As the leaving group becomes more acidic and labile, the barriers of both early and late transition states decrease. LFERs for each transition state are computed, with the late transition state barrier showing greater sensitivity to leaving group pKa. For sufficiently enhanced leaving groups, the reaction mechanism transitions to a concerted mechanism characterized by a single early transition state. Further linear relationships were derived for bond lengths and bond orders as a function of leaving group pKa and rate constant values that can be used for prediction. This work provides important benchmark linear free energy data that allows a molecular-level characterization of the structure and bonding of the transition states for this important class of phosphoryl transfer reactions. The relations reported herein can be used to aid in the interpretation of data obtained from experimental studies of non-catalytic and catalytic mechanisms.
Article
Enzymes function by stabilizing reaction transition states; therefore, comparison of the transition states of enzymatic and nonenzymatic model reactions can provide insight into biological catalysis. Catalysis of RNA 2'-O-transphosphorylation by ribonuclease A is proposed to involve electrostatic stabilization and acid/base catalysis, although the structure of the rate-limiting transition state is uncertain. Here, we describe coordinated kinetic isotope effect (KIE) analyses, molecular dynamics simulations, and quantum mechanical calculations to model the transition state and mechanism of RNase A. Comparison of the 18O KIEs on the 2'O nucleophile, 5'O leaving group, and nonbridging phosphoryl oxygens for RNase A to values observed for hydronium- or hydroxide-catalyzed reactions indicates a late anionic transition state. Molecular dynamics simulations using an anionic phosphorane transition state mimic suggest that H-bonding by protonated His12 and Lys41 stabilizes the transition state by neutralizing the negative charge on the nonbridging phosphoryl oxygens. Quantum mechanical calculations consistent with the experimental KIEs indicate that expulsion of the 5'O remains an integral feature of the rate-limiting step both on and off the enzyme. Electrostatic interactions with positively charged amino acid side chains (His12/Lys41), together with proton transfer from His119, render departure of the 5'O less advanced compared with the solution reaction and stabilize charge buildup in the transition state. The ability to obtain a chemically detailed description of 2'-O-transphosphorylation transition states provides an opportunity to advance our understanding of biological catalysis significantly by determining how the catalytic modes and active site environments of phosphoryl transferases influence transition state structure.
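For readers connecting KIEs to free energy barriers, a minimal sketch follows. It assumes a simple transition-state-theory picture in which the KIE is the ratio of light to heavy isotopologue rate constants and the prefactors cancel, so that KIE = exp[(ΔG‡_heavy − ΔG‡_light)/RT]. The function names and the barrier values are hypothetical and are not taken from the study above.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def kie_from_barriers(dg_light: float, dg_heavy: float, temperature: float = 298.15) -> float:
    """KIE = k_light / k_heavy from activation free energies in kcal/mol (TST assumption)."""
    return math.exp((dg_heavy - dg_light) / (R_KCAL * temperature))


def barrier_shift_from_kie(kie: float, temperature: float = 298.15) -> float:
    """Inverse relation: the barrier difference (heavy minus light, kcal/mol) implied by a KIE."""
    return R_KCAL * temperature * math.log(kie)


# Hypothetical barriers: heavy isotopologue 0.02 kcal/mol higher than light
example_kie = kie_from_barriers(20.00, 20.02)
print(example_kie, barrier_shift_from_kie(example_kie))
```

The sketch shows why heavy-atom KIEs of a few percent correspond to barrier differences of only hundredths of a kcal/mol, which is small compared with typical sampling errors in free energy profiles.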
Article
Semiempirical methods like density functional tight-binding (DFTB) allow extensive phase space sampling, making it possible to generate free energy surfaces of complex reactions in condensed-phase environments. Such a high efficiency often comes at the cost of reduced accuracy, which may be improved by developing a specific reaction parametrization (SRP) for the particular molecular system. Thiol-disulfide exchange is a nucleophilic substitution reaction that occurs in a large class of proteins. Its proper description requires a high-level ab initio method, while DFT-GGA and hybrid functionals were shown to be inadequate, and so is DFTB due to its DFT-GGA descent. We develop an SRP for thiol-disulfide exchange based on an artificial neural network (ANN) implementation in the DFTB+ software and compare its performance to that of a standard SRP approach applied to DFTB. As an application, we use both new DFTB-SRPs as components of a QM/MM scheme to investigate thiol-disulfide exchange in two molecular complexes: a solvated model system and a blood protein. Demonstrating the strengths of the methodology, highly accurate free energy surfaces are generated at a low cost, as the augmentation of DFTB with an ANN only adds a small computational overhead.
Article
We redevelop the variational free energy profile (vFEP) method using a cardinal B-spline basis to extend the method for analyzing free energy surfaces (FESs) involving three or more reaction coordinates. We also implemented software for evaluating high-dimensional profiles based on the multistate Bennett acceptance ratio (MBAR) method which constructs an unbiased probability density from global reweighting of the observed samples. The MBAR method takes advantage of a fast algorithm for solving the unbinned weighted histogram (UWHAM)/MBAR equations which replaces the solution of simultaneous equations with a nonlinear optimization of a convex function. We make use of cardinal B-splines and multiquadric radial basis functions to obtain smooth, differentiable MBAR profiles in arbitrary high dimensions. The cardinal B-spline vFEP and MBAR methods are compared using three example systems that examine 1D, 2D, and 3D profiles. Both methods are found to be useful and produce nearly indistinguishable results. The vFEP method is found to be 150 times faster than MBAR when applied to periodic 2D profiles, but the MBAR method is 4.5 times faster than vFEP when evaluating unbounded 3D profiles. In agreement with previous comparisons, we find the vFEP method produces superior FESs when the overlap between umbrella window simulations decreases. Finally, the associative reaction mechanism of hammerhead ribozyme is characterized using 3D, 4D, and 6D profiles, and the higher-dimensional profiles are found to have smaller reaction barriers by as much as 1.5 kcal/mol. The methods presented here have been implemented into the FE-ToolKit software package along with new methods for network-wide free energy analysis in drug discovery.
Article
The study of chemical reactions in aqueous media is very important for its implications in several fields of science, from biology to industrial processes. However, modeling these reactions is difficult when water directly participates in the reaction, since it requires a fully quantum mechanical description of the system. Ab-initio molecular dynamics is the ideal candidate to shed light on these processes. However, its scope is limited by a high computational cost. A popular alternative is to perform molecular dynamics simulations powered by machine learning potentials, trained on an extensive set of quantum mechanical calculations. Doing so reliably for reactive processes is difficult because it requires including very many intermediate and transition state configurations. In this study we used an active learning procedure accelerated by enhanced sampling to harvest such structures and to build a neural-network potential to study the urea decomposition process in water. This allowed us to obtain the free energy profiles of this important reaction in a wide range of temperatures, to discover several novel metastable states, and improve the accuracy of the kinetic rates calculations. Furthermore, we found that the formation of the zwitterionic intermediate has the same probability of occurring via an acidic or a basic pathway, which could be the cause of the insensitivity of reaction rates to the solution pH.
Article
On the surface: The uptake and hydrolysis of N2O5 from the atmosphere by aqueous aerosols was long thought to occur by solvation and subsequent hydrolysis in the bulk of the aerosol. However, this mechanistic hypothesis was unverifiable because of the fast reaction kinetics. Galib et al. used molecular simulations to show instead that the mechanism is the inverse: Interfacial hydrolysis is followed by solvation into the interior. Their reactive uptake model is consistent with some existing experimental observations. Science, this issue p. 921
Article
In a previous work [Pan et al., Molecules 23, 2500 (2018)], a charge projection scheme was reported, where outer molecular mechanical (MM) charges [>10 Å from the quantum mechanical (QM) region] were projected onto the electrostatic potential (ESP) grid of the QM region to accurately and efficiently capture long-range electrostatics in ab initio QM/MM calculations. Here, a further simplification to the model is proposed, where the outer MM charges are projected onto inner MM atom positions (instead of ESP grid positions). This enables a representation of the long-range MM electrostatic potential via augmentary charges (AC) on inner MM atoms. Combined with the long-range electrostatic correction function from Cisneros et al. [J. Chem. Phys. 143, 044103 (2015)] to smoothly switch between inner and outer MM regions, this new QM/MM-AC electrostatic model yields accurate and continuous ab initio QM/MM electrostatic energies with a 10 Å cutoff between inner and outer MM regions. This model enables efficient QM/MM cluster calculations with a large number of MM atoms as well as QM/MM calculations with periodic boundary conditions.
Article
We present the GPU version of DeePMD-kit, which, upon training a deep neural network model using ab initio data, can drive extremely large-scale molecular dynamics (MD) simulation with ab initio accuracy. Our tests show that for a water system of 12,582,912 atoms, the GPU version can be 7 times faster than the CPU version under the same power consumption. The code can scale up to the entire Summit supercomputer. For a copper system of 113,246,208 atoms, the code can perform one nanosecond MD simulation per day, reaching a peak performance of 86 PFLOPS (43% of the peak). Such unprecedented ability to perform MD simulation with ab initio accuracy opens up the possibility of studying many important issues in materials and molecules, such as heterogeneous catalysis, electrochemical cells, irradiation damage, crack propagation, and biochemical reactions. Program summary Program Title: DeePMD-kit CPC Library link to program files: https://doi.org/10.17632/phyn4kgsfx.1 Developer’s repository link: https://doi.org/10.5281/zenodo.3961106 Licensing provisions: LGPL Programming language: C++/Python/CUDA Journal reference of previous version: Comput. Phys. Commun. 228 (2018), 178–184. Does the new version supersede the previous version?: Yes. Reasons for the new version: Parallelize and optimize the DeePMD-kit for modern high performance computers. Summary of revisions: The optimized DeePMD-kit is capable of computing 100 million atoms molecular dynamics with ab initio accuracy, achieving 86 PFLOPS in double precision. Nature of problem: Modeling the many-body atomic interactions by deep neural network models. Running molecular dynamics simulations with the models. Solution method: The Deep Potential for Molecular Dynamics (DeePMD) method is implemented based on the deep learning framework TensorFlow. Standard and customized TensorFlow operators are optimized for GPU. 
Massively parallel molecular dynamics simulations with DeePMD models on high performance computers are supported in the new version.
Article
Reactive molecular dynamics (MD) simulation is a powerful tool to study the reaction mechanism of complex chemical systems. Central to the method is the potential energy surface (PES) that can describe the breaking and formation of chemical bonds. The development of both accurate and efficient PES has attracted significant effort in the past 2 decades. A recently developed deep potential (DP) model has the promise to bring ab initio accuracy to large-scale reactive MD simulations. However, for complex chemical reaction processes like pyrolysis, it remains challenging to generate reliable DP models with an optimal training data set. In this work, a data set construction scheme for such a purpose was established. The employment of a concurrent learning algorithm allows us to maximize the exploration of the chemical space while minimizing the redundancy of the data set. This greatly reduces the cost of computational resources required for ab initio calculations. Based on this method, we constructed a data set for the pyrolysis of n-dodecane, which contains 35 496 structures. The reactive MD simulation with the DP model trained based on this data set revealed the pyrolysis mechanism of n-dodecane in detail, and the simulation results are in good agreement with the experimental measurements. In addition, this data set shows excellent transferability to different long-chain alkanes. These results demonstrate the advantages of the proposed method for constructing training data sets for similar systems.
Article
Calculations of free energy profile, aka potential of mean force (PMF), along a chosen collective variable (CV) are now routinely applied in the studies of chemical processes, such as enzymatic reactions and chemical reactions in condensed phases. However, if the ab initio QM/MM level of accuracy is required for the PMF, it can be formidably demanding even with the most advanced enhanced sampling methods, such as umbrella sampling. To ameliorate this difficulty, we developed a novel method for the computation of free energy profile based on the reference-potential method recently, in which a low-level reference Hamiltonian is employed for phase space sampling and the free energy profile can be corrected to the level of interest (the target Hamiltonian) by energy reweighting in a nonparametric way. However, when the reference Hamiltonian is very different from the target Hamiltonian, the calculated ensemble averages, including the PMF, often suffer from numerical instability, which mainly comes from the overestimation of the density-of-states (DoS) in the low-energy region. Stochastic samplings of these low-energy configurations are rare events, and some low-energy conformations may get oversampled in simulations of a finite length. In this work, an assumption of Gaussian distribution is applied to the DoS in each CV bin, and the weight of each configuration is rescaled according to the accumulated DoS. The results show that this smoothing process can remarkably reduce the ruggedness of the PMF and increase the reliability of the reference-potential method.
Article
In recent years, promising deep learning based interatomic potential energy surface (PES) models have been proposed that can potentially allow us to perform molecular dynamics simulations for large scale systems with quantum accuracy. However, making these models truly reliable and practically useful is still a very non-trivial task. A key component in this task is the generation of datasets used in model training. In this paper, we introduce the Deep Potential GENerator (DP-GEN), an open-source software platform that implements the recently proposed ”on-the-fly” learning procedure (Zhang et al. 2019) and is capable of generating uniformly accurate deep learning based PES models in a way that minimizes human intervention and the computational cost for data generation and model training. DP-GEN automatically and iteratively performs three steps: exploration, labeling, and training. It supports various popular packages for these three steps: LAMMPS for exploration, Quantum Espresso, VASP, CP2K, etc. for labeling, and DeePMD-kit for training. It also allows automatic job submission and result collection on different types of machines, such as high performance clusters and cloud machines, and is adaptive to different job management tools, including Slurm, PBS, and LSF. As a concrete example, we illustrate the details of the process for generating a general-purpose PES model for Cu using DP-GEN. Program summary Program Title: DP-GEN Program Files doi: http://dx.doi.org/10.17632/sxybkgc5xc.1 Licensing provisions: LGPL Programming language: Python Nature of problem: Generating reliable deep learning based potential energy models with minimal human intervention and computational cost. Solution method: The concurrent learning scheme is implemented. Supports for sampling configuration space with LAMMPS, generating ab initio data with Quantum Espresso, VASP, CP2K and training potential models with DeePMD-kit are provided. 
Supports for different machines including workstations, high performance clusters and cloud machines are provided. Supports for job management tools including Slurm, PBS, LSF are provided.
Article
The catalytic properties of RNA have been a subject of fascination and intense research since their discovery over 30 years ago. Very recently, several classes of nucleolytic ribozymes have emerged and been characterized structurally. Among these, the twister ribozyme has been center-stage, and a topic of debate about its architecture and mechanism owing to conflicting interpretations of different crystal structures, and in some cases conflicting interpretations of the same functional data. In the present work, we attempt to clean up the mechanistic "debris" generated by twister ribozymes using a comprehensive computational RNA enzymology approach aimed to provide a unified interpretation of existing structural and functional data. Simulations in the crystalline environment and in solution provide insight into the origins of observed differences in crystal structures, and coalesce on a common active site architecture, and dynamical ensemble in solution. We use GPU-accelerated free energy methods with enhanced sampling to ascertain microscopic nucleobase pKa values of the implicated general acid and base, from which predicted activity-pH profiles can be compared directly with experiments. Next, ab initio quantum mechanical/molecular mechanical (QM/MM) simulations with full dynamic solvation under periodic boundary conditions are used to determine mechanistic pathways through multi-dimensional free energy landscapes for the reaction. We then characterize the rate-controlling transition state, and make predictions about kinetic isotope effects and linear free energy relations. Computational mutagenesis is performed to explain the origin of rate effects caused by chemical modifications and make experimentally testable predictions. Finally, we provide evidence that helps to resolve conflicting issues related to the role of metal ions in catalysis. 
Throughout each stage, we highlight how a conserved L-platform structural motif, together with a key L-anchor residue, forms the characteristic active site scaffold enabling each of the catalytic strategies to come together not only for the twister ribozyme, but the majority of the known small nucleolytic ribozyme classes.
Article
A predictive understanding of the mechanisms of RNA cleavage is important for the design of emerging technology built from biological and synthetic molecules that have promise for new biochemical and medicinal applications. Over the past 15 years, RNA cleavage reactions involving 2'-O-transphosphorylation have been discussed using a simplified framework introduced by Breaker that consists of four fundamental catalytic strategies (designated α, β, γ, and δ) that contribute to rate enhancement. As more detailed mechanistic data emerge, there is need for the framework to evolve and keep pace. We develop an ontology for discussion of strategies of enzymes that catalyze RNA cleavage via 2'-O-transphosphorylation that stratifies Breaker’s framework into primary (1°), secondary (2°) and tertiary (3°) contributions to enable more precise interpretation of mechanism in the context of structure and bonding. Further, we point out instances where atomic-level changes give rise to changes in more than one catalytic contribution, a phenomenon we refer to as ‘functional blurring’. We hope that this ontology will help clarify our conversations and pave the path forward toward a consensus view of these fundamental and fascinating mechanisms. The insight gained will deepen our understanding of RNA cleavage reactions catalyzed by natural protein and RNA enzymes, as well as aid in the design of new engineered DNA and synthetic enzymes.
Article
The free energy (FE) profile is an essential quantity for the estimation of reaction rates and the validation of reaction mechanisms. For chemical reactions in the condensed phase or enzymatic reactions, the computation of the FE profile at the ab initio (ai) quantum mechanical/molecular mechanics (QM/MM) level is still far too expensive. Although semiempirical (SE) methods can be hundreds or thousands of times faster than the ai methods, the accuracy of SE methods is often unsatisfactory, due to the approximations that have been adopted in these methods. In this work, we propose a new method termed MBAR+wTP, in which the ai QM/MM free energy profile is computed by a weighted thermodynamic perturbation (TP) correction to the SE profile generated by the multistate Bennett acceptance ratio (MBAR) analysis of the trajectories from umbrella sampling (US). The weight factors used in the TP calculations are a byproduct of the MBAR analysis in the post-processing of the US trajectories, which are often discarded after the free energy calculations. The raw ai QM/MM free energy profile is then smoothed using Gaussian process regression, in which the noise of each datum is set to be inversely proportional to the exponential of the reweighting entropy. The results show that this approach can enhance the efficiency of ai FE profile calculations by several orders of magnitude with only a slight loss of accuracy. This method can significantly enhance the applicability of ai QM/MM methods in the studies of chemical reactions in the condensed phase and enzymatic reactions.
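The weighted thermodynamic perturbation step described above can be sketched as a weighted exponential average of the high-level minus low-level energy differences for the frames falling in one bin of the reaction coordinate. The code below is a minimal sketch under stated assumptions: the function name `weighted_tp`, the argument names, and the use of kcal/mol units are hypothetical choices for illustration, the weights are assumed to be supplied by a separate MBAR analysis, and the Gaussian process smoothing step is not shown.

```python
import numpy as np

KB = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def weighted_tp(delta_u, weights, temperature=298.15):
    """Weighted thermodynamic perturbation correction for one bin.

    delta_u : U_high - U_low for each sampled frame (kcal/mol)
    weights : unbiasing weight of each frame (assumed to come from MBAR)

    Evaluates dA = -kT ln( sum_i w_i exp(-dU_i/kT) / sum_i w_i ).
    """
    kt = KB * temperature
    du = np.asarray(delta_u, dtype=float)
    w = np.asarray(weights, dtype=float)
    x = -du / kt
    shift = x.max()  # subtract the maximum exponent for numerical stability
    avg = np.sum(w * np.exp(x - shift)) / np.sum(w)
    return -kt * (shift + np.log(avg))

# Sanity check on made-up numbers: a constant energy offset is returned unchanged.
corr = weighted_tp([1.5, 1.5, 1.5], [0.2, 0.3, 0.5])
```

With uniform weights the expression reduces to ordinary (unweighted) thermodynamic perturbation, which is one way to check an implementation against an existing TP routine.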
Article
Transition state theory teaches that chemically stable mimics of enzymatic transition states will bind tightly to their cognate enzymes. Kinetic isotope effects combined with computational quantum chemistry provides enzymatic transition state information with sufficient fidelity to design transition state analogues. Examples are selected from various stages of drug development to demonstrate the application of transition state theory, inhibitor design, physicochemical characterization of transition state analogues, and their progress in drug development.
Article
Modern deep neural network training is typically based on mini-batch stochastic gradient optimization. While the use of large mini-batches increases the available computational parallelism, small batch training has been shown to provide improved generalization performance and allows a significantly smaller memory footprint, which might also be exploited to improve machine throughput. In this paper, we review common assumptions on learning rate scaling and training duration, as a basis for an experimental comparison of test performance for different mini-batch sizes. We adopt a learning rate that corresponds to a constant average weight update per gradient calculation (i.e., per unit cost of computation), and point out that this results in a variance of the weight updates that increases linearly with the mini-batch size m. The collected experimental results for the CIFAR-10, CIFAR-100 and ImageNet datasets show that increasing the mini-batch size progressively reduces the range of learning rates that provide stable convergence and acceptable test performance. On the other hand, small mini-batch sizes provide more up-to-date gradient calculations, which yields more stable and reliable training. The best performance has been consistently obtained for mini-batch sizes between m = 2 and m = 32, which contrasts with recent work advocating the use of mini-batch sizes in the thousands.
Article
There has been a resurgence of interest in free energy methods motivated by the performance enhancements offered by molecular dynamics (MD) software written for specialized hardware, such as graphics processing units (GPUs). In this work, we exploit the properties of a parameter-interpolated thermodynamic integration (PI-TI) method to connect states by their molecular mechanical (MM) parameter values. This pathway is shown to be better behaved for Mg2+ → Ca2+ transformations than traditional linear alchemical pathways (with and without soft-core potentials). The PI-TI method has the practical advantage that no modification of the MD code is required to propagate the dynamics. In the case of AMBER, this enables all the performance benefits of GPU-acceleration to be realized, in addition to unlocking the full spectrum of features available within the MD software, such as Hamiltonian replica exchange (HREM). The TI evaluation can be accomplished efficiently in a post-processing step by reanalyzing the statistically independent trajectory frames in parallel for high throughput. We apply the PI-TI method with HREM on GPUs in AMBER to predict pKa values in double stranded RNA molecules, and make comparison with experiments. Convergence to under 0.25 units for these systems required 100 ns or more of sampling per window, and coupling of windows with HREM. We find that MM charges derived from ab initio QM/MM fragment calculations improve the agreement between calculation and experiment.
Article
This article reviews the fundamentals of variational transition state theory (VTST), its recent theoretical development, and some modern applications. The theoretical methods reviewed here include multidimensional quantum mechanical tunneling, multistructural VTST (MS-VTST), multi-path VTST (MP-VTST), both reaction-path VTST (RP-VTST) and variable reaction coordinate VTST (VRC-VTST), system-specific quantum Rice–Ramsperger–Kassel theory (SS-QRRK) for predicting pressure-dependent rate constants, and VTST in the solid phase, liquid phase, and enzymes. We also provide some perspectives regarding the general applicability of VTST.
Article
Emergent RNA technologies employ sequence and structural information to perform a diversity of biological functions. Synthetic RNA molecules have been developed for a wide array of applications, including genetic regulation, environmental sensing, and diagnostics devices. Recent advances in chemical synthesis and computational design of RNA have enhanced our ability to program novel functions and expand upon current biomedical applications for therapeutics and diagnostics. In this review, we highlight recent advances in synthetic RNA devices that have been engineered for biomedical systems, while addressing the current limitations and challenges of translating these engineered functional RNAs to clinical applications.
Article
The partitioning of solute molecules between immiscible solvents with significantly different polarities is of great importance. The polarization between the solute and solvent molecules plays an essential role in determining the solubility of the solute, which makes computational studies utilizing molecular mechanics (MM) rather difficult. In contrast, quantum mechanics (QM) can provide more reliable predictions. In this work, the partition coefficients of the side chain analogs of some amino acids between water and chloroform were computed. The QM solvation free energies were calculated indirectly via a series of MM states using Multistate Bennett Acceptance Ratio (MBAR) and the MM-to-QM corrections were applied at the two end points using Thermodynamic Perturbation (TP). Previously, it has been shown (Journal of Chemical Theory and Computation, 2016, 12, 499) that this method provides the minimal variance in the results without running QM simulations. However, if there is insufficient overlap in phase space between the MM and QM Hamiltonians, this method fails. In this work, we propose, for the first time, a quantity termed the reweighting entropy that serves as a metric for the reliability of the TP calculations. If the reweighting entropy is below a certain threshold (0.65 for the solvation free energy calculations in this work), this MM-to-QM correction should be avoided and two alternative methods can be employed by either introducing a semi-empirical state or conducting nonequilibrium simulations. However, the results show that the QM methods are not guaranteed to yield better results than the MM methods. Further improvement of the QM methods is imperative, especially the treatment of the van der Waals and the electrostatic interactions between the QM region and the MM region in the first shell. 
We also propose a scheme for the calculation of the van der Waals parameters for the solute molecules in nonaqueous solvent, which improves the quality of the computed thermodynamic properties. Furthermore, the force field parameters for the sulfur-containing molecules are also optimized.
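The abstract above names the reweighting entropy as a reliability metric but does not state its formula. The sketch below assumes one plausible form, a Shannon entropy of the normalized perturbation weights divided by ln N so that the value lies between 0 and 1; that definition, the function name `reweighting_entropy`, and the default kT in kcal/mol are assumptions made here for illustration rather than details confirmed by the text.

```python
import numpy as np

def reweighting_entropy(delta_u, kt=0.5925):
    """Normalized Shannon entropy of thermodynamic perturbation weights.

    ASSUMED form (not given in the abstract):
        P_i = exp(-dU_i/kT) / sum_j exp(-dU_j/kT)
        S   = -(1/ln N) * sum_i P_i ln P_i
    delta_u : U_target - U_reference per sample (same units as kt), N >= 2.
    S is near 1 when all samples contribute evenly and near 0 when a
    single sample dominates the exponential average.
    """
    du = np.asarray(delta_u, dtype=float)
    x = -du / kt
    x = x - x.max()          # stabilize the exponentials
    p = np.exp(x)
    p = p / p.sum()
    nz = p > 0               # skip zero weights, since 0 * ln 0 -> 0
    return float(-np.sum(p[nz] * np.log(p[nz])) / np.log(len(p)))

# Made-up examples: uniform weights versus one dominant sample.
s_uniform = reweighting_entropy([0.0, 0.0, 0.0, 0.0])
s_dominated = reweighting_entropy([0.0, 100.0, 100.0, 100.0])
```

Under this assumed definition, a value below the quoted threshold of 0.65 would flag a perturbation whose estimate rests on too few effective samples.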
Chapter
Path-integral free energy perturbation (PI-FEP) theory is presented to directly determine the ratio of quantum mechanical partition functions of different isotopologs in a single simulation. Furthermore, a double averaging strategy is used to carry out the practical simulation, separating the quantum mechanical path integral exactly into two separate calculations, one corresponding to a classical molecular dynamics simulation of the centroid coordinates, and another involving free-particle path-integral sampling over the classical, centroid positions. An integrated centroid path-integral free energy perturbation and umbrella sampling (PI-FEP/UM, or simply, PI-FEP) method along with bisection sampling was summarized, which provides an accurate and fast convergent method for computing kinetic isotope effects for chemical reactions in solution and in enzymes. The PI-FEP method is illustrated by a number of applications, to highlight the computational precision and accuracy, the rule of geometrical mean in kinetic isotope effects, enhanced nuclear quantum effects in enzyme catalysis, and protein dynamics on temperature dependence of kinetic isotope effects.
Chapter
Advances in computational and experimental methods in enzymology have aided comprehension of enzyme-catalyzed chemical reactions. The main difficulty in comparing computational findings to rate measurements is that the first examines a single energy barrier, while the second frequently reflects a combination of many microscopic barriers. We present here intrinsic kinetic isotope effects and their temperature dependence as a useful experimental probe of a single chemical step in a complex kinetic cascade. Computational predictions are tested by this method for two model enzymes: dihydrofolate reductase and thymidylate synthase. The description highlights the significance of collaboration between experimentalists and theoreticians to develop a better understanding of enzyme-catalyzed chemical conversions.
Article
A new approach for performing Particle Mesh Ewald in ab initio QM/MM simulations with extended atomic orbital basis sets is presented. The new approach, the Ambient-Potential Composite Ewald Method (CEw), does not perform the QM/MM interaction with Mulliken charges nor electrostatically fit charges. Instead the nuclei and electron density interact directly with the MM environment, but in a manner that avoids the use of dense Fourier transform grids. By performing the electrostatics with the underlying QM density, the CEw method avoids self-consistent field instabilities that have been encountered with simple charge mapping procedures. Potential of mean force (PMF) profiles of the p-nitrophenyl phosphate dissociation reaction in explicit solvent are computed from PBE0/6-31G* QM/MM molecular dynamics simulations with various electrostatic protocols. The CEw profiles are shown to be stable with respect to real-space Ewald cutoff, whereas the PMFs computed from truncated and switched electrostatics produce artifacts. PBE0/6-311G**, AM1/d-PhoT, and DFTB2 QM/MM simulations are performed to generate two-dimensional PMF profiles of the phosphoryl transesterification reactions with ethoxide and phenoxide leaving groups. The semiempirical models incorrectly produce a concerted ethoxide mechanism, whereas PBE0 correctly produces a stepwise mechanism. The ab initio reaction barriers agree more closely with experiment than the semiempirical models. The failure of Mulliken-charge QM/MM-Ewald is analyzed.
Article
In this work we apply multipath canonical variational transition state theory with small-curvature tunneling corrections (MP-CVT/SCT) to the hydrogen abstraction reaction from ethanol by atomic hydrogen in aqueous solution at room temperature. This reaction presents two transition states which can interconvert by internal rotations about single bonds and another two transition states that are non-interconvertible enantiomers of the former structures. The study also includes another three reactions with isotopically substituted species for which there are experimental values of thermal rate constants and kinetic isotope effects (KIEs). The agreement between the MP-CVT/SCT thermal rate constants and the experimental data is good. The KIEs obtained by the MP-CVT/SCT methodology are factorized in terms of individual transition state contributions to facilitate the analysis. It was found that the percentage contribution of each transition state to the total KIE is independent of the isotopic substitution.
Article
The approximate density-functional tight-binding theory method DFTB3 has been implemented in the quantum mechanics/molecular mechanics (QM/MM) framework of the Gromacs molecular simulation package. We show that the efficient smooth particle-mesh Ewald implementation of Gromacs extends to the calculation of QM/MM electrostatic interactions. Further, we make use of the various free-energy functionalities provided by Gromacs and the PLUMED plugin. We exploit the versatility and performance of the current framework in three typical applications of QM/MM methods to solve biophysical problems: (i) ultrafast proton transfer in malonaldehyde, (ii) conformation of the alanine dipeptide, and (iii) electron-induced repair of a DNA lesion. Also discussed is the further development of the framework, regarding mostly the options for parallelization. © 2015 Wiley Periodicals, Inc.
Article
The well-studied mechanism of ribonuclease A is believed to involve concerted general acid-base catalysis by two histidine residues, His12 and His119. The basic features of this mechanism are often cited to explain rate enhancement by both protein and RNA enzymes that catalyze RNA 2'-O-transphosphorylation. Recent kinetic isotope effect analyses and computational studies are providing a more chemically detailed description of the mechanism of RNase A and the rate limiting transition state. Overall, the results support an asynchronous mechanism for both solution and ribonuclease catalyzed reactions in which breakdown of a transient dianionic phosphorane intermediate by 5'O-P bond cleavage is rate limiting. Relative to non-enzymatic reactions catalyzed by specific base, a smaller KIE on the 5'O leaving group and a less negative βLG are observed for RNase A catalysis. Quantum mechanical calculations consistent with these data support a model in which electrostatic and H-bonding interactions with the non-bridging oxygens and proton transfer from His119 render departure of the 5'O less advanced and stabilize charge buildup in the transition state. Both experiment and computation indicate advanced 2'O-P bond formation in the rate limiting transition state. However, this feature makes it difficult to resolve the chemical steps involved in 2'O activation. Thus, modeling the transition state for RNase A catalysis underscores those elements of its chemical mechanism that are well resolved, as well as highlighting those where ambiguity remains. This article is part of a Special Issue entitled: Enzyme Transition States from Theory and Experiment. Copyright © 2015. Published by Elsevier B.V.
Article
Experimental analysis of kinetic isotope effects represents an extremely powerful approach for gaining information about the transition state structure of complex reactions not available through other methodologies. The implementation of this approach to the study of nucleic acid chemistry requires the synthesis of nucleobases and nucleotides enriched for heavy isotopes at specific positions. In this review, we highlight current approaches to the synthesis of nucleic acids enriched site specifically for heavy oxygen and nitrogen and their application in heavy atom isotope effect studies. This article is part of a special issue titled: Enzyme Transition States from Theory and Experiment. Copyright © 2015. Published by Elsevier B.V.
Article
Divalent metal ions, due to their ability to stabilize high concentrations of negative charge, are important for RNA folding and catalysis. Detailed models derived from the structures and kinetics of enzymes and from computational simulations have been developed. However, in most cases the specific catalytic modes involving metal ions and their mechanistic roles and effects on transition state structures remain controversial. Valuable information about the nature of the transition state is provided by measurement of kinetic isotope effects (KIEs). However, KIEs reflect changes in all bond vibrational modes that differ between the ground state and transition state. QM calculations are therefore essential for developing structural models of the transition state and evaluating mechanistic alternatives. Herein, we present computational models for Zn(2+) binding to RNA 2'O-transphosphorylation reaction models that aid in the interpretation of KIE experiments. Different Zn(2+) binding modes produce distinct KIE signatures, and one binding mode involving two zinc ions is in close agreement with KIEs measured for non-enzymatic catalysis by Zn(2+) aquo ions alone. Interestingly, the KIE signatures in this specific model are also very close to those in RNase A catalysis. These results allow a quantitative connection to be made between experimental KIE measurements and transition state structure and bonding, and provide insight into RNA 2'O-transphosphorylation reactions catalyzed by metal ions and enzymes. This article is part of a Special Issue entitled: Enzyme Transition States from Theory and Experiment. Copyright © 2015. Published by Elsevier B.V.
Article
We review the current status of transition-state theory. We focus on the validity of its basic assumptions and of corrections to and improvements of conventional transition-state theory. The review is divided into sections concerned in turn with bimolecular reactions in the gas phase, unimolecular reactions in the gas phase, and isomerizations and atom-transfer reactions in liquid-phase solutions. Some aspects that are emphasized are variational transition-state theory, tunneling, the assumption of an equilibrium distribution of reactants, and the frictional effects of solvent molecules.
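As a rough illustration of the quantities this review discusses, the sketch below evaluates the conventional transition-state theory (Eyring) rate constant for a unimolecular step and scales it by a transmission coefficient. The Wigner expression used for the tunneling correction is a simple stand-in chosen here for brevity, not a method attributed to the review, and the function names and input values are hypothetical.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34 # Planck constant, J*s
C_CM = 2.99792458e10      # speed of light, cm/s
R_KJ = 8.314462618e-3     # gas constant, kJ/(mol*K)


def wigner_kappa(imag_freq_cm1, temperature_k):
    """Wigner tunneling correction, 1 + u^2/24 with u = h*c*nu/(kB*T).

    imag_freq_cm1 is the magnitude of the imaginary barrier frequency in cm^-1.
    This is a simple low-order correction used only for illustration.
    """
    u = H_PLANCK * C_CM * imag_freq_cm1 / (K_B * temperature_k)
    return 1.0 + u * u / 24.0


def eyring_rate(delta_g_kj_mol, temperature_k, kappa=1.0):
    """Conventional TST rate constant in s^-1 for a unimolecular step.

    k = kappa * (kB*T/h) * exp(-dG/(R*T)), with dG the activation free
    energy in kJ/mol and kappa a transmission coefficient.
    """
    prefactor = K_B * temperature_k / H_PLANCK
    return kappa * prefactor * math.exp(-delta_g_kj_mol / (R_KJ * temperature_k))


if __name__ == "__main__":
    # Hypothetical inputs: 80 kJ/mol barrier, 1000 cm^-1 imaginary frequency.
    kappa = wigner_kappa(1000.0, 298.15)
    print(eyring_rate(80.0, 298.15, kappa))
```

Because the correction enters as a multiplicative factor, a more elaborate transmission coefficient can be substituted for `wigner_kappa` without changing `eyring_rate`.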
Article
Computing in science and engineering is now ubiquitous: digital technologies underpin, accelerate, and enable new, even transformational, research in all domains. Access to an array of integrated and well-supported high-end digital services is critical for the advancement of knowledge. Driven by community needs, the Extreme Science and Engineering Discovery Environment (XSEDE) project substantially enhances the productivity of a growing community of scholars, researchers, and engineers (collectively referred to as "scientists" throughout this article) through access to advanced digital services that support open research. XSEDE's integrated, comprehensive suite of advanced digital services federates with other high-end facilities and with campus-based resources, serving as the foundation for a national e-science infrastructure ecosystem. XSEDE's e-science infrastructure has tremendous potential for enabling new advancements in research and education. XSEDE's vision is a world of digitally enabled scholars, researchers, and engineers participating in multidisciplinary collaborations to tackle society's grand challenges.
Article
Phosphoryl transfer reactions are ubiquitous in biology, and understanding the mechanisms by which these reactions are catalyzed by protein and RNA enzymes is central to revealing design principles for new therapeutics. Two of the most powerful experimental probes of chemical mechanism involve the analysis of linear free energy relations (LFERs) and the measurement of kinetic isotope effects (KIEs). These experimental data report directly on differences in bonding between the ground state and the rate-controlling transition state, which is the most critical point along the reaction free energy pathway. However, interpretation of LFER and KIE data in terms of transition-state structure and bonding optimally requires the use of theoretical models. In this work, we apply density-functional calculations to determine KIEs for a series of phosphoryl transfer reactions of direct relevance to the 2'-O-transphosphorylation that leads to cleavage of the phosphodiester backbone of RNA. We first examine a well-studied series of phosphate and phosphorothioate mono-, di- and triesters that are useful as mechanistic probes and for which KIEs have been measured. Close agreement is demonstrated between the calculated and measured KIEs, establishing the reliability of our quantum model calculations. Next, we examine a series of RNA transesterification model reactions with a wide range of leaving groups in order to provide a direct connection between observed Brønsted coefficients and KIEs with the structure and bonding in the transition state. These relations can be used for prediction or to aid in the interpretation of experimental data for similar non-enzymatic and enzymatic reactions. Finally, we apply these relations to RNA phosphoryl transfer catalyzed by ribonuclease A, and demonstrate the reaction coordinate-KIE correlation is reasonably preserved. A prediction of the secondary deuterium KIE in this reaction is also provided. These results demonstrate the utility of building up knowledge of mechanism through the systematic study of model systems to provide insight into more complex biological systems such as phosphoryl transfer enzymes and ribozymes.
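To make the link between activation free energies and a KIE concrete, the following minimal sketch converts a pair of barriers (light and heavy isotopologue) into a KIE under the transition-state theory assumption that the prefactor and transmission coefficient cancel in the ratio. It is not the procedure used in the work above; the function name and the barrier values are hypothetical.

```python
import math

R_KCAL = 1.987204259e-3  # gas constant, kcal/(mol*K)


def kie_from_barriers(dg_light, dg_heavy, temperature_k):
    """KIE = k_light / k_heavy = exp((dG_heavy - dG_light) / (R*T)).

    Barriers are activation free energies in kcal/mol. A value above 1 is a
    normal KIE (the light isotopologue reacts faster); below 1 is inverse.
    """
    return math.exp((dg_heavy - dg_light) / (R_KCAL * temperature_k))


if __name__ == "__main__":
    # Hypothetical barriers differing by 0.02 kcal/mol at 300 K.
    print(kie_from_barriers(20.00, 20.02, 300.0))
```

Heavy-atom KIEs of a few percent correspond to barrier differences of only hundredths of a kcal/mol, which is why the free energies being compared must be converged very tightly.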
Article
Although there have been great strides in defining the mechanisms of RNA strand cleavage by 2'-O-transphosphorylation, long-standing questions remain. How do different catalytic modes such as acid/base and metal ion catalysis influence transition state charge distribution? Does the large rate enhancement characteristic of biological catalysis result in different transition states relative to solution reactions? Answering these questions is important for understanding biological catalysis in general, and revealing principles for designing small molecule inhibitors. Recent applications of linear free energy relationships and kinetic isotope effects together with multi-scale computational simulations are providing tentative answers to these questions for this fundamentally important class of phosphoryl transfer reactions.
Article
RNA-assisted catalysis by ribozymes plays select but critical roles in nature. Despite seemingly restricted diversity in chemical properties, the sequence and structural diversity available to RNA supports selective positioning of reactants and metal-ion and other cofactors, along with providing general acid and base properties and fine-tuning of environmental pKa values. Early concepts of RNA as a scaffold for Mg2+-ion catalysts have evolved to a continuum of metal-ion participation in RNA catalysis, with active sites showing a range of interactions that span an absence of direct metal-ion interactions to precisely positioned clusters of three or more specific cations. In the context of the vast counterion atmosphere supported by nucleic acids and the potential nonspecific reactivity of metal ions in backbone hydrolysis, it is amazing that RNAs have evolved to selectively harness metal ions for function and that naturally occurring sequences of both RNA and DNA have evaded unwanted reactivities.
Article
The variational free energy profile (vFEP) method is extended to two dimensions and tested with molecular simulation applications. The proposed 2D-vFEP approach effectively addresses the two major obstacles to constructing free energy profiles from simulation data using traditional methods: the need for overlap in the re-weighting procedure and the problem of data representation. This is especially evident as these problems are shown to be more severe in two dimensions. The vFEP method is demonstrated to be highly robust and able to provide stable, analytic free energy profiles with only a paucity of sampled data. The analytic profiles can be analyzed with conventional search methods to easily identify stationary points (e.g. minima and first-order saddle points) as well as the pathways that connect these points. These "roadmaps" through the free energy surface are useful not only as a post-processing tool to characterize mechanisms, but can also serve as a basis from which to direct more focused "on-the-fly" sampling or adaptive force biasing. Test cases demonstrate that 2D-vFEP outperforms other methods in terms of the amount and sparsity of the data needed to construct stable, converged analytic free energy profiles. In a classic test case, the two dimensional free energy profile of the backbone torsion angles of alanine dipeptide, 2D-vFEP needs less than 1% of the original data set to reach a sampling accuracy of 0.5 kcal/mol in free energy shifts between windows. A new software tool for performing one and two dimensional vFEP calculations is herein described and made publicly available.
Article
Recent developments in path integral methodology have significantly reduced the computational expense of including quantum mechanical effects in the nuclear motion in ab initio molecular dynamics simulations. However, the implementation of these developments requires a considerable programming effort, which has hindered their adoption. Here we describe i-PI, an interface written in Python that has been designed to minimise the effort required to bring state-of-the-art path integral techniques to an electronic structure program. While it is best suited to first principles calculations and path integral molecular dynamics, i-PI can also be used to perform classical molecular dynamics simulations, and can just as easily be interfaced with an empirical forcefield code. To give just one example of the many potential applications of the interface, we use it in conjunction with the CP2K electronic structure package to showcase the importance of nuclear quantum effects in high-pressure water.
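In path integral molecular dynamics, each quantum nucleus is represented by a closed ring of classical replicas ("beads") joined by harmonic springs. The sketch below evaluates that spring energy for one particle in plain Python. It does not use or reproduce the i-PI API; it assumes the common convention in which the spring frequency is ω_P = P·k_B·T/ħ for P beads, and the function name and inputs are hypothetical.

```python
HBAR = 1.054571817e-34  # reduced Planck constant, J*s
K_B = 1.380649e-23      # Boltzmann constant, J/K


def ring_polymer_spring_energy(beads, mass_kg, temperature_k):
    """Harmonic spring energy (J) of a closed ring polymer for one particle.

    beads: list of (x, y, z) bead positions in metres, one tuple per replica.
    E = sum_i 0.5 * m * omega_P^2 * |q_i - q_(i+1)|^2, with q_(P+1) = q_1
    and omega_P = P * kB * T / hbar (assumed convention).
    """
    n_beads = len(beads)
    omega_p = n_beads * K_B * temperature_k / HBAR
    total = 0.0
    for i in range(n_beads):
        nxt = beads[(i + 1) % n_beads]  # cyclic neighbour closes the ring
        total += sum((a - b) ** 2 for a, b in zip(beads[i], nxt))
    return 0.5 * mass_kg * omega_p ** 2 * total


if __name__ == "__main__":
    proton_mass = 1.67262192e-27  # kg
    # Hypothetical 4-bead ring spread over about 0.1 Angstrom.
    ring = [(0.0, 0.0, 0.0), (1e-11, 0.0, 0.0), (1e-11, 1e-11, 0.0), (0.0, 1e-11, 0.0)]
    print(ring_polymer_spring_energy(ring, proton_mass, 300.0))
```

With a single bead the sum vanishes and the classical limit is recovered. The energy scales with mass, so heavier isotopes have stiffer, more compact rings, which is how isotope effects enter path integral simulations.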
Article
Enhancing sampling and analyzing simulations are central issues in molecular simulation. Recently, we introduced PLUMED, an open-source plug-in that provides some of the most popular molecular dynamics (MD) codes with implementations of a variety of different enhanced sampling algorithms and collective variables (CVs). The rapid changes in this field, in particular new directions in enhanced sampling and dimensionality reduction together with new hardware, require a code that is more flexible and more efficient. We therefore present PLUMED 2 here - a complete rewrite of the code in an object-oriented programming language (C++). This new version introduces greater flexibility and greater modularity, which both extends its core capabilities and makes it far easier to add new methods and CVs. It also has a simpler interface with the MD engines and provides a single software library containing both tools and core facilities. Ultimately, the new code better serves the ever-growing community of users and contributors in coping with the new challenges arising in the field.
Article
We present a new formulation of variational transition state theory (VTST) called multi-structural VTST (MS-VTST) and the use of this to calculate the rate constant for the 1,4-hydrogen shift isomerization reaction of 1-pentyl radical and that for the reverse reaction. MS-VTST uses a multi-faceted dividing surface and provides a convenient way to include the contributions of many structures (typically conformers) of the reactant and the transition state in rate constant calculations. In this particular application, we also account for the torsional anharmonicity. We used the multi-configuration Shepard interpolation method to efficiently generate a semi-global portion of the potential energy surface from a small number of high-level electronic structure calculations using the M06 density functional in order to compute the energies and Hessians of Shepard points along a reaction path. The M06-2X density functional was used to calculate the multi-structural anharmonicity effect, including all of the structures of the reactant, product and transition state. To predict the thermal rate constant, VTST calculations were performed to obtain the canonical variational rate constant over the temperature range 200–2000 K. A transmission coefficient is calculated by the multidimensional small-curvature tunneling (SCT) approximation. The final MS-CVT/SCT thermal rate constant was determined by combining a reaction rate calculation in the single-structural harmonic oscillator approximation (including tunneling) with the multi-structural anharmonicity torsional factor. The calculated forward rate constant agrees very well with experimentally-based evaluations of the high-pressure limit for the temperature range 300–1300 K, although it is a factor of 2.5–3.0 lower than the single-structural harmonic oscillator approximation over this temperature range. We anticipate that MS-VTST will be generally useful for calculating the reaction rates of complex molecules with multiple torsions.