Are Protein Force Fields Getting Better? A Systematic Benchmark on 524 Diverse NMR Measurements

Biophysics Program, Stanford University, Stanford, CA.
Journal of Chemical Theory and Computation (Impact Factor: 5.31). 04/2012; 8(4):1409-1414. DOI: 10.1021/ct2007814
Source: PubMed

ABSTRACT Recent hardware and software advances have enabled simulation studies of protein systems on biophysically relevant timescales, often revealing the need for improved force fields. Although early force field development was limited by the lack of direct comparisons between simulation and experiment, recent work from several labs has demonstrated direct calculation of NMR observables from protein simulations. Here we quantitatively evaluate recent molecular dynamics force fields against a suite of 524 chemical shift and J coupling (³J(HN,Hα), ³J(HN,Cβ), ³J(Hα,C'), ³J(HN,C'), and ³J(Hα,N)) measurements on dipeptides, tripeptides, tetra-alanine, and ubiquitin. Of the force fields examined (ff96, ff99, ff03, ff03*, ff03w, ff99sb*, ff99sb-ildn, ff99sb-ildn-phi, ff99sb-ildn-nmr, CHARMM27, OPLS-AA), two force fields (ff99sb-ildn-phi, ff99sb-ildn-nmr) combining recent side chain and backbone torsion modifications achieve high accuracy in our benchmark. For the two optimal force fields, the calculation error is comparable to the uncertainty in the experimental comparison. This observation suggests that extracting additional force field improvements from NMR data may require increased accuracy in J coupling and chemical shift prediction. To further investigate the limitations of current force fields, we also consider conformational populations of dipeptides, which were recently estimated using vibrational spectroscopy.
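
For readers unfamiliar with how a J coupling is obtained from a simulation, the following is a minimal sketch, not the authors' actual pipeline. It assumes a Karplus-type relation, J(φ) = A cos²(φ + δ) + B cos(φ + δ) + C, averaged over the backbone dihedral angles sampled in a trajectory, and it scores agreement with a reduced chi-squared, which is one common choice rather than a method confirmed by the abstract. The function names, the sample dihedral angles, the experimental value, and the uncertainty are all hypothetical; the coefficients are illustrative values of the kind reported in the literature for ³J(HN,Hα) and should be checked against a published parameterization before use.

```python
import math


def karplus(phi_deg, a, b, c, delta_deg):
    """Karplus relation J = A cos^2(phi + delta) + B cos(phi + delta) + C.

    Angles are in degrees; the result has the units of A, B and C (Hz).
    """
    theta = math.radians(phi_deg + delta_deg)
    return a * math.cos(theta) ** 2 + b * math.cos(theta) + c


def ensemble_average_j(phi_samples_deg, a, b, c, delta_deg):
    """Average J over sampled dihedral angles.

    The coupling is computed per frame and then averaged, which is not the
    same as evaluating the Karplus relation at the average angle.
    """
    values = [karplus(phi, a, b, c, delta_deg) for phi in phi_samples_deg]
    return sum(values) / len(values)


def reduced_chi_squared(predicted, measured, sigma):
    """Mean squared deviation, each term scaled by its uncertainty."""
    terms = [((p - m) / s) ** 2 for p, m, s in zip(predicted, measured, sigma)]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Illustrative coefficients (assumption): verify before real use.
    A, B, C, DELTA = 6.51, -1.76, 1.60, -60.0

    # Hypothetical phi angles (degrees) standing in for trajectory frames.
    phi_frames = [-65.0, -70.0, -120.0, -150.0, -60.0]

    j_sim = ensemble_average_j(phi_frames, A, B, C, DELTA)

    # Hypothetical experimental value and combined uncertainty (Hz).
    j_exp, sigma = 6.0, 0.9

    print("simulated J (Hz):", round(j_sim, 2))
    print("reduced chi^2:", round(reduced_chi_squared([j_sim], [j_exp], [sigma]), 2))
```

In a sketch like this, the uncertainty term would need to cover both experimental error and the error of the Karplus parameterization itself, which is why the abstract notes that prediction accuracy limits how much further force fields can be improved from NMR data.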

Available from: Kyle Beauchamp, Jul 05, 2015
  •
    ABSTRACT: The sophistication of the force fields, algorithms and hardware used for molecular dynamics (MD) simulations of proteins is continuously increasing. No matter how advanced the methodology, however, it is essential to evaluate the appropriateness of the structures sampled in a simulation by comparison with quantitative experimental data. Solution nuclear magnetic resonance (NMR) data are particularly useful for checking the quality of protein simulations, as they provide both structural and dynamic information on a variety of temporal and spatial scales. Here, various features and implications of using NMR data to validate and bias MD simulations are outlined, including an overview of the different types of NMR data that report directly on structural properties and of relevant simulation techniques. The focus throughout is on how to properly account for conformational averaging, particularly within the context of the assumptions inherent in the relationships that link NMR data to structural properties.
    Biophysical Reviews 09/2012; 4(3). DOI:10.1007/s12551-012-0087-6
  •
    ABSTRACT: Accurate computational prediction of protein structure represents a longstanding challenge in molecular biology and structure-based drug design. Although homology modeling techniques are widely used to produce low-resolution models, refining these models to high resolution has proven difficult. With long enough simulations and sufficiently accurate force fields, molecular dynamics (MD) simulations should in principle allow such refinement, but efforts to refine homology models using MD have for the most part yielded disappointing results. It has thus far been unclear whether MD-based refinement is limited primarily by accessible simulation timescales, force field accuracy, or both. Here, we examine MD as a technique for homology model refinement using all-atom simulations, each at least 100 μs long (more than 100 times longer than previous refinement simulations), and a physics-based force field that was recently shown to successfully fold a structurally diverse set of fast-folding proteins. In MD simulations of 24 proteins chosen from the refinement category of recent Critical Assessment of Structure Prediction (CASP) experiments, we find that in most cases, simulations initiated from homology models drift away from the native structure. Comparison with simulations initiated from the native structure suggests that force field accuracy is the primary factor limiting MD-based refinement. This problem can be mitigated to some extent by restricting sampling to the neighborhood of the initial model, leading to structural improvement that, while limited, is roughly comparable to the leading alternative methods.
    Proteins Structure Function and Bioinformatics 01/2012; 80(8):2071-9. DOI:10.1002/prot.24098
  •
    ABSTRACT: Accurate knowledge of the 3D structural ensemble of proteins is important for understanding their biological function. We report here the application of microsecond all-atom molecular dynamics (MD) simulations in explicit solvent for the improvement of the quality of low-resolution structures obtained by protein structure prediction (decoys). Seventy MD simulations of 1 μs average duration were performed on 13 different protein systems starting from X-ray crystal structures and decoys. The trajectories can be divided into three groups: 22 trajectories converged toward the native state, 27 trajectories displayed a quasi-equilibrium by populating mainly a single non-native free energy basin, and 21 trajectories drifted away from their initial decoy structure, transiently visiting multiple free energy minima. To determine whether the native structure can be identified among non-native ensembles, the free energy was determined for each basin by the MM/GBSA method together with the von Mises entropy estimator in dihedral angle space. For the proteins studied here, it is found that the ensembles belonging to free energy basins with the lowest free energies and the longest residence times are most native-like. The results demonstrate that explicit solvent microsecond MD simulations using the latest generation of protein force fields and free energy metrics are sufficiently accurate to permit positive identification of native state ensembles against low-resolution structural models and decoys. The approach can be applied to the direct refinement of predicted or experimental low-resolution protein structures.
    Journal of Chemical Theory and Computation 06/2012; 8(7):2531–2539. DOI:10.1021/ct300358u