A graph-theory algorithm for rapid protein side-chain prediction.

Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA.
Protein Science (Impact Factor: 2.86). 10/2003; 12(9):2001-14. DOI: 10.1110/ps.03154503
Source: PubMed

ABSTRACT: Fast and accurate side-chain conformation prediction is important for homology modeling, ab initio protein structure prediction, and protein design applications. Many methods have been presented, although only a few computer programs are publicly available. The SCWRL program is one such method and is widely used because of its speed, accuracy, and ease of use. A new algorithm for SCWRL is presented that uses results from graph theory to solve the combinatorial problem encountered in side-chain prediction. In this method, side chains are represented as vertices in an undirected graph. Any two residues that have rotamers with nonzero interaction energies are connected by an edge in the graph. The resulting graph can be partitioned into connected subgraphs with no edges between them. These subgraphs can in turn be broken into biconnected components, which are graphs that cannot be disconnected by removal of a single vertex. The combinatorial problem is reduced to finding the minimum energy of these small biconnected components and combining the results to identify the global minimum energy conformation. This algorithm is able to complete predictions on a set of 180 proteins with 34342 side chains in less than 7 min of computer time. The total chi(1) and chi(1 + 2) dihedral angle accuracies are 82.6% and 73.7%, respectively, using a simple energy function based on the backbone-dependent rotamer library and a linear repulsive steric energy. The new algorithm will allow SCWRL to be used in more demanding applications such as sequence design and ab initio structure prediction, as well as the addition of a more complex energy function and conformational flexibility, leading to increased accuracy.
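
The graph decomposition described in this abstract can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the SCWRL source: the input layout (a dict mapping residue pairs to rotamer-pair energy tables), the toy energies, and the function names build_interaction_graph, connected_components and biconnected_components are all hypothetical. Biconnected components are found with the standard Hopcroft-Tarjan edge-stack depth-first search.

```python
# Hypothetical sketch of the graph decomposition (not SCWRL source code).

def build_interaction_graph(n_residues, pair_energy):
    """Return an adjacency dict. Residues i and j share an edge if any rotamer
    pair (r, s) in pair_energy[(i, j)] has a nonzero interaction energy.
    pair_energy is a hypothetical input: {(i, j): [[E_rs, ...], ...]}."""
    adj = {i: set() for i in range(n_residues)}
    for (i, j), table in pair_energy.items():
        if any(e != 0.0 for row in table for e in row):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def connected_components(adj):
    """Split the graph into subgraphs with no edges between them (iterative DFS)."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def biconnected_components(adj):
    """Hopcroft-Tarjan edge-stack DFS. Returns blocks as vertex sets; two blocks
    sharing a vertex meet at an articulation point. Isolated vertices yield no block."""
    index, low = {}, {}
    edge_stack, blocks = [], []
    counter = [0]

    def dfs(u, parent):
        index[u] = low[u] = counter[0]
        counter[0] += 1
        for v in adj[u]:
            if v not in index:                         # tree edge
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= index[u]:                 # edges down to (u, v) form one block
                    block = set()
                    while True:
                        a, b = edge_stack.pop()
                        block.update((a, b))
                        if (a, b) == (u, v):
                            break
                    blocks.append(block)
            elif v != parent and index[v] < index[u]:  # back edge to an ancestor
                edge_stack.append((u, v))
                low[u] = min(low[u], index[v])

    for u in adj:
        if u not in index:
            dfs(u, None)
    return blocks


# Toy example: 8 residues with 2 rotamers each; the energy values are made up.
def toy_table(e):
    return [[e, 0.0], [0.0, 0.0]]

pair_energy = {
    (0, 1): toy_table(1.5), (1, 2): toy_table(-0.4), (0, 2): toy_table(0.8),
    (2, 3): toy_table(2.0), (3, 4): toy_table(0.3), (5, 6): toy_table(1.1),
    (4, 5): toy_table(0.0),  # all-zero table, so no edge between 4 and 5
}
adj = build_interaction_graph(8, pair_energy)
print(sorted(sorted(c) for c in connected_components(adj)))
# prints [[0, 1, 2, 3, 4], [5, 6], [7]]
print(sorted(sorted(b) for b in biconnected_components(adj)))
# prints [[0, 1, 2], [2, 3], [3, 4], [5, 6]]
```

Residues with no interacting neighbours (residue 7 here) form no block and can simply take their lowest-energy rotamer. The sketch stops at the decomposition: minimizing the energy within each block and combining the block results into the global minimum, as the abstract describes, are not shown. The recursive DFS is fine for illustration but could exceed Python's default recursion limit on very large connected subgraphs.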

  • Source
    ABSTRACT: Proteins are long-chain biomolecules with distinctive functions that play a major role in all living systems. Their function is defined by the protein structure, which in turn is determined by a complicated mechanism based on the amino acid sequence. The exact procedure is not fully understood. However, knowing the structure is important for the pharmaceutical industry as well as for bioengineering and nanotechnology. Unfortunately, determining it experimentally is slow and expensive. There is also much interest in being able to adapt the sequence to make stable industrial enzymes or to form molecules with specialised shapes, e.g. for biosensors. Predicting a structure computationally from the sequence is a classic problem in theoretical biochemistry that has not yet been solved. In this work, the emphasis lies on methodological improvements that avoid common chemical preconceptions. A general method for building numerical models is developed and analysed here. It is based on a statistical correlation scheme of sequence and structure using ideas from self-consistent mean field (SCMF) optimisation. The procedure is successfully applied to the structure prediction and sequence design problems without using a Boltzmann formalism. The statistical model is based on a mixture distribution of bivariate Gaussian and 20-way Bernoulli distributions. The Gaussian distributions model the continuous variables of the structure (dihedral angles), and the Bernoulli distributions capture the sequence propensities. Instead of treating the protein as a single statistical unit, easier-to-handle fragments are used. Several approaches to recombining them are discussed. However, the fragments form local statistical units that do not necessarily agree with each other. A method suited to dealing with such inconsistencies is SCMF optimisation. Mean field or SCMF methods optimise a system by treating all solution states at the same time.
    In existing approaches, an energy potential was introduced that reflects the pairwise mean interaction between subsystems. The state weights of the subsystems were converted alternately into energies and probabilities by applying the Boltzmann relation repeatedly until a self-consistent state for the whole system was reached. With the approach presented here, it is possible to optimise the state probabilities directly. The Boltzmann distribution is essentially an unnecessary assumption. Therefore, the method is also applicable to systems with an unknown ensemble.
    05/2012, Degree: Doctor rerum naturalium, Supervisor: Andrew E. Torda
  • Source
    ABSTRACT: AMPLE clusters and truncates ab initio protein structure predictions, producing search models for molecular replacement. Here, an interesting degree of complementarity is shown between targets solved using the different ab initio modelling programs QUARK and ROSETTA. Search models derived from either program collectively solve almost all of the all-helical targets in the test set. Initial solutions produced by Phaser after only 5 min perform surprisingly well, improving the prospects for in situ structure solution by AMPLE during synchrotron visits. Taken together, the results show the potential for AMPLE to run more quickly and successfully solve more targets than previously suspected.
    Acta Crystallographica Section D Biological Crystallography 02/2015; 71(Pt 2):338-43. DOI:10.1107/S1399004714025784 · 7.23 Impact Factor
