Predicting Protein Ligand Binding Sites by Combining Evolutionary Sequence Conservation and 3D Structure

Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America.
PLoS Computational Biology (Impact Factor: 4.62). 12/2009; 5(12):e1000585. DOI: 10.1371/journal.pcbi.1000585
Source: PubMed


Identifying a protein's functional sites is an important step towards characterizing its molecular function. Numerous structure- and sequence-based methods have been developed for this problem. Here we introduce ConCavity, a small molecule binding site prediction algorithm that integrates evolutionary sequence conservation estimates with structure-based methods for identifying protein surface cavities. In large-scale testing on a diverse set of single- and multi-chain protein structures, we show that ConCavity substantially outperforms existing methods for identifying both 3D ligand binding pockets and individual ligand binding residues. As part of our testing, we perform one of the first direct comparisons of conservation-based and structure-based methods. We find that the two approaches provide largely complementary information, which can be combined to improve upon either approach alone. We also demonstrate that ConCavity has state-of-the-art performance in predicting catalytic sites and drug binding pockets. Overall, the algorithms and analysis presented here significantly improve our ability to identify ligand binding sites and further advance our understanding of the relationship between evolutionary sequence conservation and structural and functional attributes of proteins. Data, source code, and prediction visualizations are available on the ConCavity web site (
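One way to picture combining the two signals: a structure-based cavity detector assigns scores to points on a 3D grid over the protein, and per-residue conservation scores then reweight those points. The Python sketch below is a hypothetical illustration, not ConCavity's published scoring. It assumes that each grid point's cavity score is multiplied by the conservation of its nearest residue (one representative coordinate per residue), and that a residue is scored by the best reweighted grid point within an assumed 4 Å cutoff. The names `combine_scores` and `residue_scores` are illustrative only.

```python
import math


def combine_scores(grid_points, cavity_scores, residues, conservation):
    """Weight each grid point's cavity score by the conservation of its nearest residue.

    grid_points:   list of (x, y, z) tuples
    cavity_scores: one float per grid point (higher = more pocket-like)
    residues:      one representative (x, y, z) coordinate per residue
    conservation:  one float in [0, 1] per residue (higher = more conserved)
    """
    combined = []
    for point, score in zip(grid_points, cavity_scores):
        # Index of the residue closest to this grid point
        nearest = min(range(len(residues)), key=lambda i: math.dist(point, residues[i]))
        combined.append(score * conservation[nearest])
    return combined


def residue_scores(grid_points, combined, residues, cutoff=4.0):
    """Score each residue by the highest combined grid score within `cutoff` angstroms.

    Residues with no grid point inside the cutoff get 0.0.
    """
    scores = []
    for residue in residues:
        nearby = [s for p, s in zip(grid_points, combined) if math.dist(p, residue) <= cutoff]
        scores.append(max(nearby, default=0.0))
    return scores
```

Multiplying rather than adding means a grid point scores highly only when the structural and conservation signals agree; the combination rule ConCavity actually uses should be checked against the full paper.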



Available from: Roman Aleksander Laskowski, Mar 11, 2014
    • "All of them have a dual potential: they can be used to inspect and extract biochemical information from structural data or to validate structural results [3]. For example, the analysis of cavities in the protein interior or at the protein surface may help in identifying ligand binding sites and may also indicate regions of anomalous and perhaps incorrect packing of the residues [4] [5]. Some of the protein structure analysis tools are based on chemical principles. "
    ABSTRACT: Wolumes is a fast, stand-alone computer program written in standard C that measures atom volumes in proteins. Its algorithm is a simple discretization of space by means of a grid of points spaced 0.75 Angstroms apart, and it uses a set of van der Waals radii optimized for protein atoms. By comparing the computed values with distributions derived from a non-redundant subset of the Protein Data Bank, the new method makes it possible to identify abnormally large or small atoms and residues. The source code is freely available, together with some examples.
    • "Several methods have been developed to identify surface pockets and cavities in protein structures as well, that help in identification of potential active/binding sites and amino acid residues therein (Capra et al., 2009; Najmanovich et al., 2008; Gold & Jackson, 2006; Chang et al., 2006; Wass et al., 2010). This approach is especially useful for prediction of enzymatic functions. "

    Genomics III: Methods, Techniques and Applications, 1st Edition, edited by iConcept Press Ltd., 03/2014; iConcept Press Ltd., ISBN: 978-1-922227-416
    • "So far, several computational methods have been proposed for identifying protein functional sites [1-17]. These methods can be categorized into three groups: 1) approaches that focus on molecular docking with known protein structures [1-5]; 2) methods that predict putative interacting sites based on protein sequences [6-17]; 3) methods that identify interacting sites based on the hybrid features of protein structure and sequences [15]. Because the structures of most proteins are not available, structure-based methods cannot be used in general. "
    ABSTRACT: Identifying ligand-binding sites is a key step in annotating protein function and in drug design. Many current sequence-based methods combine predictions from other classifiers, such as predicted secondary structure, predicted solvent accessibility and predicted disorder probabilities, with the position-specific scoring matrix (PSSM) as input for binding site prediction. These predicted features not only easily result in a high-dimensional feature space, but also greatly increase the complexity of the algorithms. Moreover, the performance of these predictors is also largely influenced by the other classifiers. In order to verify that conservation is the most powerful attribute for identifying ligand-binding sites, and to show the importance of revising the PSSM to match the detailed conservation pattern of functional sites, we analyzed the adenosine-5'-triphosphate (ATP) ligand as an example and proposed a simple method for ATP-binding site prediction, named CLCLpred (Contextual Local evolutionary Conservation-based method for Ligand-binding prediction). Our method uses no predictions from other classifiers as input; all features were extracted from the PSSM only. We tested our method on 2 separate data sets. Experimental results showed that, compared with 9 other existing methods on the same data sets, our method achieved the best performance. This study demonstrates that: 1) exploiting the signal from the detailed conservation pattern of residues largely facilitates the prediction of protein functional sites; and 2) local evolutionary conservation enables accurate prediction of ATP-binding sites directly from protein sequence.
    Algorithms for Molecular Biology 03/2014; 9(1):7. DOI:10.1186/1748-7188-9-7 · 1.46 Impact Factor
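The grid-based volume measurement described in the Wolumes abstract can be sketched as follows. This is a Python illustration under stated assumptions, not the Wolumes C code. The 0.75 Å spacing comes from the abstract, but the rule for assigning a grid point to a single atom (here, the atom whose van der Waals surface the point lies deepest inside) is an assumption. The radii are also supplied by the caller rather than taken from the optimized set Wolumes uses, and the name `atom_volumes` is hypothetical.

```python
import itertools
import math

SPACING = 0.75  # grid spacing in angstroms, as stated in the Wolumes abstract


def atom_volumes(atoms, spacing=SPACING):
    """Approximate per-atom volumes (cubic angstroms) by counting grid points.

    atoms: list of ((x, y, z), vdw_radius) pairs; radii are caller-supplied (assumption).
    """
    if not atoms:
        return []
    # Bounding box that encloses every van der Waals sphere
    lo = [min(c[k] - r for c, r in atoms) for k in range(3)]
    hi = [max(c[k] + r for c, r in atoms) for k in range(3)]
    steps = [int(math.ceil((hi[k] - lo[k]) / spacing)) + 1 for k in range(3)]

    counts = [0] * len(atoms)
    for i, j, k in itertools.product(*(range(n) for n in steps)):
        p = (lo[0] + i * spacing, lo[1] + j * spacing, lo[2] + k * spacing)
        best, best_gap = None, 0.0
        for idx, (center, radius) in enumerate(atoms):
            gap = math.dist(p, center) - radius  # negative when p is inside the sphere
            if gap < best_gap:  # keep the atom the point lies deepest inside
                best, best_gap = idx, gap
        if best is not None:
            counts[best] += 1

    cell = spacing ** 3  # volume represented by one grid point
    return [n * cell for n in counts]
```

Each atom's volume is its grid-point count times the cell volume (0.75³ ≈ 0.42 Å³); a finer spacing trades speed for accuracy.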
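The CLCLpred abstract states that all features are extracted from the PSSM alone. A common way to turn a PSSM into per-residue inputs is a sliding window over neighboring rows, and the Python sketch below shows that generic encoding. It is an assumption for illustration and does not reproduce CLCLpred's PSSM revision step. The window half-width of 3 and the name `window_features` are hypothetical.

```python
def window_features(pssm, center, half_width=3):
    """Concatenate PSSM rows in a window around `center`, zero-padding past the sequence ends.

    pssm:       list of rows, one per residue, each a list of 20 substitution scores
    center:     0-based index of the residue being encoded
    half_width: number of neighbors taken on each side (hypothetical default)
    """
    n_cols = len(pssm[0]) if pssm else 20
    features = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(pssm):
            features.extend(pssm[pos])
        else:
            # Positions beyond either end of the sequence contribute zeros
            features.extend([0.0] * n_cols)
    return features
```

Every residue then gets a fixed-length vector of (2 × half_width + 1) × 20 values, which can be fed to any classifier.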