Predicting Protein Ligand Binding Sites by Combining Evolutionary Sequence Conservation and 3D Structure

Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America.
PLoS Computational Biology (Impact Factor: 4.62). 12/2009; 5(12):e1000585. DOI: 10.1371/journal.pcbi.1000585
Source: PubMed


Identifying a protein's functional sites is an important step towards characterizing its molecular function. Numerous structure- and sequence-based methods have been developed for this problem. Here we introduce ConCavity, a small molecule binding site prediction algorithm that integrates evolutionary sequence conservation estimates with structure-based methods for identifying protein surface cavities. In large-scale testing on a diverse set of single- and multi-chain protein structures, we show that ConCavity substantially outperforms existing methods for identifying both 3D ligand binding pockets and individual ligand binding residues. As part of our testing, we perform one of the first direct comparisons of conservation-based and structure-based methods. We find that the two approaches provide largely complementary information, which can be combined to improve upon either approach alone. We also demonstrate that ConCavity has state-of-the-art performance in predicting catalytic sites and drug binding pockets. Overall, the algorithms and analysis presented here significantly improve our ability to identify ligand binding sites and further advance our understanding of the relationship between evolutionary sequence conservation and structural and functional attributes of proteins. Data, source code, and prediction visualizations are available on the ConCavity web site (



Available from: Roman Aleksander Laskowski, Mar 11, 2014
    • "Functionally important residues were predicted for proteins of known function as well as SG proteins using the POOL method [20] [21], with electrostatic and chemical properties from THEMATICS [31] [32], phylogenetic tree information from INTREPID [33] [34], and geometric features from ConCavity [35] as input (Fig. 1, left). "
    ABSTRACT: Thousands of protein structures of unknown or uncertain function have been reported as a result of high-throughput structure determination techniques developed by Structural Genomics (SG) projects. However, many of the putative functional assignments of these SG proteins in the Protein Data Bank (PDB) are incorrect. While high-throughput biochemical screening techniques have provided valuable functional information for limited sets of SG proteins, the biochemical functions for most SG proteins are still unknown or uncertain. Therefore, computational methods for the reliable prediction of protein function from structure can add tremendous value to the existing SG data. In this article, we show how computational methods may be used to predict the function of SG proteins, using examples from the six-hairpin glycosidase (6-HG) and the concanavalin A-like lectins/glucanases (CAL/G) superfamilies. Using a set of predicted functional residues, obtained from computed electrostatic and chemical properties for each protein structure, it is shown that these superfamilies may be sorted into functional families according to biochemical function. Within these superfamilies, a total of 18 SG proteins were analyzed according to their predicted, local functional sites: 13 from the 6-HG superfamily, five from the CAL/G superfamily. Within the 6-HG superfamily, an uncharacterized protein bacova_03626 from Bacteroides ovatus (PDB 3ON6) and a hypothetical protein BT3781 from Bacteroides thetaiotaomicron (PDB 2P0V) are shown to have very strong active site matches with exo-α-1,6-mannosidases, thus likely possessing this function. Also in this superfamily, it is shown that protein BH0842, a putative glycoside hydrolase from Bacteroides halodurans (PDB 2RDY), has a predicted active site that matches well with a known α-L-galactosidase. 
In the CAL/G superfamily, an uncharacterized glycosyl hydrolase family 16 protein from Mycobacterium smegmatis (PDB 3RQ0) is shown to have local structural similarity at the predicted active site with the known members of the GH16 family, with the closest match to the endoglucanase subfamily. The method discussed herein can predict whether an SG protein is correctly or incorrectly annotated and can sometimes provide a reliable functional annotation. Examples of application of the method across folds, comparing active sites between two proteins of different structural folds, are also given.
    No preview · Article · Nov 2015 · Methods
    • "All of them have a dual potential: they can be used to inspect and extract biochemical information from structural data or to validate structural results [3]. For example, the analysis of cavities in the protein interior or at the protein surface may help in identifying ligand binding sites and may also indicate regions of anomalous and perhaps incorrect packing of the residues [4] [5]. Some of the protein structure analysis tools are based on chemical principles. "
    ABSTRACT: Wolumes is a fast, stand-alone computer program written in standard C that measures atom volumes in proteins. Its algorithm is a simple discretization of space by means of a grid of points spaced 0.75 Å apart, and it uses a set of van der Waals radii optimized for protein atoms. By comparing the computed values with distributions derived from a non-redundant subset of the Protein Data Bank, the method makes it possible to identify abnormally large or small atoms and residues. The source code is freely available, together with some examples.
    Preview · Article · Jun 2014
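
    The grid discretization described in the Wolumes abstract above can be illustrated with a minimal sketch. This is not the Wolumes source code (which is written in C); the function name `atom_volumes`, the input layout, and the rule for assigning a grid point that lies inside several atoms (it goes to the atom in which it sits deepest) are assumptions made for illustration only. Only the 0.75 Å default spacing comes from the abstract.

```python
import math
from itertools import product


def atom_volumes(atoms, spacing=0.75):
    """Estimate per-atom volumes (in cubic Angstroms) on a regular grid.

    atoms   : list of (x, y, z, radius) tuples, coordinates and radii in Angstroms
    spacing : distance between neighbouring grid points (0.75 per the abstract)

    Assumption (not taken from the source): a grid point inside several
    van der Waals spheres is assigned to the atom for which
    (distance to center - radius) is smallest.
    """
    if not atoms:
        return []

    # Bounding box that encloses every atom sphere.
    max_r = max(a[3] for a in atoms)
    lo = [min(a[i] for a in atoms) - max_r for i in range(3)]
    hi = [max(a[i] for a in atoms) + max_r for i in range(3)]
    n = [int((hi[i] - lo[i]) / spacing) + 1 for i in range(3)]

    counts = [0] * len(atoms)
    for ix, iy, iz in product(range(n[0]), range(n[1]), range(n[2])):
        px = lo[0] + ix * spacing
        py = lo[1] + iy * spacing
        pz = lo[2] + iz * spacing

        owner, best = None, 0.0
        for k, (x, y, z, r) in enumerate(atoms):
            d = math.sqrt((px - x) ** 2 + (py - y) ** 2 + (pz - z) ** 2)
            depth = d - r  # negative or zero means the point is inside the sphere
            if depth <= 0.0 and (owner is None or depth < best):
                owner, best = k, depth
        if owner is not None:
            counts[owner] += 1

    cell = spacing ** 3  # volume represented by one grid point
    return [c * cell for c in counts]


if __name__ == "__main__":
    # Two overlapping toy atoms with made-up coordinates and radii.
    toy = [(0.0, 0.0, 0.0, 1.7), (1.5, 0.0, 0.0, 1.7)]
    for atom, vol in zip(toy, atom_volumes(toy)):
        print(atom, round(vol, 2))
```

    The brute-force loop over all grid points and all atoms is kept for readability; a real tool would restrict each point to nearby atoms. Comparing the resulting volumes against reference distributions, as the abstract describes, is outside the scope of this sketch.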
    • "Global and local conformational similarities between proteins indicate functional similarities and are useful for inferring functions of novel proteins. Several methods have been developed to identify surface pockets and cavities in protein structures as well, that help in identification of potential active/binding sites and amino acid residues therein (Capra et al., 2009; Najmanovich et al., 2008; Gold & Jackson, 2006; Chang et al., 2006; Wass et al., 2010). This approach is especially useful for prediction of enzymatic functions. "

    Full-text · Chapter · Mar 2014