grid points. From this, we calculate the intersection and union of
the actual ligand atoms and the predictions. We compare methods
using the over-prediction factor (Prediction Volume/Ligand
Volume), precision (Intersection Volume/Prediction Volume),
recall (Intersection Volume/Ligand Volume), and Jaccard coefficient
(Intersection Volume/Union Volume).
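The four overlap statistics above can be illustrated with a minimal sketch. It assumes that predictions and ligand atoms have already been mapped onto the same uniform grid, so that counting grid points stands in for measuring volume. The function name `overlap_metrics` and the toy coordinates are hypothetical and are not taken from the original code.

```python
def overlap_metrics(predicted, ligand):
    """Volume-overlap statistics for two sets of grid-point indices.

    ASSUMPTION: every grid point represents the same unit of volume,
    so point counts are proportional to volumes.
    """
    predicted, ligand = set(predicted), set(ligand)
    if not predicted or not ligand:
        raise ValueError("both point sets must be non-empty")
    intersection = len(predicted & ligand)
    union = len(predicted | ligand)
    return {
        "over_prediction": len(predicted) / len(ligand),
        "precision": intersection / len(predicted),
        "recall": intersection / len(ligand),
        "jaccard": intersection / union,
    }


# Toy example: 4 predicted points, 3 ligand points, 2 shared points.
predicted_points = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)}
ligand_points = {(0, 0, 0), (0, 0, 1), (2, 2, 2)}
metrics = overlap_metrics(predicted_points, ligand_points)
```

In this toy case the prediction is larger than the ligand (over-prediction factor 4/3), half of the predicted volume overlaps the ligand (precision 0.5), and two thirds of the ligand volume is covered (recall 2/3).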
We also create precision-recall (PR) curves, which compare
precision (TP/(TP + FP)) on the y-axis with recall (TP/(TP + FN))
on the x-axis, to evaluate the ability of each method to predict
whether a ligand atom is present at a grid point. We consider grid
points that overlap a ligand atom as positives. To construct the PR
curve, we calculate the precision and recall at each cutoff of the
grid values in the pocket prediction grid. To summarize the
performance of each method, we construct a composite PR curve
by averaging the precision at each recall level for each
structure in the dataset. As a reference point, we include the
performance of a random classifier averaged over all the structures
as well. The expected precision of a random method is the
number of positives divided by the number of all grid points. The
method and code of Davis and Goadrich are used to calculate
the area under the PR curve (PR-AUC). The significance of the
difference between methods is assessed using the Wilcoxon signed-
rank test over paired performance statistics for all structures in the
dataset. The significance of the difference in performance of a
single method on different datasets is calculated with the Wilcoxon
rank-sum test.
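The construction of a grid-point PR curve for a single structure can be sketched as follows. This is an illustrative sketch, not the original evaluation code. It assumes that a grid point is called positive when its grid value is greater than or equal to the cutoff and that tied grid values share one cutoff. The function names `pr_curve` and `random_baseline` and the toy data are hypothetical.

```python
def pr_curve(scores, labels):
    """Return (recall, precision) pairs, one per distinct score cutoff.

    ASSUMPTION: a grid point is predicted positive when its score is
    greater than or equal to the cutoff. labels are 1 for grid points
    that overlap a ligand atom and 0 otherwise.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive grid point")
    # Rank grid points from highest to lowest grid value.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    curve, tp, fp = [], 0, 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after all grid points tied at this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            curve.append((tp / total_pos, tp / (tp + fp)))
    return curve


def random_baseline(labels):
    """Expected precision of a random method: positives over all grid points."""
    return sum(labels) / len(labels)


scores = [0.9, 0.8, 0.8, 0.4, 0.1]
labels = [1, 0, 1, 1, 0]
curve = pr_curve(scores, labels)
baseline = random_baseline(labels)
```

At the lowest cutoff every grid point is predicted positive, so the final point of the curve has recall 1 and a precision equal to the random baseline.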
For the residue-based evaluation, we consider how well each
method’s residue scores identify ligand binding residues. Positives
are those residues in contact with a ligand as defined by the LigASite
database. PR curves were made by calculating, for each chain, the
precision and recall at each position on the ranked list of residue
scores. Composite PR curves were computed as described for the
grid point evaluation, but curves were first averaged over the
chains in a structure and then over structures. PR curves were
constructed similarly for the catalytic site analysis, but positives
were defined as those residues listed in the Catalytic Site Atlas.
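The two-stage averaging used for the residue-based composite curves can be sketched as below. The sketch makes two assumptions that are not stated in the text: precision at a recall level is interpolated as the best precision achieved at that recall or higher, and ties in residue scores are broken by input order. All function names and the toy scores are hypothetical.

```python
def ranked_pr(scores, is_binding):
    """(recall, precision) after each position in the ranked residue list."""
    total_pos = sum(is_binding)
    if total_pos == 0:
        raise ValueError("chain has no positive residues")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points, tp = [], 0
    for rank, idx in enumerate(order, start=1):
        tp += is_binding[idx]
        points.append((tp / total_pos, tp / rank))
    return points


def precision_at(points, level):
    """ASSUMPTION: interpolated precision is the best precision at recall >= level."""
    return max(p for r, p in points if r >= level)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def composite_curve(structures, levels):
    """Average over the chains of each structure first, then over structures."""
    composite = []
    for level in levels:
        per_structure = [
            mean(precision_at(ranked_pr(s, y), level) for s, y in chains)
            for chains in structures
        ]
        composite.append((level, mean(per_structure)))
    return composite


# Toy data: structure A has two chains, structure B has one.
structure_a = [([0.9, 0.2, 0.7], [1, 0, 1]), ([0.3, 0.8], [1, 0])]
structure_b = [([0.6, 0.5, 0.1], [0, 1, 1])]
composite = composite_curve([structure_a, structure_b], levels=[0.5, 1.0])
```

Averaging within a structure before averaging across structures keeps a structure with many chains from dominating the composite curve. Pooling the three toy chains directly would give a different value.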
Text S1 Supplementary text, results, and analysis.
Found at: doi:10.1371/journal.pcbi.1000585.s001 (0.39 MB PDF)
Conceived and designed the experiments: JAC MS TAF. Performed the
experiments: JAC TAF. Analyzed the data: JAC MS TAF. Wrote the
paper: JAC MS TAF. Provided methodological input: RAL JMT.
1. Huang B, Schroeder M (2006) LIGSITEcsc: predicting ligand binding sites using
the Connolly surface and degree of conservation. BMC Struct Biol 6: 19.
2. Capra J, Singh M (2007) Predicting functionally important residues from
sequence conservation. Bioinf 23: 1875–1882.
3. Lopez G, Valencia A, Tress M (2007) firestar---prediction of functionally
important residues using structural templates and alignment reliability. Nucleic
Acids Res 35: W573–W577.
4. Kuznetsov I, Gou Z, Li R, Hwang S (2006) Using evolutionary and structural
information to predict DNA-binding sites on DNA-binding proteins. Proteins:
Struct, Func, and Bioinf 64: 19–27.
5. Youn E, Peters B, Radivojac P, Mooney S (2007) Evaluation of features for
catalytic residue prediction in novel folds. Prot Sci 16: 216–226.
6. Ofran Y, Rost B (2007) Protein-protein interaction hotspots carved into
sequences. PLoS Comput Biol 3: e119.
7. Zhou H, Qin S (2007) Interaction-site prediction for protein complexes: a critical
assessment. Bioinf 23: 2203–2209.
8. Hannenhalli S, Russell R (2000) Analysis and prediction of functional sub-types
from protein sequence alignments. J Mol Biol 303: 61–76.
9. del Sol Mesa A, Pazos F, Valencia A (2003) Automatic methods for predicting
functionally important residues. J Mol Biol 326: 1289–1302.
10. Kalinina O, Mironov A, Gelfand M, Rakhmaninova A (2004) Automated
selection of positions determining functional specificity of proteins by
comparative analysis of orthologous groups in protein families. Prot Sci 13:
11. Chakrabarti S, Bryant S, Panchenko A (2007) Functional specificity lies within
the properties and evolutionary changes of amino acids. J Mol Biol 373:
12. Capra J, Singh M (2008) Characterization and prediction of residues
determining protein functional specificity. Bioinf 24: 1473–1480.
13. Levitt D, Banaszak L (1992) Pocket: A computer graphics method for identifying
and displaying protein cavities and their surrounding amino acids. J Mol
Graphics 10: 229–234.
14. Laskowski R (1995) Surfnet: a program for visualizing molecular surfaces,
cavities, and intermolecular interactions. J Mol Graph 12: 323–330.
15. Peters K, Fauck J, Frömmel C (1996) The automatic search for ligand binding
sites in proteins of known three-dimensional structure using only geometric
criteria. J Mol Biol 256: 201–213.
16. Hendlich M, Rippman F, Barnickel G (1997) LIGSITE: automatic and efficient
detection of potential small molecule-binding sites in proteins. J Mol Graph
Model 15: 359–363.
17. Liang J, Edelsbrunner H, Woodward C (1998) Anatomy of protein pockets and
cavities: Measurement of binding site geometry and implications for ligand
design. Prot Sci 7: 1884–1897.
18. Brady Jr G, Stouten P (2000) Fast prediction and visualization of protein binding
pockets with PASS. J Comp-Aided Mol Design 14: 383–401.
19. Dundas J, Ouyang Z, Tseng J, Binkowski A, Turpaz Y, et al. (2006) CASTp:
computed atlas of surface topography of proteins with structural and
topographical mapping of functionally annotated residues. Nucleic Acids Res
20. Xie L, Bourne P (2007) A robust and efficient algorithm for the shape
description of protein structures and its application in predicting ligand binding
sites. BMC Bioinf 8: S9.
21. Weisel M, Proschak E, Schneider G (2007) PocketPicker: analysis of ligand
binding-sites with shape descriptors. Chem Cen J 1: 7.
22. Valdar W (2002) Scoring residue conservation. Proteins: Structure, Function,
and Genetics 48: 227–241.
23. An J, Totrov M, Abagyan R (2005) Pocketome via comprehensive identification
and classification of ligand binding envelopes. Mol Cell Prot 4: 752–761.
24. Dessailly B, Lensink M, Orengo C, Wodak S (2008) LigASite: a database of
biologically relevant binding sites in proteins with known apo-structures. Nucleic
Acids Res 36: D667–673.
25. Laurie A, Jackson R (2005) Q-SiteFinder: an energy-based method for the
prediction of protein–ligand binding sites. Bioinf 21: 1908–1916.
26. Mayrose I, Graur D, Ben-Tal N, Pupko T (2004) Comparison of site-specific
rate-inference methods: Bayesian methods are superior. Mol Biol Evol 21:
27. Wang K, Samudrala R (2006) Incorporating background frequency improves
entropy-based residue conservation measures. BMC Bioinf 7: 385.
28. Mihalek I, Res I, Lichtarge O (2004) A family of evolution–entropy hybrid
methods for ranking protein residues by importance. J Mol Biol 336: 1265–1282.
29. Sankararaman S, Sjolander K (2008) Intrepid–information-theoretic tree
traversal for protein functional site identification. Bioinf 24: 2445–2452.
30. Bahadur KD, Livesay D (2008) Improving position specific predictions of
protein functional sites using phylogenetic motifs. Bioinf 24: 2308–2316.
31. Fischer J, Mayer C, Soeding J (2008) Prediction of protein functional residues
from sequence by probability density estimation. Bioinf 24: 613–620.
32. Elcock A (2001) Prediction of functionally important residues based solely on the
computed energetics of protein structure. J Mol Biol 312: 885–896.
33. Bate P, Warwicker J (2004) Enzyme/non-enzyme discrimination and prediction
of enzyme active site location using charge-based methods. J Mol Biol 340:
34. Hernandez M, Ghersi D, Sanchez R (2009) SITEHOUND-web: a server for
ligand binding site identification in protein structures. Nucleic Acids
35. Ko J, Murga L, Andre P, Yang H, Ondrechen M, et al. (2005) Statistical criteria
for the identification of protein active sites using theoretical microscopic titration
curves. Proteins: Struct, Func, and Bioinf 59: 193–195.
36. Brylinski M, Skolnick J (2008) A threading-based method (FINDSITE) for
ligand-binding site prediction and functional annotation. Proc Natl Acad Sci
37. Halperin I, Wolfson H, Nussinov R (2003) SiteLight: binding-site prediction
using phage display libraries. Prot Sci 12: 1344–1359.
38. Amitai G, Shemesh A, Sitbon E, Shklar M, Netanely D, et al. (2004) Network
analysis of protein structures identifies functional residues. J Mol Biol 344:
39. Landau M, Mayrose I, Rosenberg Y, Glaser Y, Martz E, et al. (2005) ConSurf
2005: the projection of evolutionary conservation scores of residues on protein
structures. Nucleic Acids Res 33: W299–W302.
Ligand Binding Site Prediction with ConCavity
PLoS Computational Biology | www.ploscompbiol.org 17 December 2009 | Volume 5 | Issue 12 | e1000585