PIER: Protein Interface Recognition for
Irina Kufareva,1 Levon Budagyan,2 Eugene Raush,2 Maxim Totrov,2 and Ruben Abagyan1,2*
1Scripps Research Institute, La Jolla, California 92037
2Molsoft LLC, La Jolla, California 92037
Recent advances in structural proteomics call for development of fast and
reliable automatic methods for prediction of functional surfaces of proteins
with known three-dimensional structure, including binding sites for known and
unknown protein partners as well as oligomerization interfaces. Despite
significant progress, the problem is still far from being solved. Most existing
methods rely, at least partially, on evolutionary information from multiple
sequence alignments projected on protein surface. The common drawback of such
methods is their limited applicability to proteins with a sparse set of
sequence homologs, as well as their inability to detect interfaces in
evolutionarily variable regions. In this study, the authors developed an
improved method for predicting interfaces from a single protein structure,
which is based on local statistical properties of the protein surface derived
at the level of atomic groups. The proposed Protein IntErface Recognition
(PIER) method achieved the overall precision of 60% at the recall threshold of
50% at the residue level on a diverse benchmark of 490 homodimeric, 62
heterodimeric, and 196 transient interfaces (compared with 25% precision at
50% recall expected from random residue function assignment). For 70% of
proteins in the benchmark, the binding patch residues were successfully
detected with precision exceeding 50% at 50% recall. The calculation only took
seconds for an average 300-residue protein. The authors demonstrated that
adding the evolutionary conservation signal only marginally influenced the
overall prediction performance on the benchmark; moreover, for certain classes
of proteins, using this signal actually resulted in a deteriorated prediction.
Thorough benchmarking using other datasets from literature showed that PIER
yielded improved performance as compared with several alignment-free or
alignment-dependent predictions. The accuracy, efficiency, and dependence on
structure alone make PIER a suitable tool for automated high-throughput
annotation of protein structures emerging from structural proteomics projects.
Proteins 2007;67:400–417.
© 2007 Wiley-Liss, Inc.
Key words: protein–protein interaction; structural proteomics; cell signaling
and protein recognition; structure–function annotation;
alignment-independent interface
As crystallographers continue producing novel protein
structures with fully or partially unknown function, the
question arises of what aspects of their biological function
can be predicted from those structures. Predicting the
propensity of a protein to form complexes with other proteins,
the location of the interfaces, and possible oligomeric
states1,2 is of particular importance because of the
role of protein interactions and associations in molecular
biology.3,4 While modern docking algorithms are getting
better at predicting protein association geometries (see
Refs. 5–9 for reviews), they can only be used when identi-
ties and three-dimensional structures of all partners are
known; and even for those cases, the prediction is further
complicated by the induced fit, incompleteness or inad-
equate quality of available structures, and computer re-
quirements. Most often, however, we either do not know
what the second protein is, or do not have its structure.
Reliable prediction of protein binding interfaces from a
single protein with a known 3D structure, therefore, be-
comes a key computational problem.
Existing methods for protein interface prediction can be
divided into two classes: (i) methods incorporating evolu-
tionary information in the form of certain conservation
measures derived from multiple sequence alignments
Abbreviations: ASA, solvent accessible surface area; CAPRI, criti-
cal assessment of prediction of interactions; MSA, multiple sequence
alignment; NEIT, nonenzyme-inhibitor transient interaction; ODA,
optimal docking area; PDB, protein data bank; PIER, protein inter-
face recognition; PLS, partial least squares regression; RMSD, root
mean square deviation; SVM, support vector machines.
Grant sponsor: NIH; Grant number: 5-R01-GM071872-02.
The PIER predictor is available on the web:
http://abagyan.scripps.edu/?kufareva/pier.cgi. The dataset of 748 protein interfaces with
the accompanying information can also be downloaded from this web
*Correspondence to: Ruben Abagyan, 10550 North Torrey Pines
Rd., Mail TPC-28, La Jolla, CA 92037. E-mail: firstname.lastname@example.org
Received 9 December 2005; Revised 2 May 2006; Accepted 16
Published online 13 February 2007 in Wiley InterScience (www.
interscience.wiley.com). DOI: 10.1002/prot.21233
PROTEINS: Structure, Function, and Bioinformatics 67:400–417 (2007)
(MSA) and projected on protein surface, and (ii) those based
solely on geometrical, physicochemical, and statistical properties
of the surface. Several successful methods of the first
class were developed and published recently. Some of them
are based on the evolutionary information alone, for example,
the Evolutionary Trace method by Lichtarge et al.10,11
that was further complemented by residue cluster analysis,12
invariant polar residues mapping,13 maximum parsimony
approach (MP-ConSurf),14,15 maximum likelihood
calculation (Rate4Site),16 robust incorporation of alignment
reliability (REVCOM),17 and so forth. Other evolutionary
methods combine alignment-derived information with the
properties of the protein surface, either by obtaining a combined
heuristic score (e.g., ProMate by Neuvirth et al.18), or
by using machine learning approaches such as support vector
machines (SVM)19–22 and neural networks.23–25
The reliability of evolutionary conservation scores derived
from MSA as determinants of protein interface has
been questioned by many authors. In spite of the broad
evidence that the interfaces generally mutate at slower
rates than the rest of protein surface,10,26–28 it was
argued that conservation score alone is not sufficient for
accurate discrimination; moreover, it can be misleading
in several ways.29–32 First, the high variability of alignment
composition and extent, unbalanced subfamily
representations, and local alignment errors (e.g., shifts)
are the issues that need to be taken into account.33
Alignments can be easily contaminated with paralogs
that do not share the same interfaces.29,34 Furthermore,
even orthologous proteins sometimes vary in their quaternary
structure and binding interfaces. Second, the
prediction greatly depends on the algorithm of deriving
scores from the alignment.11,14,16,17,35–37 Third, even the
most sophisticated algorithms break down on the proteins
with no or few orthologs. Fourth, many protein
interfaces are not expected to be better conserved at all,
either because of their function (e.g., the adaptable
binding surfaces of the immune system proteins) or
because they were formed late in evolution; such interfaces
are undetectable with alignment-dependent methods.30
Finally, the small ligand binding sites are usually
better conserved than large protein interfaces,31,36 and
thus, using conservation scores actually leads to an increased
rate of false positives in automatic protein interface
prediction. All these factors, therefore, limit
applicability and reliability of the alignment-dependent methods.
Alignment-independent prediction methods rely on
an assumption that protein interfaces are different
from the rest of the surface by their physicochemical
and geometrical properties. While it was demonstrated
that the composition of protein interface patches had
statistically significant biases,38–47 the attempts to use
the differences for patch discrimination have encountered
several difficulties. The physical properties of
the interfaces are highly diverse44,48,49 between various
protein families and complex types. Moreover, even
within a single interface the binding energy is not distributed
evenly among residues; instead, there are so-called
“hot spots,” which contribute most of the interaction
energy, while the other interface residues are of
relatively minor importance.50–52 Finally, the extent
and shape of a protein patch in which the small local
biases accumulate into a statistically significant signal
are not known in advance. In spite of these complications,
Jones and Thornton53 demonstrated that alignment-independent
interface identification was possible
for 39 out of 59 complexes (66%). Later, it was shown
that desolvation is indicative of protein interfaces.54–56
The optimal docking area (ODA) method56 based on
modified desolvation energies was reported to predict
interfaces for 42 out of 53 (80%) heterodimeric transient
complexes. The interaction patterns can also be captured
by machine learning methods applied to large
protein interaction benchmarks: Keil
et al.57 developed a neural network based predictor,
which detected 44% of protein interfaces in the set of
7821 structures, which constituted 76% of all PDB
structures available at the time of publication.
The purpose of this study is to improve the reliability
and accuracy of alignment-independent protein interface
prediction, and to evaluate to what extent incorporating
evolutionary information may help the interface identification.
We developed a method of protein interface prediction
from local statistical properties of the protein surface
at the atomic-group level that can (but does not have
to) be further complemented by evolutionary conservation
scores. We used a cross-validated partial least
squares (PLS) regression algorithm58 and evaluated the
significance of each protein surface feature in the resulting
prediction model. The contribution of the evolutionary
signal was as little as 7–10%, with the remaining 90–93%
being contributed by atomic group composition descriptors,
and adding this signal only marginally influenced
the prediction performance. Moreover, for certain classes
of proteins, using conservation scores actually resulted in
a deteriorated prediction.
The proposed alignment-independent method demon-
strated improved performance over the previously pub-
lished methods. On a diverse benchmark of 748 proteins
known to be involved in homodimeric and heterodimeric
interactions, permanent as well as transient, the overall
precision at the residue level was 60% at the recall
threshold of 50%. The method was also tested on other
benchmarks. Using the method, we identified potential
MATERIALS AND METHODS
A diverse set of dimeric interfaces was taken from a
recent publication,19 and carefully checked for biological
correctness. In several cases we found that the true oligomeric
state was different from the assumed dimeric state;
these pairs were eliminated from the dataset. Each of the
remaining complexes was assumed to be either a permanent
dimer or a transient complex. Permanent dimers
were classified into homodimers (sequence identity ≥90%)
and heterodimers (sequence identity <90%). Within the
set of transient complexes, enzyme-inhibitor interactions
were separated from nonenzyme-inhibitor transient (NEIT)
interactions.
This produced a dataset of 490 monomers with homodi-
meric permanent interfaces, 62 monomers with heterodi-
meric permanent interfaces, and 76 proteins involved in
transient interactions. In the latter group, 12 proteins
were classified as enzymes with inhibitor-binding interfa-
ces, 12 were inhibitors with enzyme-binding interfaces,
and the remaining 52 were molecules involved in other
transient interactions. To avoid bias related to underrepresentation
of enzyme-inhibitor interactions in the dataset,
we had to collect a separate set of enzymes and their
protein inhibitors present in PDB.59 All short chains (less
than 70 residues for enzymes and less than 25 residues
for inhibitors) were discarded. From the rest of the set
combined with the original 24 enzyme-inhibitor interac-
tions, we iteratively removed all sequences sharing more
than 50% identity with other sequences. This procedure
produced the set of 85 enzymes with protein inhibitor
interfaces and 59 protein inhibitors with enzyme-binding
interfaces. Among the enzymes, there were 21 serine (EC
3.4.21), 13 cysteine (EC 3.4.22), 13 aspartic (EC 3.4.23),
and 4 metallo- (EC 3.4.24) endopeptidases, 1 aminopepti-
dase (EC 3.4.11), and 1 metallocarboxypeptidase (EC
3.4.17), 53 proteases in total. The obtained set of enzymes
and protein inhibitors was added to the dataset, resulting
in the total of 748 interfaces. The dataset with accompany-
ing information is available on the web: http://abagyan.
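As a quick consistency check on the counts quoted above, the following sketch recomputes the dataset composition in Python. It uses only numbers stated in the text; the variable names are illustrative and are not taken from the PIER implementation.

```python
# Counts quoted in the text; all variable names are illustrative.
homodimeric = 490
heterodimeric = 62
transient_original = 76   # transient proteins in the initial set
enzymes_original = 12     # of these, enzymes with inhibitor-binding interfaces
inhibitors_original = 12  # of these, inhibitors with enzyme-binding interfaces

# Nonenzyme-inhibitor transient (NEIT) proteins in the initial set.
neit = transient_original - enzymes_original - inhibitors_original

# Enzyme-inhibitor set after adding PDB entries and removing sequences
# sharing more than 50% identity (already includes the original 24).
enzymes_final = 85
inhibitors_final = 59

transient_final = neit + enzymes_final + inhibitors_final
total_interfaces = homodimeric + heterodimeric + transient_final

# Protease classes listed for the enzyme subset.
proteases = {
    "serine (EC 3.4.21)": 21,
    "cysteine (EC 3.4.22)": 13,
    "aspartic (EC 3.4.23)": 13,
    "metallo- (EC 3.4.24)": 4,
    "aminopeptidase (EC 3.4.11)": 1,
    "metallocarboxypeptidase (EC 3.4.17)": 1,
}
protease_total = sum(proteases.values())

print("NEIT proteins:", neit)
print("transient interfaces:", transient_final)
print("total interfaces:", total_interfaces)
print("proteases:", protease_total)
```

The totals reproduce the 52 NEIT proteins, 196 transient interfaces, 748 interfaces overall, and 53 proteases reported in the text.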
For each monomer in the dataset, solvent accessible
surface areas (ASAs) were calculated for all heavy atoms
by a modified version of the Shrake and Rupley60 algorithm
implemented in ICM,61 in three states: monomer
alone (unbound state), monomer with all small ligands
present in the PDB structure, and monomer with its protein
partner (bound state).
A residue was called an internal residue if the combined
ASA of all its heavy atoms did not exceed 3 Å². If
this was the case, or if more than 3 Å² of residue surface
participated in an interaction with small ligands, the residue
was omitted from the calculations. Each of the other
residues was classified as an interface residue if its ASA
differed by more than 20 Å² between bound and disjoint
states; otherwise it was called a noninterface residue.
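The classification rules above can be sketched in Python as follows. This is an illustration only, not the ICM implementation used in the study: the function and argument names are hypothetical, and we assume that the surface involved in ligand contacts can be estimated as the difference between the unbound ASA and the ASA computed with small ligands present.

```python
from enum import Enum


class ResidueClass(Enum):
    EXCLUDED = "excluded"          # internal or ligand-contacting residue
    INTERFACE = "interface"
    NONINTERFACE = "noninterface"


INTERNAL_ASA_MAX = 3.0      # A^2, residues at or below this are internal
LIGAND_CONTACT_MIN = 3.0    # A^2, more than this in ligand contact => omitted
INTERFACE_DELTA_MIN = 20.0  # A^2, ASA change required for an interface residue


def classify_residue(asa_unbound, asa_with_ligands, asa_bound):
    """Classify one residue from its ASA (in A^2) in the three states.

    asa_unbound      -- monomer alone
    asa_with_ligands -- monomer with small ligands (assumed proxy for contact)
    asa_bound        -- monomer with its protein partner
    """
    if asa_unbound <= INTERNAL_ASA_MAX:
        return ResidueClass.EXCLUDED
    if asa_unbound - asa_with_ligands > LIGAND_CONTACT_MIN:
        return ResidueClass.EXCLUDED
    if abs(asa_unbound - asa_bound) > INTERFACE_DELTA_MIN:
        return ResidueClass.INTERFACE
    return ResidueClass.NONINTERFACE


# Toy values, not taken from any real structure.
print(classify_residue(2.5, 2.5, 2.5))     # buried residue
print(classify_residue(50.0, 40.0, 50.0))  # 10 A^2 in ligand contact
print(classify_residue(60.0, 60.0, 30.0))  # 30 A^2 buried by the partner
print(classify_residue(60.0, 60.0, 45.0))  # only 15 A^2 buried
```

Note that both thresholds are strict inequalities, following the wording "more than" in the text, so a residue losing exactly 20 Å² is treated as noninterface.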
All heavy atoms in the 20 naturally occurring amino
acids were classified into 32 types according to their
chemical element, formal charge, sp-, sp2-, or sp3-hybridization,
and the number and the type of covalently bound
heavy atoms. Given a protein molecule with known interface,
let $N$ be the total number of atoms on its solvent accessible
surface, with $A_t$ of them being atoms of type $t$,
$1 \le t \le 32$. Also, let $n$ denote the number of interface
atoms, with $a_t$ being the corresponding number of atoms
of type $t$.
Assuming the null hypothesis to be a uniform distribution
of the atoms over the protein surface, the probability
of finding exactly $a$ atoms of type $t$ on the interface would
be

$$P(a) = \frac{\binom{A_t}{a}\binom{N - A_t}{n - a}}{\binom{N}{n}}$$

(the so-called hypergeometric distribution). Accordingly,
the probability of finding at least $a_t$ atoms would be
$P(a \ge a_t) = \sum_{a=a_t}^{n} P(a)$ and the probability of finding at
most $a_t$ atoms would be $P(a \le a_t) = \sum_{a=0}^{a_t} P(a)$. If the calculated
$P(a \le a_t)$ was less than 0.05, the atoms of type $t$
were said to be significantly underrepresented on the
interface, and in case of $P(a \ge a_t) < 0.05$ they were significantly
overrepresented.
Some atomic types are too rare to be qualified as significant
upon analysis of single proteins; however, their
presence correlates well with the location of interfaces for
larger data clusters. For these types, we substituted single
proteins by arbitrary sets of 15–20 proteins, and
determined the probabilities described above for combined
values of $N$, $A_t$, $n$, and $a_t$ within every set.
As expected, the occurrence statistics strongly correlated
between some atom types because of the structural
constraints; for example, the representational bias of α-carbon
and peptide bond nitrogen was the same, since
they are always covalently bonded in the protein structure.
Such covalently linked atom types were merged into
groups, thus avoiding singularities and improving the
robustness of the model with respect to rearrangements
of small atomic details on protein surface. The obtained
groups are listed in Table I.
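The hypergeometric test described above can be sketched in Python as follows. This is a minimal illustration using only the standard library; the function names and the toy counts are hypothetical and are not taken from the PIER implementation, while the 0.05 threshold is the one quoted in the text.

```python
from math import comb


def hypergeom_pmf(a, N, A_t, n):
    """P(a): exactly a of the n interface atoms are of type t."""
    if a < 0 or a > A_t or n - a < 0 or n - a > N - A_t:
        return 0.0
    return comb(A_t, a) * comb(N - A_t, n - a) / comb(N, n)


def tail_probabilities(a_t, N, A_t, n):
    """Return (P(a <= a_t), P(a >= a_t)) under the uniform null hypothesis."""
    p_at_most = sum(hypergeom_pmf(a, N, A_t, n) for a in range(0, a_t + 1))
    p_at_least = sum(hypergeom_pmf(a, N, A_t, n) for a in range(a_t, n + 1))
    return p_at_most, p_at_least


def representation_bias(a_t, N, A_t, n, alpha=0.05):
    """Label atom type t as under-, over-represented, or not significant."""
    p_at_most, p_at_least = tail_probabilities(a_t, N, A_t, n)
    if p_at_most < alpha:
        return "underrepresented"
    if p_at_least < alpha:
        return "overrepresented"
    return "not significant"


# Toy example: 10 surface atoms, 5 of type t, 5 interface atoms.
N, A_t, n = 10, 5, 5
for a_t in (0, 2, 5):
    print(a_t, tail_probabilities(a_t, N, A_t, n),
          representation_bias(a_t, N, A_t, n))
```

For rare atom types, the same functions could be applied to values of N, A_t, n, and a_t summed over a set of proteins, as the text describes.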
A method for surface patch generation was adopted
from Ref. 56. First, the solvent accessible surface of the
protein was expanded by 3 Å, and a set of evenly distributed
surface points was calculated by dividing the surface
into triangles with an average side of 5 Å. Next, each
point was assigned a surface patch, consisting of all solvent
accessible heavy atoms of the protein, located within
a certain distance of the point (see Fig. 1). Because of the
preliminary surface expansion, the points were located at
an average distance of 3 Å from the surface. This preserved
the overall protein surface shape while ignoring
the small atomic details; it also allowed avoiding, as
much as possible, patches “leaking through” the protein
interior and including accessible atoms located on the
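The patch assignment step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the original implementation: the surface points are taken as given (the triangulation of the expanded surface is not reproduced), the patch radius is a free parameter standing in for the "certain distance" in the text, and all names and coordinates are hypothetical.

```python
from math import dist


def build_patches(surface_points, atoms, radius):
    """Assign each surface point the solvent accessible atoms within radius.

    surface_points -- list of (x, y, z) points on the expanded surface
    atoms          -- list of ((x, y, z), asa) tuples for heavy atoms
    radius         -- patch radius in Angstroms (assumed free parameter)
    Returns one list of atom indices per surface point.
    """
    patches = []
    for point in surface_points:
        members = [
            index
            for index, (xyz, asa) in enumerate(atoms)
            if asa > 0.0 and dist(point, xyz) <= radius
        ]
        patches.append(members)
    return patches


def patch_area(patch, atoms):
    """Total ASA of the atoms in one patch, in square Angstroms."""
    return sum(atoms[index][1] for index in patch)


# Toy coordinates and ASA values, not taken from any real structure.
atoms = [
    ((0.0, 0.0, 0.0), 10.0),   # exposed atom near the point
    ((5.0, 0.0, 0.0), 0.0),    # buried atom, skipped
    ((20.0, 0.0, 0.0), 5.0),   # exposed atom outside the radius
]
points = [(3.0, 0.0, 0.0)]
patches = build_patches(points, atoms, radius=6.0)
print(patches, patch_area(patches[0], atoms))
```

Summing the ASA of the patch atoms gives a simple way to compare a generated patch against a target patch size.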
For the purpose of interface prediction, the optimal
patch size was found to be between 900 and 1000 Å².
However, to ensure that the patch size was adequate for
each individual protein, the distance for patch generation
The PIER values also indicated that the dimer struc-
ture for pyruvate, phosphate dikinase from Clostridium
symbiosum, deposited in PDB (PDB entries 1KBL and
1KC7), might be incorrect. For comparison, we show the
published and the predicted dimerization geometry for
this protein in Figure 9(b). This prediction is supported
by Herzberg et al.,89 who mention that “the dimer interface
is formed exclusively by the PEP-pyruvate binding
These examples show that PIER provides a crystal-independent
prediction of biological protein assemblies and
makes it possible to correct errors and resolve ambiguities resulting
from a naive interpretation of the crystallographic inter-
PIER Time Requirements
Our implementation of the PIER method is relatively
fast. On a Pentium III 700 MHz, the interface calculation
only took 10–12 seconds for an average 300-residue pro-
tein. We also have estimated the performance on several
hundred proteins and derived the effect of protein size.
Time dependence of the algorithm on the protein size is
polynomial, $T(n) \sim n^{7/4}$, where $n$ is the number of residues
in the protein. High calculation speed makes the
method applicable for automatic screening and annota-
tion of large sets of protein structures.
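To illustrate what the reported scaling implies, the following Python sketch extrapolates the run time from the reference point quoted above. The exponent 7/4 and the 300-residue reference come from the text; taking 11 seconds (the midpoint of the 10–12 second range) as the reference time is our assumption, and the function name is hypothetical. The estimates apply only to the hardware quoted in the text.

```python
def estimated_time(n_residues, ref_residues=300, ref_seconds=11.0,
                   exponent=7 / 4):
    """Extrapolate run time (seconds) assuming T(n) scales as n**(7/4).

    ref_seconds is an assumed midpoint of the 10-12 s range reported
    for a 300-residue protein.
    """
    return ref_seconds * (n_residues / ref_residues) ** exponent


for n in (150, 300, 600, 1200):
    print(n, round(estimated_time(n), 1))
```

Under this scaling, doubling the number of residues multiplies the run time by 2^(7/4), that is, by a factor of about 3.4.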
We presented the PIER method that is able to predict
protein interfaces from a single protein structure, and
does not require any evolutionary signal derived from
multiple-sequence alignments. The method is based on
empirically derived parameters for protein surface atom
groups that reflect common properties of protein interfa-
ces. Exceptions from the PIER rules helped to identify a
class of protein interfaces that rely on specialized mecha-
nisms of complex formation. We demonstrated that incor-
porating the evolutionary conservation only marginally
influenced the predictor performance. Fast and reliable,
the PIER method may be a useful tool in automatic anno-
tation of known and newly discovered proteins, including
identification of novel protein interfaces, prediction of oli-
gomerization states, and explaining the effects of single
nucleotide polymorphisms and pathological mutations.
The authors thank Andrew Bordner from Molsoft LLC,
Julian Mintseris from Boston University, and James
Bradford from University of Leeds, UK, for graciously
sharing their datasets of protein interfaces. The authors
also thank Joel Janin for valuable discussions. All molec-
ular images were generated with the ICM software.68
1. Szilagyi A, Grimm V, Arakaki AK, Skolnick J. Prediction of phys-
ical protein–protein interactions. Phys Biol 2005;2:S1–S16.
2. Valencia A, Pazos F. Computational methods for the prediction of
protein interactions. Curr Opin Struct Biol 2002;12:368–373.
3. Pagel P, Wong P, Frishman D. A domain interaction map based
on phylogenetic profiling. J Mol Biol 2004;344:1331–1346.
4. Pagel P, Kovac S, Oesterheld M, Brauner B, Dunger-Kaltenbach
I, Frishman G, Montrone C, Mark P, Stümpflen V, Mewes HW,
Ruepp A, Frishman D. The MIPS mammalian protein–protein
interaction database. Bioinformatics 2005;21:832–834.
5. Halperin I, Ma B, Wolfson H, Nussinov R. Principles of docking:
an overview of search algorithms and a guide to scoring func-
tions. Proteins 2002;47:409–443.
6. Méndez R, Leplae R, Maria LD, Wodak SJ. Assessment of blind
predictions of protein–protein interactions: current status of dock-
ing methods. Proteins 2003;52:51–67.
7. Smith GR, Sternberg MJ. Prediction of protein–protein interac-
tions by docking methods. Curr Opin Struct Biol 2002;12:28–35.
8. Vajda S, Camacho CJ. Protein–protein docking: is the glass half-
full or half-empty? Trends Biotechnol 2004;22:110–116.
9. Wodak SJ, Méndez R. Prediction of protein–protein interactions:
the CAPRI experiment, its evaluation and implications. Curr
Opin Struct Biol 2004;14:242–249.
10. Lichtarge O, Bourne HR, Cohen FE. An evolutionary trace me-
thod defines binding surfaces common to protein families. J Mol
11. Lichtarge O, Sowa ME. Evolutionary predictions of binding sur-
faces and interactions. Curr Opin Struct Biol 2002;12:21–27.
12. Landgraf R, Xenarios I, Eisenberg D. Three-dimensional cluster
analysis identifies interfaces and functional residue clusters in
proteins. J Mol Biol 2001;307:1487–1502.
13. Aloy P, Querol E, Aviles FX, Sternberg MJ. Automated structure-
based prediction of functional sites in proteins: applications to
assessing the validity of inheriting protein function from homol-
ogy in genome annotation and to protein docking. J Mol Biol
14. Armon A, Graur D, Ben-Tal N. ConSurf: an algorithmic tool for
the identification of functional regions in proteins by surface
mapping of phylogenetic information. J Mol Biol 2001;307:447–
15. Glaser F, Pupko T, Paz I, Bell RE, Bechor-Shental D, Martz E,
Ben-Tal N. ConSurf: identification of functional regions in pro-
teins by surface-mapping of phylogenetic information. Bioinfor-
16. Pupko T, Bell RE, Mayrose I, Glaser F, Ben-Tal N. Rate4Site: an
algorithmic tool for the identification of functional regions in pro-
teins by surface mapping of evolutionary determinants within
their homologues. Bioinformatics 2002;18 (Suppl 1):71–77.
17. Bordner AJ, Abagyan R. REVCOM: a robust Bayesian method
for evolutionary rate estimation. Bioinformatics 2005;21:2315–
18. Neuvirth H, Raz R, Schreiber G. ProMate: a structure based pre-
diction program to identify the location of protein–protein bind-
ing sites. J Mol Biol 2004;338:181–199.
19. Bordner AJ, Abagyan R. Statistical analysis and prediction of
protein–protein interfaces. Proteins 2005;60:353–366.
20. Bradford JR, Westhead DR. Improved prediction of protein–pro-
tein binding sites using a support vector machines approach. Bio-
21. Koike A, Takagi T. Prediction of protein–protein interaction sites
using support vector machines. Protein Eng Des Sel 2004;17:
22. Sen TZ, Kloczkowski A, Jernigan RL, Yan C, Honavar V, Ho KM,
Wang CZ, Ihm Y, Cao H, Gu X, Dobbs D. Predicting binding sites
of hydrolase-inhibitor complexes by combining several methods.
BMC Bioinformatics 2004;5:205–205.
23. Fariselli P, Pazos F, Valencia A, Casadio R. Prediction of protein–
protein interaction sites in heterocomplexes with neural net-
works. Eur J Biochem 2002;269:1356–1361.
24. Valdar WS, Thornton JM. Conservation helps to identify biologi-
cally relevant crystal contacts. J Mol Biol 2001;313:399–416.
25. Zhou HX, Shan Y. Prediction of protein interaction sites from
sequence profile and residue neighbor list. Proteins 2001;44:336–
26. Halperin I, Wolfson H, Nussinov R. Protein–protein interactions;
coupling of structurally conserved residues and of hot spots
across interfaces. Implications for docking. Structure (Camb)
27. Ma B, Elkayam T, Wolfson H, Nussinov R. Protein–protein inter-
actions: structurally conserved residues distinguish between
binding sites and exposed protein surfaces. Proc Natl Acad Sci
28. Valencia A, Pazos F. Prediction of protein–protein interactions
from evolutionary information. Methods Biochem Anal 2003;44:
29. Aloy P, Russell RB. Interrogating protein interaction networks
through structural biology. Proc Natl Acad Sci USA 2002;99:
30. Bradford JR, Westhead DR. Asymmetric mutation rates at
enzyme–inhibitor interfaces: implications for the protein–protein
docking problem. Protein Sci 2003;12:2099–2103.
31. Caffrey DR, Somaroo S, Hughes JD, Mintseris J, Huang ES. Are
protein–protein interfaces more conserved in sequence than the
rest of the protein surface? Protein Sci 2004;13:190–202.
32. Mintseris J, Weng Z. Structure, function, and evolution of tran-
sient and obligate protein–protein interactions. Proc Natl Acad
Sci USA 2005;102:10930–10935.
33. Rost B, Valencia A. Pitfalls of protein sequence analysis. Curr
Opin Biotechnol 1996;7:457–461.
34. Todd AE, Orengo CA, Thornton JM. Evolution of function in pro-
tein superfamilies, from a structural perspective. J Mol Biol
35. Casari G, Sander C, Valencia A. A method to predict functional
residues in proteins. Nat Struct Biol 1995;2:171–178.
36. Mesa AdS, Pazos F, Valencia A. Automatic methods for predicting
functionally important residues. J Mol Biol 2003;326:1289–1302.
37. Valdar WS. Scoring residue conservation. Proteins 2002;48:227–
38. Bahadur RP, Chakrabarti P, Rodier F, Janin J. A dissection of
specific and non-specific protein–protein interfaces. J Mol Biol
39. Jones S, Thornton JM. Analysis of protein–protein interaction
sites using surface patches. J Mol Biol 1997;272:121–132.
40. Larsen TA, Olson AJ, Goodsell DS. Morphology of protein–pro-
tein interfaces. Structure 1998;6:421–427.
41. Conte LL, Chothia C, Janin J. The atomic structure of protein–
protein recognition sites. J Mol Biol 1999;285:2177–2198.
42. MacCallum RM, Martin AC, Thornton JM. Antibody–antigen
interactions: contact analysis and binding site topography. J Mol
43. Nooren IM, Thornton JM. Structural characterisation and func-
tional significance of transient protein–protein interactions. J
Mol Biol 2003;325:991–1018.
44. Ofran Y, Rost B. Analysing six types of protein–protein interfa-
ces. J Mol Biol 2003;325:377–387.
45. Rodier F, Bahadur RP, Chakrabarti P, Janin J. Hydration of pro-
tein–protein interfaces. Proteins 2005;60:36–45.
46. Tsai CJ, Lin SL, Wolfson HJ, Nussinov R. Studies of protein–pro-
tein interfaces: a statistical analysis of the hydrophobic effect.
Protein Sci 1997;6:53–64.
47. Young L, Jernigan RL, Covell DG. A role for surface hydrophobic-
ity in protein–protein recognition. Protein Sci 1994;3:717–729.
48. Jones S, Thornton JM. Principles of protein–protein interactions.
Proc Natl Acad Sci USA 1996;93:13–20.
49. Nooren IM, Thornton JM. Diversity of protein–protein interac-
tions. EMBO J 2003;22:3486–3492.
50. Bogan AA, Thorn KS. Anatomy of hot spots in protein interfaces.
J Mol Biol 1998;280:1–9.
51. Clackson T, Wells JA. A hot spot of binding energy in a hor-
mone–receptor interface. Science 1995;267:383–386.
52. Rajamani D, Thiel S, Vajda S, Camacho CJ. Anchor residues in
protein–protein interactions. Proc Natl Acad Sci USA 2004;101:
53. Jones S, Thornton JM. Prediction of protein–protein interaction
sites using patch analysis. J Mol Biol 1997;272:133–143.
54. Camacho CJ, Kimura SR, DeLisi C, Vajda S. Kinetics of desolva-
tion-mediated protein–protein binding. Biophys J 2000;78:1094–
55. Kortvelyesi T, Dennis S, Silberstein M, Brown L, Vajda S. Algo-
rithms for computational solvent mapping of proteins. Proteins
56. Fernandez-Recio J, Totrov M, Skorodumov C, Abagyan R. Opti-
mal docking area: a new method for predicting protein–protein
interaction sites. Proteins 2005;58:134–143.
57. Keil M, Exner TE, Brickmann J. Pattern recognition strategies
for molecular surfaces. III. Binding site prediction with a neural
network. J Comput Chem 2004;25:779–789.
58. Geladi P, Kowalski B. Partial least squares regression: a tutorial.
Anal Chim Acta 1986;185:1–17.
59. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig
H, Shindyalov IN, Bourne PE. The protein data bank. Nucleic
Acids Res 2000;28:235–242.
60. Shrake A, Rupley JA. Environment and exposure to solvent of
protein atoms: lysozyme and insulin. J Mol Biol 1973;79:351–357.
61. Abagyan RA, Totrov MM, Kuznetsov DA. ICM: a new method for
structure modeling and design: applications to docking and
structure prediction from the distorted native conformation. J
Comput Chem 1994;15:488–506.
62. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic
local alignment search tool. J Mol Biol 1990;215:403–410.
63. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller
W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new genera-
tion of protein database search programs. Nucleic Acids Res
64. Bairoch A, Apweiler R, Wu CH, Barker WC, Boeckmann B, Ferro
S, Gasteiger E, Huang H, Lopez R, Magrane M, Martin MJ,
Natale DA, O’Donovan C, Redaschi N, Yeh LS. The universal
protein resource (UniProt). Nucleic Acids Res 2005;33:154–159.
65. Abagyan RA, Batalov S. Do aligned sequences share the same
fold? J Mol Biol 1997;273:355–368.
66. Needleman SB, Wunsch CD. A general method applicable to the
search for similarities in the amino acid sequence of two pro-
teins. J Mol Biol 1970;48:443–453.
67. Gonnet GH, Cohen MA, Benner SA. Exhaustive matching of the
entire protein sequence database. Science 1992;256:1443–1445.
68. Abagyan R. ICM Manual v. 3.1. 2005.
69. Chakrabarti P, Janin J. Dissecting protein–protein recognition
sites. Proteins 2002;47:334–343.
70. Glaser F, Steinberg DM, Vakser IA, Ben-Tal N. Residue frequen-
cies and pairing preferences at protein–protein interfaces. Pro-
71. Samanta U, Chakrabarti P. Assessing the role of tryptophan resi-
dues in the binding site. Protein Eng 2001;14:7–15.
72. Duan G, Smith VHJ, Weaver DF. Characterization of aromatic-thiol
π-type hydrogen bonding and phenylalanine–cysteine side
chain interactions through ab initio calculations and protein
database analyses. Mol Phys 2001;99:1689–1699.
73. Pal D, Chakrabarti P. Non-hydrogen bond interactions involving
the methionine sulfur atom. J Biomol Struct Dyn 2001;19:115–
74. Bahadur RP, Chakrabarti P, Rodier F, Janin J. Dissecting subunit
interfaces in homodimeric proteins. Proteins 2003;53:708–719.
75. Zhang C, Vasmatzis G, Cornette JL, DeLisi C. Determination of
atomic desolvation energies from the structures of crystallized
proteins. J Mol Biol 1997;267:707–726.
76. Bourne Y, Watson MH, Hickey MJ, Holmes W, Rocque W, Reed
SI, Tainer JA. Crystal structure and mutational analysis of the
human CDK2 kinase complex with cell cycle-regulatory protein
CksHs1. Cell 1996;84:863–874.
77. Russo AA, Jeffrey PD, Pavletich NP. Structural basis of cyclin-
dependent kinase activation by phosphorylation. Nat Struct Biol
78. Borriello F, Krauter KS. Multiple murine α1-protease inhibitor
genes show unusual evolutionary divergence. Proc Natl Acad Sci
79. Creighton TE, Darby NJ. Functional evolutionary divergence of
proteolytic enzymes and their inhibitors. Trends Biochem Sci
80. Rheaume C, Goodwin RL, Latimer JJ, Baumann H, Berger FG.
Evolution of murine α1-proteinase inhibitors: gene amplification
and reactive center divergence. J Mol Evol 1994;38:121–131.
81. Goodwin RL, Baumann H, Berger FG. Patterns of divergence
during evolution of α1-proteinase inhibitors in mammals. Mol
Biol Evol 1996;13:346–358.
82. Hill RE, Hastie ND. Accelerated evolution in the reactive centre
regions of serine protease inhibitors. Nature 1987;326:96–99.
83. Méndez R, Leplae R, Lensink MF, Wodak SJ. Assessment of
CAPRI predictions in rounds 3–5 shows progress in docking pro-
cedures. Proteins 2005;60:150–169.
84. Henrick K, Thornton JM. PQS: a protein quaternary structure
file server. Trends Biochem Sci 1998;23:358–361.
85. Ponstingl H, Henrick K, Thornton JM. Discriminating between
homodimeric and monomeric proteins in the crystalline state.
86. Schimmel P, Tao J, Hill J. Aminoacyl tRNA synthetases as tar-
gets for new anti-infectives. FASEB J 1998;12:1599–1609.
87. Qiu X, Janson CA, Smith WW, Green SM, McDevitt P, Johanson
K, Carter P, Hibbs M, Lewis C, Chalker A, Fosberry A, Lalonde
J, Berge J, Brown P, Houge-Frydrych CS, Jarvest RL. Crystal
structure of Staphylococcus aureus tyrosyl-tRNA synthetase in
complex with a class of potent and specific inhibitors. Protein Sci
88. Kobayashi T, Takimura T, Sekine R, Vincent K, Kamata K,
Sakamoto K, Nishimura S, Yokoyama S. Structural snapshots
of the KMSKS loop rearrangement for amino acid activation by
bacterial tyrosyl-tRNA synthetase. J Mol Biol 2005;346:105–
89. Herzberg O, Chen CC, Kapadia G, McGuire M, Carroll LJ, Noh
SJ, Dunaway-Mariano D. Swiveling-domain mechanism for enzy-
matic phosphotransfer between remote reaction sites. Proc Natl
Acad Sci USA 1996;93:2652–2657.