Inferred Biomolecular Interaction Server (IBIS)—a web server to analyze and predict protein interacting partners and binding sites

Article in Nucleic Acids Research 38(Database issue):D518-24 · October 2009
DOI: 10.1093/nar/gkp842 · Source: PubMed
IBIS is the NCBI Inferred Biomolecular Interaction Server. This server organizes, analyzes and predicts interaction partners and locations of binding sites in proteins. IBIS provides annotations for different types of binding partners (protein, chemical, nucleic acid and peptides), and facilitates the mapping of a comprehensive biomolecular interaction network for a given protein query. IBIS reports interactions observed in experimentally determined structural complexes of a given protein, and at the same time IBIS infers binding sites/interacting partners by inspecting protein complexes formed by homologous proteins. Similar binding sites are clustered together based on their sequence and structure conservation. To emphasize biologically relevant binding sites, several algorithms are used for verification in terms of evolutionary conservation, biological importance of binding partners, size and stability of interfaces, as well as evidence from the published literature. IBIS is updated regularly and is freely accessible via
Benjamin A. Shoemaker, Dachuan Zhang, Ratna R. Thangudu, Manoj Tyagi,
Jessica H. Fong, Aron Marchler-Bauer, Stephen H. Bryant, Thomas Madej* and
Anna R. Panchenko*
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health,
Bethesda, MD 20894, USA
Received August 15, 2009; Revised September 16, 2009; Accepted September 21, 2009
Proteins function by interacting with other biomolecules,
and a complete protein functional annotation is impossi-
ble without knowledge of the protein interactions.
Mapping biomolecular interactions is invaluable in deci-
phering the interactome, the entire set of molecular
interactions in a cell. Recent advances in the experimental
and computational tools for identifying proteins and their
complexes have spawned a wealth of information that
encourages such a mapping (1,2).
The most successful function prediction methods rely
on evolutionary relationships between proteins and the
conservation of their molecular function; they look for
sequence similarities between unknown queries and func-
tionally annotated proteins (3,4). A similar approach has
been used to infer protein interaction partners from a set
of homologous proteins, where an interaction between
two proteins is predicted if this interaction has been
observed between orthologs (interologs) in other species
(5). Homology inference methods have certain limitations,
though. Common descent does not necessarily imply
similarity in function or interactions, and annotations
transferred from one homologous protein to another
may result in incorrect functional or interolog assignment
at larger evolutionary distances (3,6–8). To verify and
guide annotations, it is often essential to detect function-
ally important binding sites. Current binding site predic-
tion methods can be subdivided into several major
categories: those which use evolutionary conservation of
binding site motifs, those which use information about a
structure of a complex, and docking methods (9).
The knowledge of protein structure may facilitate
and improve the annotation of protein function and the
characterization of protein binding partners and binding
sites. Structure-based methods use detailed knowledge of
the protein structure to identify binding sites on the basis
of the physico-chemical properties of individual residues,
their electrostatic contribution, and their location in the
3D structure (10–14). A number of servers have been
developed for predicting protein binding sites from
structures by locating the binding pockets, by identifying
sequence and structural features of homologous proteins
which are important for binding, or by using threading
and other approaches (14–22).
*To whom correspondence should be addressed. Tel: +1 301 435 5891; Fax: +1 301 480 4637; Email:
Correspondence may also be addressed to Thomas Madej. Tel: +1 301 435 5998; Fax: +1 301 480 4637; Email:
D518–D524 Nucleic Acids Research, 2010, Vol. 38, Database issue Published online 20 October 2009
Published by Oxford University Press 2009.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (
by-nc/2.5/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
We have developed a new database and server called
IBIS (Inferred Biomolecular Interaction Server), which
provides tools to investigate biomolecular interactions
observed in a given protein structure together with the
complex set of interactions inferred from its close
homologs. IBIS identifies and predicts a protein’s interac-
tion partners together with the locations of the corre-
sponding binding sites on the protein query. It does not
focus on one specific type of interacting molecule, but
provides annotations of binding sites for proteins, small
chemicals, nucleic acids and peptides (interactions with
ions are currently under development). This may allow
the mapping of a comprehensive biomolecular interaction
network for a given query, depending on the data avail-
able for its protein family.
To focus on biologically relevant binding sites, IBIS
clusters similar binding sites found in homologous
proteins based on the sites’ conservation of sequence
and structure. Binding sites which appear evolutionarily
conserved among non-redundant sets of homologous
proteins are given higher priority in the displays.
Additionally, binding site clusters are validated by
comparing them with binding site annotations from a
manually curated subset of the Conserved Domain
Database (CDD) (23), if available. In the case of
protein–protein binding sites, IBIS also compares its
findings to binding interfaces confirmed by the PISA algo-
rithm (24), which estimates the stability of protein-protein
interfaces observed in crystal structures. After binding
sites are clustered, position specific score matrices
(PSSMs) are constructed from the corresponding
binding site alignments. Together with other measures,
the PSSMs are subsequently used to rank binding sites
to assess how well they match the query, and to gauge
the biological relevance of binding sites with respect to
the query.
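As an illustration of how a PSSM can be derived from a binding site alignment and used to score a query, here is a minimal Python sketch. It is not the IBIS implementation: the uniform background frequencies, the pseudocount, and the function names `build_pssm` and `score_site` are assumptions made for the example, and the aligned sites are toy data.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(aligned_sites, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log-odds score.

    Assumptions (not from the paper): uniform background frequencies and a
    flat pseudocount added to every residue in every column.
    """
    background = 1.0 / len(AMINO_ACIDS)
    n_cols = len(aligned_sites[0])
    pssm = []
    for col in range(n_cols):
        # Count only standard residues; gaps and unknowns are ignored.
        counts = Counter(s[col] for s in aligned_sites if s[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm

def score_site(pssm, query_site):
    """Sum per-column scores for the query residues; gaps/unknowns score 0."""
    return sum(col.get(aa, 0.0) for col, aa in zip(pssm, query_site))

# Toy binding site alignment: three columns, four cluster members.
sites = ["GKS", "GKT", "GKS", "AKS"]
pssm = build_pssm(sites)
conserved_score = score_site(pssm, "GKS")
unrelated_score = score_site(pssm, "WWW")
```

A query whose residues match the conserved columns scores higher than an unrelated one, which is the property a ranking step would rely on.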
The current release of the Molecular Modeling Database
(MMDB) (25), an automatically parsed and validated
derivative of the Protein Data Bank (PDB) (26) hosted
by the National Center for Biotechnology Information
(NCBI), is used in this study. MMDB addresses several
issues in interpreting PDB’s 3D structure data and
provides standardized structural information. For
example, MMDB attempts to fix atom name ambiguity,
establishes chemical graphs that contain explicit bonding
information, extracts biopolymer sequences and small
molecules that get deposited into corresponding
databases, and cross-references its entries to GenBank,
PubMed, the NCBI taxonomy database and PubChem.
Defining a unit of interaction
Protein–protein interactions are identified and analyzed
on the level of domains. The Conserved Domain Search
service (CD-Search) provides domain annotation for
query sequences and pre-computed annotation for the
majority of all entries in NCBI’s Entrez Protein
database (27). If a complete protein chain is used as a
query, protein–protein interaction annotations are
provided separately for each domain identified on this
query. Interactions between protein domains may occur
on the same protein chain, not involving any other
molecule. For other types of binding partners (chemicals,
nucleic acids and peptides), interactions are defined
for a complete protein chain regardless of its domain
annotations and always involve another molecule.
To reduce nonbiological contacts, a chemical that interacts
with multiple chains is treated as a special case. If one
of the chains accounts for more than 75% of the contacts
to that chemical, then its interactions with the other
chains are not considered; otherwise each protein’s
interaction with the chemical is listed separately.
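The dominance rule for chemicals contacting several chains can be sketched as follows. This is an illustrative reading of the rule, not IBIS code: the function name `chains_to_report` is hypothetical, and treating "contacts" as a simple per-chain count is an assumption.

```python
def chains_to_report(contacts_per_chain, dominance=0.75):
    """Decide which chains get an interaction with a given chemical.

    contacts_per_chain maps a chain identifier to its number of contacts
    with the chemical (assumed here to be a plain count).
    """
    total = sum(contacts_per_chain.values())
    if total == 0:
        return []
    for chain, n in contacts_per_chain.items():
        # One chain holds more than 75% of the contacts: keep only that chain.
        if n / total > dominance:
            return [chain]
    # No dominant chain: every contacting chain is listed separately.
    return sorted(c for c, n in contacts_per_chain.items() if n > 0)

dominated = chains_to_report({"A": 18, "B": 2})
shared = chains_to_report({"A": 6, "B": 5})
```

In the first call chain A holds 90% of the contacts, so only A is reported; in the second neither chain exceeds 75%, so both are reported.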
Defining interactions and binding site residues
An interaction and its binding site are defined if a protein has
at least five residues in contact with another protein,
chemical, DNA, RNA or peptide. A contact is defined
if any of the heavy-atom inter-atomic distances is
shorter than 4 Å. The binding site is defined as a group
of residues which make a contact with a given type of
interaction partner. For protein–DNA interactions each
DNA strand is considered separately. In the case of
protein–chemical interactions, chemical ligands are all
validated and standardized (if possible) by the PubChem
databases and have explicit links to PubChem (28) which
may provide extensive information on their known biolog-
ical activities. There are two types of interactions and
binding sites recorded in IBIS: observed from experimen-
tal structures and inferred from homologs.
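The contact definition above (any heavy-atom pair closer than 4 Å, at least five contacting residues) can be written as a short Python sketch. The input layout, with protein atoms given as `(residue_id, (x, y, z))` pairs, and the function names are assumptions for the example; the coordinates are toy values, not taken from any structure.

```python
import math

def binding_site_residues(protein_atoms, partner_atoms, cutoff=4.0):
    """Return the set of protein residues with a heavy atom within cutoff.

    protein_atoms: iterable of (residue_id, (x, y, z)) for heavy atoms.
    partner_atoms: iterable of (x, y, z) for the partner's heavy atoms.
    """
    partner = list(partner_atoms)
    site = set()
    for residue_id, xyz in protein_atoms:
        if residue_id in site:
            continue  # residue already in contact via another atom
        if any(math.dist(xyz, p) < cutoff for p in partner):
            site.add(residue_id)
    return site

def is_interaction(site, min_residues=5):
    """An interaction is recorded only if at least five residues make contact."""
    return len(site) >= min_residues

# Toy geometry: six residues along the x axis near one partner atom,
# plus one distant residue that should not be in contact.
protein = [(i, (float(i), 0.0, 0.0)) for i in range(6)] + [(6, (10.0, 0.0, 0.0))]
partner = [(2.5, 3.0, 0.0)]
site = binding_site_residues(protein, partner)
```

The brute-force distance loop is adequate for a sketch; a production pipeline would normally use a spatial index to avoid comparing every atom pair.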
Clustering and inferring binding sites
A flowchart that summarizes the inference of binding
partners and sites on a query is presented in Figure 1.
First we collect homologs with known structures and
higher than 30% identity to the query. To ensure good
quality alignments, the VAST structure–structure compar-
ison algorithm is used (29). If a query does not have a
structure, the BLAST heuristic is applied to find the
most closely related structure (30). The closest homolog
(with an E-value <0.01) is picked with a conservative
threshold for alignment extent, requiring 80% or more
of the query sequence to be aligned.
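The homolog selection thresholds described above can be expressed as simple filters. The sketch below is illustrative only: the `Hit` record, its field names, and the choice of the lowest E-value as the meaning of "closest" are assumptions, and the hits are invented values rather than real search results.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    structure_id: str        # hypothetical identifier of the matched structure
    percent_identity: float  # sequence identity to the query, in percent
    e_value: float           # BLAST E-value
    query_coverage: float    # fraction of the query sequence that is aligned

def structural_homologs(hits, min_identity=30.0):
    """Keep homologs with known structure and more than 30% identity."""
    return [h for h in hits if h.percent_identity > min_identity]

def closest_blast_homolog(hits, max_e_value=0.01, min_coverage=0.80):
    """For a query without a structure, pick one closely related structure.

    Requires E-value < 0.01 and at least 80% of the query aligned;
    'closest' is taken to be the lowest E-value (an assumption).
    """
    eligible = [h for h in hits
                if h.e_value < max_e_value and h.query_coverage >= min_coverage]
    return min(eligible, key=lambda h: h.e_value, default=None)

hits = [
    Hit("hitA", 77.0, 1e-90, 0.95),
    Hit("hitB", 85.0, 1e-95, 0.60),  # strong match but too short an alignment
    Hit("hitC", 25.0, 0.5, 0.90),    # below both identity and E-value limits
]
kept = structural_homologs(hits)
closest = closest_blast_homolog(hits)
```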
A binding site cluster represents a collection of
structures which are related to the query, and where all
members of the cluster contain similar overlapping
binding sites when mapped onto the query. Similarity
between binding sites is measured in terms of sequence
similarity, and those positions which overlap structurally
are assigned an additional weight. Binding sites are
clustered by a hierarchical complete linkage clustering
procedure. To decide on the cutoff for clustering, we use
a recently described energy function which maximizes the
mean similarity of members within a cluster and minimizes
the complexity of the description provided by cluster
membership (number of bits required to describe the
data) (31). Clusters which contain an actual interaction
observed in the query structure are marked by the letter
‘O’. By expanding the cluster one can see additional infor-
mation about its members.
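To make the clustering step concrete, here is a small pure-Python sketch of hierarchical complete linkage clustering over binding sites mapped onto the query. It departs from the procedure described above in two labeled ways: similarity is a plain Jaccard overlap of query positions rather than the sequence similarity with structural weighting used by IBIS, and merging stops at a fixed cutoff (`min_similarity`, a hypothetical parameter) rather than at a cutoff chosen by the information-based energy function (31).

```python
def jaccard(a, b):
    """Overlap of two sets of query positions (stand-in similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def complete_linkage(sites, min_similarity=0.5):
    """Agglomerative clustering; returns clusters as sorted lists of indices."""
    clusters = [[i] for i in range(len(sites))]

    def linkage(c1, c2):
        # Complete linkage: two clusters are only as similar as
        # their least similar pair of members.
        return min(jaccard(sites[i], sites[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = linkage(clusters[x], clusters[y])
                if best is None or s > best[0]:
                    best = (s, x, y)
        if best[0] < min_similarity:
            break  # no remaining pair is similar enough to merge
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]

# Toy binding sites expressed as sets of query residue positions.
sites = [{1, 2, 3, 4}, {2, 3, 4, 5}, {1, 2, 3, 4, 5}, {40, 41, 42}, {41, 42, 43}]
clusters = complete_linkage(sites)
```

The first three sites overlap heavily and form one cluster, while the two sites around positions 40 to 43 form a second one.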
All binding site clusters are ranked in terms of their
predicted biological relevance and similarity to the
query. The components of the ranking score are the
sequence-PSSM score; the average sequence identity
between the query and cluster members calculated over
the whole structure–structure alignment; the number of
interfacial contacts and the average sequence conservation
of binding site alignment columns. All components of the
ranking score are then normalized and all clusters are
ranked with respect to the Z-scores.
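The normalization and ranking step might look like the sketch below. The text does not state how the normalized components are combined, so averaging the four Z-scores with equal weight is an assumption, as are the function names and the toy component values.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize a list of values; a constant column maps to zeros."""
    mu, sigma = mean(values), pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]

def rank_clusters(components):
    """Rank clusters by the mean Z-score of their components.

    components maps a cluster name to a tuple of
    (PSSM score, average identity, interfacial contacts, site conservation).
    Equal weighting of the components is an assumption of this sketch.
    """
    names = list(components)
    columns = list(zip(*(components[n] for n in names)))
    per_component = [z_scores(list(col)) for col in columns]
    combined = {
        n: sum(z[i] for z in per_component) / len(per_component)
        for i, n in enumerate(names)
    }
    order = sorted(names, key=lambda n: combined[n], reverse=True)
    return order, combined

# Toy values, not taken from IBIS.
components = {
    "c1": (12.0, 80.0, 40, 0.9),
    "c2": (5.0, 45.0, 20, 0.6),
    "c3": (8.0, 60.0, 30, 0.7),
}
order, combined = rank_clusters(components)
```

Standardizing each component first keeps a measure with a large numeric range, such as the number of contacts, from dominating the combined score.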
Evaluating biological relevance of binding sites
To emphasize biologically relevant binding sites we
validate sites according to a few criteria. First, we assess
the evolutionary conservation of binding site clusters.
Those sites which reoccur in diverse enough protein
complexes are ranked higher, an idea which was
previously implemented in the Conserved Binding
Modes (CBM) database (32). Clusters that have only
one non-redundant member (after members with >90%
identity are purged) are considered ‘singletons’ and are
displayed at the bottom of the interaction summary
table with a low rank. Another way to evaluate binding
sites is to compare them with manually curated site
annotations from the Conserved Domain Database
(CDD), which have been extracted from the published
literature or derived from manual interpretation of indi-
vidual three-dimensional structures (23). Binding site
clusters which overlap by >50% with a CDD annotation
are ranked first. For protein–chemical interactions, we
exclude by default chemicals such as buffers, salts,
detergents, solvents and ions that are typically added
for the purpose of crystallization and/or purification.
Most often, these are not relevant with respect to the
protein’s biological function. Finally, we employ the
PISA algorithm (24) to validate protein–protein interac-
tion interfaces and eliminate those interfaces which appear
to be the result of crystal packing.
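The ordering rules in this section can be summarized in a short sketch: clusters matching a CDD annotation come first, singletons go last, and the ranking score decides the order in between. Several details are assumptions made for illustration: the overlap is measured as the fraction of the cluster's residues covered by the CDD site, singleton status takes precedence over a CDD match, and the dictionary keys are hypothetical.

```python
def overlap_fraction(cluster_site, cdd_site):
    """Fraction of the cluster's residues covered by the CDD annotation.

    The choice of denominator is an assumption; the text only states
    that the overlap must exceed 50%.
    """
    if not cluster_site:
        return 0.0
    return len(cluster_site & cdd_site) / len(cluster_site)

def order_clusters(clusters, cdd_site, min_overlap=0.5):
    """Sort clusters for display.

    Each cluster is a dict with keys 'name', 'site' (set of positions),
    'score' (ranking score) and 'n_nonredundant' (members left after
    purging those with >90% identity).
    """
    def tier(c):
        if c["n_nonredundant"] <= 1:
            return 2  # singleton: shown at the bottom
        if cdd_site and overlap_fraction(c["site"], cdd_site) > min_overlap:
            return 0  # matches curated annotation: shown first
        return 1
    return sorted(clusters, key=lambda c: (tier(c), -c["score"]))

clusters = [
    {"name": "a", "site": {1, 2, 3, 4}, "score": 0.2, "n_nonredundant": 5},
    {"name": "b", "site": {10, 11, 12}, "score": 1.5, "n_nonredundant": 8},
    {"name": "c", "site": {1, 2, 9}, "score": 0.0, "n_nonredundant": 1},
]
ordered = [c["name"] for c in order_clusters(clusters, cdd_site={1, 2, 3, 7})]
```

Cluster "a" is listed first despite its lower score because three of its four residues fall inside the curated site.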
Figure 1. Overview of the binding site annotation procedure in IBIS.
Summary statistics of the IBIS database
Currently, a total of 40 716 proteins (151 887 protein
chains/domains) are represented in IBIS with at least
one type of interaction observed in their structural
complexes. As can be seen from Figure 2, protein–
protein and protein–chemical interactions are the most
frequent types of interactions observed in protein structures.
Protein–protein interactions are the most prevalent
interactions as reflected by the number of domains
involved in interactions and the number of binding sites.
The number of inferred interactions is always higher than
the number of observed interactions, especially for
protein–peptide and protein–nucleic acid interactions,
where the number of inferred interactions exceeds the
number of observed ones (in terms of the number of
protein chains) almost 5-fold. This ratio is even higher
for binding site clusters (Figure 2B). Altogether, IBIS
provides information on binding partners and binding
site locations with averages of 3.4 protein–chemical
binding site clusters per chain, and eight protein–protein
binding site clusters per domain. The scale of such
annotations is approaching the scale of whole
Description of the IBIS interface
IBIS may be queried by supplying either a protein NCBI
GenBank identifier or PDB code (the one letter PDB chain
identifier is optional). For a given query, it is possible
to see different types of interactions, protein–protein,
protein–chemical, protein–DNA, protein–RNA and
protein–peptide, by navigating through different tabs at
the top of the page (the display of protein-ion interactions
is currently under development). Figure 3 illustrates an
IBIS Interaction Summary page. Observed and inferred
binding site clusters are sorted by the ranking score.
Each row in the table corresponds to a binding site
cluster and can be expanded to show the cluster members.
The main features of binding sites and interaction
partners in the Interaction Summary table are as follows:
‘Interaction partner’—name of the interaction partner
which interacts with either the actual query (‘observed’
interactions) or homologs of the query from within a
given binding site cluster (‘inferred’ interactions). For
protein–protein interactions, the CDD domain name of
the binding partner is listed. For protein–chemical
interactions, the column reports the name of the
chemical bound to a representative member of the
cluster. For protein–nucleic acid and protein–peptide
interactions, the column reports the sequence of the first
20 biopolymer residues from the interaction partner of a
representative cluster member.
‘Ranking score’—the score which ranks the binding site
clusters in terms of their biological relevance and similar-
ity to the query. The ranking score is not defined for the
‘singleton’ clusters.
‘Number of cluster members’—the number of cluster
members. Upon cluster expansion only non-redundant
cluster members are displayed (at <90% identity level).
A complete list of members can also be viewed by
clicking the ‘See all members’ link.
‘Average percent identity to query’—the average
sequence identity between the query and the cluster
members calculated over all of their structural alignments
with the query.
‘Number of binding site residues’—the union of binding
sites mapped from all members of the cluster to the query.
‘Number of chemicals’ (for protein–chemical inter-
actions)—the number of unique, standardized chemicals
present in a given binding site cluster.
‘Curator annotation’—binding site annotation from the
CDD which overlaps by >50% with the sites annotated by
IBIS. Binding site clusters with matching CDD annotation
are top-ranked irrespective of their ranking score.
‘Taxonomic diversity’—the last common ancestor of the
proteins from a given cluster, listed with a link to NCBI’s
Taxonomy Browser, so that one can explore all taxonomic
groups represented by the cluster.
The actual binding site residue alignment can be seen
upon expanding the clusters, including the PDB codes
corresponding to all complex structures summarized by the
clusters. It is also possible to view the inferred binding
sites projected onto the actual query structure using the
Cn3D visualization software (http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml).
For the case of protein–protein interactions, the expanded table will
provide the PISA validation status for each interaction
interface. PISA may not be able to process a particular
complex structure; these cases are indicated by an ‘N/A’.
Figure 2. (A) Histogram depicting the number of proteins in PDB with
observed/inferred binding sites. (B) Histogram showing the number of
binding sites inferred by IBIS as compared to those observed in protein
structure complexes.
The features of binding site clusters can be examined
by using the ‘Advanced search’ option found on the left
side bar. This option allows one to filter the interactions
within a given interaction type by various criteria like
level of sequence identity, structural similarity, names of
interacting partner and others. In the case of chemical
binding sites, for example, it is possible to pick and
inspect various sites a particular chemical may bind to
on a given query.
Annotating new binding sites using IBIS: example
of human spleen tyrosine kinase catalytic domain
Spleen tyrosine kinase (Syk) is a non-receptor tyrosine
kinase, expressed in a wide range of cell types, which
plays an important role in immunoreceptor signaling
(33). It is an attractive drug target for the treatment of
allergic and antibody-mediated autoimmune diseases, as well as
breast and gastric cancers. Syk is characterized by two
N-terminal SH2 adapter domains, a linker region and a
C-terminal catalytic domain. Several drugs/inhibitors
target the active site of the Syk catalytic domain and
decrease its activity.
Here, we demonstrate how IBIS can be used to annotate
the binding sites of the Syk catalytic domain. We start
with a Syk sequence for which a structure of the
complex with the ligands is available (pdb code: 1XBB);
we predict binding sites using IBIS, and finally compare
predicted sites with the actual binding sites observed in the
structure. First we find the closest homolog with a known
structure, a Zap-70 kinase (1U59 Chain A; Blast E-value
of 6e-99 and 77% identity to the query sequence,
Figure 2). Second, we use the structure of 1U59 as a
query in IBIS and find nine protein–chemical binding
site clusters. The top two clusters overlap with the
‘active site/ATP binding site’ CDD annotations. The
first binding site cluster includes 360 homologous
structures bound to 170 different chemicals. The consen-
sus binding site alignment is 65 residues long, due to the
diversity and size variation of the chemicals bound, but it
highlights 13 highly conserved residues. The ATP-binding
site represents an attractive target for the design of kinase
inhibitors, and IBIS provides a concise summary of
interactions at that site, which would otherwise require
significant comparative analysis. Here IBIS groups
and identifies an ATP-binding site, and provides a list of
various chemicals, among them many kinase inhibitors,
which might potentially bind to and inhibit the query
protein. All binding sites observed in the actual structure
complex with the anticancer drug imatinib (1XBB)
are correctly annotated by IBIS (see table in Figure 4).
Interestingly, imatinib binds not only to the ATP-binding
site but also to a regulatory myristoylation site
on the C-terminus (from the binding site cluster #8) that
can be annotated on the query sequence.
Figure 3. IBIS screen shot for 1U59, Chain A, displaying various chemical binding sites inferred from its homologs. A blowup of the expanded
cluster of the ATP binding site is also shown.
In addition to chemical binding sites, it is also possible
to predict protein interaction partners for the Syk protein.
For example, binding site cluster #1 under protein–protein
interactions points to a potential SH2 domain binding
site which is further validated by CDD curator annota-
tion, although no structural complexes have been solved
between Syk and SH2.
In this paper, we presented a comprehensive, web-
accessible database, which organizes, analyzes and
predicts different types of interaction partners and
binding sites in proteins. For proteins with or without
known binding partners, IBIS provides a succinct and
informative representation of observed binding sites and
binding sites inferred from homologs with known 3D
structure. It provides analysis of how well a binding site
is conserved across members of a homologous protein
family. Several structures of the same protein or close
homologs with different binding partners may be available
in the Protein Data Bank, or the same protein may have
been crystallized under different physiological conditions.
In such cases, the IBIS database facilitates a detailed
classification and analysis of binding sites. IBIS also
attempts to validate binding sites by assessing their bio-
logical relevance and ranks them accordingly. It can be
used to annotate oligomeric states by inferring relevant
homo-oligomer interfaces and should prove useful in
studying the evolution of protein interactions.
IBIS is updated regularly (currently on a biweekly
schedule) to account for the growth of the GenBank,
PDB/MMDB, VAST and CDD databases. Recently, it
was estimated that almost half of all sequences in the
GenBank database have at least one structure homolog
with an extensive alignment and at least 30% identical
residues (34). As the on-going structural genomics initia-
tive continues to close the sequence-structure gap, IBIS
serves as a powerful knowledge-based annotation system
for proteins of unknown structure.
Figure 4. Mapping of the 1U59 inferred ATP binding site onto the sequence of Syk tyrosine kinase (1XBB chain A) and its agreement with the
observed binding site in Syk + complex with imatinib. MMDB residue numbering is used which starts from the beginning of the corresponding
GenBank protein sequence.
The authors would like to thank Yanli Wang and Lewis
Geer for useful discussions and Eugene Krissinel for help
with the PISA software.
National Institutes of Health/DHHS (Intramural
Research program of the National Library of Medicine).
Funding for open access charge: National Institutes of
Health/DHHS (Intramural Research program of the
National Library of Medicine).
Conflict of interest statement. None declared.
1. Giot,L., Bader,J.S., Brouwer,C., Chaudhuri,A., Kuang,B., Li,Y.,
Hao,Y.L., Ooi,C.E., Godwin,B., Vitols,E. et al. (2003) A protein
interaction map of Drosophila melanogaster. Science, 302,
2. Li,S., Armstrong,C.M., Bertin,N., Ge,H., Milstein,S., Boxem,M.,
Vidalain,P.O., Han,J.D., Chesneau,A., Hao,T. et al. (2004) A map
of the interactome network of the metazoan C. elegans. Science,
303, 540–543.
3. Bork,P. and Koonin,E.V. (1998) Predicting functions from protein
sequences—where are the bottlenecks? Nat. Genet., 18, 313–318.
4. Rentzsch,R. and Orengo,C.A. (2009) Protein function prediction—
the power of multiplicity. Trends Biotechnol., 27, 210–219.
5. Matthews,L.R., Vaglio,P., Reboul,J., Ge,H., Davis,B.P., Garrels,J.,
Vincent,S. and Vidal,M. (2001) Identification of potential
interaction networks using sequence-based searches for conserved
protein-protein interactions or ‘interologs’. Genome Res., 11,
6. Gerlt,J.A. and Babbitt,P.C. (2000) Can sequence determine
function? Genome Biol., 1, REVIEWS0005.
7. Yu,H., Luscombe,N.M., Lu,H.X., Zhu,X., Xia,Y., Han,J.D.,
Bertin,N., Chung,S., Vidal,M. and Gerstein,M. (2004) Annotation
transfer between genomes: protein-protein interologs and protein-
DNA regulogs. Genome Res., 14, 1107–1118.
8. Hegyi,H. and Gerstein,M. (1999) The relationship between protein
structure and function: a comprehensive survey with application to
the yeast genome. J. Mol. Biol., 288, 147–164.
9. Campbell,S.J., Gold,N.D., Jackson,R.M. and Westhead,D.R.
(2003) Ligand binding: functional site location, similarity and
docking. Curr. Opin. Struct. Biol., 13, 389–395.
10. Jones,S. and Thornton,J.M. (1997) Analysis of protein-protein
interaction sites using surface patches. J. Mol. Biol., 272, 121–232.
11. Teichmann,S.A., Murzin,A.G. and Chothia,C. (2001)
Determination of protein function, evolution and interactions by
structural genomics. Curr. Opin. Struct. Biol., 11, 354–363.
12. Landgraf,R., Xenarios,I. and Eisenberg,D. (2001) Three-
dimensional cluster analysis identifies interfaces and functional
residue clusters in proteins. J. Mol. Biol., 307, 1487–1502.
13. Pazos,F. and Sternberg,M.J. (2004) Automated prediction of
protein function and detection of functional sites from structure.
Proc. Natl Acad. Sci. USA, 101, 14754–14759.
14. Brylinski,M. and Skolnick,J. (2008) A threading-based method
(FINDSITE) for ligand-binding site prediction and functional
annotation. Proc. Natl Acad. Sci. USA, 105, 129–134.
15. Hernandez,M., Ghersi,D. and Sanchez,R. (2009)
SITEHOUND-web: a server for ligand binding site identification
in protein structures. Nucleic Acids Res., 37, W413–W416.
16. Huang,B. and Schroeder,M. (2006) LIGSITEcsc: predicting ligand
binding sites using the Connolly surface and degree of conservation.
BMC Struct. Biol., 6, 19.
17. Laurie,A.T. and Jackson,R.M. (2005) Q-SiteFinder: an energy-
based method for the prediction of protein-ligand binding sites.
Bioinformatics, 21, 1908–1916.
18. Qin,S. and Zhou,H.X. (2007) meta-PPISP: a meta web server for
protein-protein interaction site prediction. Bioinformatics, 23,
19. Talavera,D., Laskowski,R.A. and Thornton,J.M. (2009) WSsas:
a web service for the annotation of functional residues through
structural homologues. Bioinformatics, 25, 1192–1194.
20. Snyder,K.A., Feldman,H.J., Dumontier,M., Salama,J.J. and
Hogue,C.W. (2006) Domain-based small molecule binding site
annotation. BMC Bioinformatics, 7, 152.
21. Chen,Y.C., Lo,Y.S., Hsu,W.C. and Yang,J.M. (2007) 3D-partner:
a web server to infer interacting partners and binding models.
Nucleic Acids Res., 35, W561–567.
22. Stein,A., Panjkovich,A. and Aloy,P. (2009) 3did Update: domain-
domain and peptide-mediated interactions of known 3D structure.
Nucleic Acids Res., 37, D300–D304.
23. Marchler-Bauer,A., Anderson,J.B., Chitsaz,F., Derbyshire,M.K.,
DeWeese-Scott,C., Fong,J.H., Geer,L.Y., Geer,R.C.,
Gonzales,N.R., Gwadz,M. et al. (2009) CDD: specific functional
annotation with the Conserved Domain Database. Nucleic Acids
Res., 37, D205–210.
24. Krissinel,E. and Henrick,K. (2007) Inference of macromolecular
assemblies from crystalline state. J. Mol. Biol., 372, 774–797.
25. Chen,J., Anderson,J.B., DeWeese-Scott,C., Fedorova,N.D.,
Geer,L.Y., He,S., Hurwitz,D.I., Jackson,J.D., Jacobs,A.R.,
Lanczycki,C.J. et al. (2003) MMDB: Entrez’s 3D-structure
database. Nucleic Acids Res., 31, 474–477.
26. Sussman,J.L., Lin,D., Jiang,J., Manning,N.O., Prilusky,J., Ritter,O.
and Abola,E.E. (1998) Protein Data Bank (PDB): database of
three-dimensional structural information of biological
macromolecules. Acta Crystallogr. D Biol. Crystallogr., 54,
27. Marchler-Bauer,A. and Bryant,S.H. (2004) CD-Search: protein
domain annotations on the fly. Nucleic Acids Res., 32, W327–W331.
28. Wang,Y., Xiao,J., Suzek,T.O., Zhang,J., Wang,J. and Bryant,S.H.
(2009) PubChem: a public information system for analyzing
bioactivities of small molecules. Nucleic Acids Res., 37,
29. Gibrat,J.F., Madej,T. and Bryant,S.H. (1996) Surprising similarities
in structure comparison. Curr. Opin. Struct. Biol., 6, 377–385.
30. Wang,Y., Bryant,S., Tatusov,R. and Tatusova,T. (2000) Links from
genome proteins to known 3-D structures. Genome Res., 10
31. Slonim,N., Atwal,G.S., Tkacik,G. and Bialek,W. (2005)
Information-based clustering. Proc. Natl Acad. Sci. USA, 102,
32. Shoemaker,B.A., Panchenko,A.R. and Bryant,S.H. (2006) Finding
biologically relevant protein domain interactions: conserved binding
mode analysis. Protein Sci., 15, 352–361.
33. Atwell,S., Adams,J.M., Badger,J., Buchanan,M.D., Feil,I.K.,
Froning,K.J., Gao,X., Hendle,J., Keegan,K., Leon,B.C. et al.
(2004) A novel mode of Gleevec binding is revealed by the structure
of spleen tyrosine kinase. J. Biol. Chem., 279, 55827–55832.
34. Wang,Y., Addess,K.J., Chen,J., Geer,L.Y., He,J., He,S., Lu,S.,
Madej,T., Marchler-Bauer,A., Thiessen,P.A. et al. (2007) MMDB:
annotating protein sequences with Entrez’s 3D-structure database.
Nucleic Acids Res., 35, D298–D300.
D524 Nucleic Acids Research, 2010, Vol. 38, Database issue
    • "Importantly, structure evolves more slowly than sequence and may have more powerful signals for conservation [184,185] . Predictors using structural homology have verified this proposi- tion [15,21,31,32,66,162,186]. The main issue with the use of structural homologs, or with predictors requiring structural information in general , is the paucity of usable structures [15,47,105,187], particularly when considering the relatively small size of the PDB (∼80,000 structures, including redundancy) compared to the number of sequences known (∼17 million non-redundant sequences) [36,188], though this can be partly circumvented by using local (rather than global) structural homologies [34]. "
    [Show abstract] [Hide abstract] ABSTRACT: Interaction sites on protein surfaces mediate virtually all biological activities, and their identification holds promise for disease treatment and drug design. Novel algorithmic approaches for the prediction of these sites have been produced at a rapid rate, and the field has seen significant advancement over the past decade. However, the most current methods have not yet been reviewed in a systematic and comprehensive fashion. Herein, we describe the intricacies of the biological theory, datasets, and features required for modern protein-protein interaction site (PPIS) prediction, and present an integrative analysis of the state-of-the-art algorithms and their performance. First, the major sources of data used by predictors are reviewed, including training sets, evaluation sets, and methods for their procurement. Then, the features employed and their importance in the biological characterization of PPISs are explored. This is followed by a discussion of the methodologies adopted in contemporary prediction programs, as well as their relative performance on the datasets most recently used for evaluation. In addition, the potential utility that PPIS identification holds for rational drug design, hotspot prediction, and computational molecular docking is described. Finally, an analysis of the most promising areas for future development of the field is presented.
    Full-text · Article · Dec 2015
    • "The ConSurf server result (Figure 7) shows that the N-terminal of the ARID domain is one of the most highly-conserved parts in the ARID domain, which is probably essential for protein's function. We also predict the protein interacting partners and binding sites in the KDM5C ARID domain using the NCBI Inferred Biomolecular Interactions Server (IBIS) [37]. The results show that Asp87 is a plausible zinc ion binding site."
    ABSTRACT: Mutations in KDM5C gene are linked to X-linked mental retardation, the syndromic Claes-Jensen-type disease. This study focuses on non-synonymous mutations in the KDM5C ARID domain and evaluates the effects of two disease-associated missense mutations (A77T and D87G) and three not-yet-classified missense mutations (R108W, N142S, and R179H). We predict the ARID domain's folding and binding free energy changes due to mutations, and also study the effects of mutations on protein dynamics. Our computational results indicate that A77T and D87G mutants have minimal effect on the KDM5C ARID domain stability and DNA binding. In parallel, the change in the free energy unfolding caused by the mutants A77T and D87G were experimentally measured by urea-induced unfolding experiments and were shown to be similar to the in silico predictions. The evolutionary conservation analysis shows that the disease-associated mutations are located in a highly-conserved part of the ARID structure (N-terminal domain), indicating their importance for the KDM5C function. N-terminal residues' high conservation suggests that either the ARID domain utilizes the N-terminal to interact with other KDM5C domains or the N-terminal is involved in some yet unknown function. The analysis indicates that, among the non-classified mutations, R108W is possibly a disease-associated mutation, while N142S and R179H are probably harmless.
    Full-text · Article · Nov 2015
    • "The active site of an enzyme is typically found in a large pocket on the protein surface [16], which allows a ligand substrate to bind in a solvent-free environment. The increasing amount of active site and catalytic residue data, available in public resources (e.g., the CSA [12], Inferred Biomolecular Interactions Server [17], firestar [18]), has enabled large-scale studies on the location of catalytic sites. As regards the arrangements of catalytic residues in active sites, many studies have shown that although catalytic residues tend to be conserved in their structural location, they are not necessarily conserved in sequence."
    ABSTRACT: Enzymes, as biological catalysts, form the basis of all forms of life. How these proteins have evolved their functions remains a fundamental question in biology. Over 100 years of detailed biochemistry studies, combined with the large volumes of sequence and protein structural data now available, means we are able to perform large-scale analyses to address this question. Using a range of computational tools and resources we have compiled information on all experimentally annotated changes in enzyme function within 379 structurally defined protein domain superfamilies, linking the changes observed in functions during evolution, to changes in reaction chemistry. Many superfamilies show changes in function at some level, although one function often dominates one superfamily. We use quantitative measures of changes in reaction chemistry to reveal the various types of chemical changes occurring during evolution and exemplify these by detailed examples. Additionally, we use structural information of the enzyme's active site to examine how different superfamilies have changed their catalytic machinery during evolution. Some superfamilies have changed the reactions they perform without changing catalytic machinery. In others large changes of enzyme function, both in terms of overall chemistry and substrate specificity, have been brought about by significant changes in catalytic machinery. Interestingly, in some superfamilies, relatives perform similar functions but with different catalytic machineries. This analysis highlights characteristics of functional evolution across a wide range of superfamilies, providing insights that will be useful in predicting the function of uncharacterized sequences as well as the design of new synthetic enzymes.
    Full-text · Article · Nov 2015