-
ABSTRACT: Characterization of binding hot spots of protein interfaces is a fundamental study in molecular biology. Many computational methods have been proposed to identify binding hot spots. However, few studies have assessed the biological significance of binding hot spots. We introduce the notion of biological significance of a contact residue for capturing the probability of the residue occurring in or contributing to protein binding interfaces. We take a statistical Z-score approach to the assessment of the biological significance. The method has three main steps. First, the potential score of a residue is defined by using a knowledge-based potential function with relative accessible surface area calculations. A null distribution of this potential score is then generated from artifact crystal packing contacts. Finally, the Z-score significance of a contact residue with a specific potential score is determined according to this null distribution. We hypothesize that residues at binding hot spots have large absolute Z-scores as they contribute greatly to binding free energy. Thus, we propose to use the Z-score to predict whether a contact residue is a hot spot residue. Comparison with previously reported methods on two benchmark datasets shows that this Z-score method is mostly superior to earlier methods. This article is part of a Special Issue entitled: Computational Methods for Protein Interaction and Structural Prediction.
Biochimica et Biophysica Acta 06/2012; 1824(12):1457-67. · 4.66 Impact Factor
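The final step of the Z-score method described in the abstract above can be illustrated with a minimal sketch. It assumes the potential scores are already computed; the function names, the example null sample, and the decision threshold of 2.0 are hypothetical and are not taken from the paper.

```python
from statistics import mean, stdev


def z_score(score, null_scores):
    """Z-score of one residue's potential score relative to a null sample.

    `null_scores` stands in for potential scores collected from
    crystal packing contacts (the null distribution).
    """
    mu = mean(null_scores)
    sigma = stdev(null_scores)  # sample standard deviation
    if sigma == 0:
        raise ValueError("null distribution has zero variance")
    return (score - mu) / sigma


def predict_hot_spot(score, null_scores, threshold=2.0):
    """Call a residue a hot spot when |Z| is large.

    The threshold is an illustrative assumption, not the paper's value.
    """
    return abs(z_score(score, null_scores)) >= threshold


# Hypothetical null sample of potential scores
null = [-0.2, 0.1, 0.0, 0.3, -0.1, 0.2, -0.3, 0.0]
print(z_score(-1.5, null))
print(predict_hot_spot(-1.5, null))
```

The absolute value is used because the abstract hypothesizes large absolute Z-scores for hot spot residues, regardless of sign.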
-
ABSTRACT: Sequence-based understanding and identification of protein binding interfaces is a challenging research topic due to the complexity in protein systems and the imbalanced distribution between interface and noninterface residues. This paper presents an outlier detection idea to address the redundancy problem in protein interaction data. The cleaned training data are then used for improving the prediction performance. We use three novel measures to describe the extent a residue is considered as an outlier in comparison to the other residues: the distance of a residue instance from the center instance of all residue instances of the same class label (Dist), the probability of the class label of the residue instance (PCL), and the importance of within-class and between-class (IWB) residue instances. Outlier scores are computed by integrating the three factors; instances with a sufficiently large score are treated as outliers and removed. The data sets without outliers are taken as input for a support vector machine (SVM) ensemble. The proposed SVM ensemble trained on input data without outliers performs better than that with outliers. Our method is also more accurate than many literature methods on benchmark data sets. From our empirical studies, we found that some outlier interface residues are truly near to noninterface regions, and some outlier noninterface residues are close to interface regions.
IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM 04/2012; 9(4):1155-65. · 2.25 Impact Factor
-
ABSTRACT: Water is an integral part of protein complexes. It shapes protein binding sites by filling cavities and it bridges local contacts by hydrogen bonds. However, water molecules have usually not been included in protein interface models, and few distribution profiles of water molecules in protein binding interfaces are known.
In this work, we use a tripartite protein-water-protein interface model and a nested-ring atom re-organization method to detect hydration trends and patterns from an interface data set which involves immobilized interfacial water molecules. This data set consists of 206 obligate interfaces, 160 non-obligate interfaces, and 522 crystal packing contacts. The two types of biological interfaces are found to be drier than the crystal packing interfaces in our data, in agreement with a hydration pattern reported earlier, although the previous definition of immobilized water was purely distance-based. The biological interfaces in our data set are also found to be subject to stronger water exclusion in their formation. To study the overall hydration trend in protein binding interfaces, atoms at the same burial level in each tripartite protein-water-protein interface are organized into a ring. The rings of an interface are then ordered with the core atoms placed at the middle of the structure to form a nested-ring topology. We find that water molecules on the rings of an interface are generally configured in a dry-core-wet-rim pattern with a progressive level-wise solvation towards the rim of the interface. This solvation trend becomes even sharper when counterexamples are separated.
Immobilized water molecules are regularly organized in protein binding interfaces and they should be carefully considered in the studies of protein hydration mechanisms.
BMC Bioinformatics 03/2012; 13:51. · 2.75 Impact Factor
-
ABSTRACT: A multi-interface domain is a domain that can shape multiple and distinctive binding sites to contact with many other domains, forming a hub in domain-domain interaction networks. The functions played by the multiple interfaces are usually different, but there is no strict bijection between the functions and interfaces as some subsets of the interfaces play the same function. This work applies graph theory and algorithms to discover fingerprints for the multiple interfaces of a domain and to establish associations between the interfaces and functions, based on a huge set of multi-interface proteins from PDB. We found that about 40% of proteins have the multi-interface property, however the involved multi-interface domains account for only a tiny fraction (1.8%) of the total number of domains. The interfaces of these domains are distinguishable in terms of their fingerprints, indicating the functional specificity of the multiple interfaces in a domain. Furthermore, we observed that both cooperative and distinctive structural patterns, which will be useful for protein engineering, exist in the multiple interfaces of a domain.
PLoS ONE 01/2012; 7(12):e50821. · 4.09 Impact Factor
-
ABSTRACT: Context-awareness is a characteristic in the recognition between antigens and antibodies, highlighting the reconfiguration of epitope residues when an antigen interacts with a different antibody. A coarse binary classification of antigen regions into epitopes or nonepitopes, without specifying antibodies, may not accurately reflect this biological reality. Therefore, we study an antibody-specified epitope prediction problem in line with this principle. This problem is new and challenging as we pinpoint a subset of the antigenic residues from an antigen when it binds to a specific antibody. We introduce two kinds of associations of the contextual awareness: 1) residues-residues pairing preference, and 2) the dependence between sets of contact residue pairs. Preference plays a bridging role to link interacting paratope and epitope residues while dependence is used to extend the association from one dimension to two dimensions. The paratope/epitope residues' relative composition, cooperativity ratios, and Markov properties are also utilized to enhance our method. A nonredundant data set containing 80 antibody-antigen complexes is compiled and used in the evaluation. The results show that our method yields a good performance on antibody-specified epitope prediction. On the traditional antibody-ignored epitope prediction problem, a simplified version of our method can produce a competitive, sometimes much better, performance in comparison with three structure-based predictors.
IEEE/ACM Transactions on Computational Biology and Bioinformatics 01/2012; · 1.54 Impact Factor
-
ABSTRACT: The conservation of interfacial water molecules has only been studied in small data sets consisting of interfaces of a specific function. So far, no general conclusions have been drawn from large-scale analysis, due to the challenges of using structural alignment in large data sets. To avoid using structural alignment, we propose a solvated sequence method to analyse water conservation properties in protein interfaces. We first use water information to label the residues, and then align interfacial residues in a fashion similar to normal sequence alignment. Our results show that substituting a water-contacting interfacial residue with a hydrophobic residue tends to desolvate the local area. Surprisingly, residues with short side chains also tend not to lose their contacting water, emphasising the role of water in shaping binding sites. Deeply buried water molecules are found to be more conserved in terms of their contacts with interfacial residues.
International Journal of Bioinformatics Research and Applications 01/2012; 8(3):228-44.
-
ABSTRACT: The substantial worldwide mortality caused by the 2009 H1N1 influenza A has stimulated a new surge of research on H1N1 viruses. Epitope conservation has been found in the HA1 protein that allows antibodies to cross-neutralize both 1918 and 2009 H1N1. However, few works have thoroughly studied the binding hot spots in those two antigen-antibody interfaces which are responsible for the antibody cross-neutralization.
We apply predictive methods to identify binding hot spots at the epitope sites of the HA1 proteins and at the paratope sites of the 2D1 antibody. We find that the six mutations at the HA1's epitope from 1918 to 2009 should not harm its binding to 2D1. Instead, the change of binding free energy on the whole exhibits an increased tendency after these mutations, making the binding stronger. This is consistent with the observation that the 1918 H1N1 neutralizing antibody can cross-react with 2009 H1N1. We identified three distinguished hot spot residues, including Lys(166), common between the two epitopes. These common hot spots again can explain why 2D1 cross-reacted. We believe that these hot spot residues are mutation candidates which may help H1N1 viruses to evade the immune system. We also identified eight residues at the paratope site of 2D1, five from its heavy chain and three from its light chain, that are predicted to be energetically important in the HA1 recognition. The identification of these hot spot residues and their structural analysis are potentially useful to fight against H1N1 viruses.
jinyan.li@uts.edu.au
Z-score is available at http://155.69.2.25/liuqian/indexz.py
Supplementary data are available at Bioinformatics online.
Bioinformatics 07/2011; 27(18):2529-36. · 5.47 Impact Factor
-
ABSTRACT: We introduce a new motif called interfacial biclique pattern to study the difference between double-stranded DNA-binding proteins (DSBs, most of which are also known to act as transcription factors) and single-stranded DNA-binding proteins (SSBs), which have recently been found to be involved in many applications. An interfacial biclique pattern in a protein-DNA complex usually consists of a group of residues and a group of nucleotides such that every residue contacts all of the bases. The proposal of this idea is based on a biological redundancy mechanism: a site mutation has little influence on the ability of the other residues to recognize the target nucleotides, and vice versa. The distribution of the residues on the interfacial motifs is investigated to identify distinct stable preferred residues, stable un-preferred residues and unstable preferred residues between SSBs and DSBs. We also examine residue co-occurrence and residue-base association rules in the interfacial motifs to uncover the different choices of residue combinations by SSBs and DSBs that have contacts with one or more bases. We found that DSBs and SSBs have their own right residues at the right places for the binding preference and association with nucleotides. Some of our results are supported by the literature.
Proteins Structure Function and Bioinformatics 02/2011; 79(2):598-610. · 3.39 Impact Factor
-
Inf. Sci. 01/2011; 181:201-216.
-
ABSTRACT: A protein binding hot spot is a cluster of residues in the interface that are energetically important for the binding of the protein with its interaction partner. Identifying protein binding hot spots can give useful information to protein engineering and drug design, and can also deepen our understanding of protein-protein interaction. These residues are usually buried inside the interface with very low solvent accessible surface area (SASA). Thus SASA is widely used as an outstanding feature in hot spot prediction by many computational methods. However, SASA is not capable of distinguishing slightly buried residues, of which most are non hot spots, and deeply buried ones that are usually inside a hot spot.
We propose a new descriptor called "burial level" for characterizing residues, atoms and atomic contacts. Specifically, burial level captures how deeply the residues are buried. We identify different kinds of deeply buried atomic contacts (DBAC) at different burial levels that are directly broken in alanine substitution. We use their numbers as input for an SVM to classify residues as hot spots or non hot spots. We achieve an F measure of 0.6237 under leave-one-out cross-validation on a data set containing 258 mutations. This performance is better than that of other computational methods.
Our results show that hot spot residues tend to be deeply buried in the interface, not just having a low SASA value. This indicates that a high burial level is not only a necessary but also a more sufficient condition than a low SASA for a residue to be a hot spot residue. We find that those deeply buried atoms become increasingly more important when their burial levels rise up. This work also confirms the contribution of deeply buried interfacial atomic contacts to the energy of protein binding hot spot.
BMC Systems Biology 01/2011; 5 Suppl 1:S5. · 3.15 Impact Factor
-
ABSTRACT: A protein interface can be as "wet" as a protein surface in terms of the number of immobilized water molecules. This important water information has not been explicitly used by computational methods that model and identify protein binding hot spots, which overlooks the role of water in forming interface hydrogen bonds and in filling cavities. Hot spot residues are usually clustered at the core of the protein binding interfaces. However, traditional machine learning methods often identify the hot spot residues individually, breaking the cooperativity of the energetic contribution. Our idea in this work is to explore the role of immobilized water and meanwhile to capture two essential properties of hot spots: the compactness in contact and the far distance from bulk solvent. Our model is named geometrically centered region (GCR). The detection of GCRs is based on novel tripartite graphs and on atom burial levels, a concept more intuitive than SASA. Applied to a data set containing 355 mutations, the model achieved an F measure of 0.6414 when ΔΔG ≥ 1.0 kcal/mol was used to define hot spots. This performance is better than that of Robetta, a benchmark method in the field. We found that all but one of the GCRs contain water to a certain degree, and most of the outstanding hot spot residues have water-mediated contacts. If the water is excluded, the burial level values are poorly related to the ΔΔG, and the model's performance drops markedly. We also presented a definition for the O-ring of a GCR as the set of immediate neighbors of the residues in the GCR. Comparative analysis between the O-rings and GCRs reveals that the newly defined O-ring is indeed energetically less important than the GCR hot spot, confirming a long-standing hypothesis.
Proteins Structure Function and Bioinformatics 12/2010; 78(16):3304-16. · 3.39 Impact Factor
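The evaluation reported in the abstract above (hot spots defined by ΔΔG ≥ 1.0 kcal/mol, performance given as an F measure) can be sketched as follows. The function names and the example values are hypothetical; only the 1.0 kcal/mol cutoff comes from the abstract, and the F measure is assumed here to be the usual harmonic mean of precision and recall.

```python
def label_hot_spots(ddg_values, cutoff=1.0):
    """Label a mutation as a hot spot when its ddG (kcal/mol) meets the cutoff."""
    return [ddg >= cutoff for ddg in ddg_values]


def f_measure(actual, predicted):
    """Harmonic mean of precision and recall for boolean labels."""
    tp = sum(a and p for a, p in zip(actual, predicted))
    fp = sum((not a) and p for a, p in zip(actual, predicted))
    fn = sum(a and (not p) for a, p in zip(actual, predicted))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical measured ddG values and hypothetical model predictions
ddg = [2.3, 0.4, 1.0, 0.1, 1.8]
actual = label_hot_spots(ddg)
predicted = [True, True, False, False, True]
print(f_measure(actual, predicted))
```

In this toy case there are two true positives, one false positive and one false negative, so precision and recall are both 2/3.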
-
Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, July 25-28, 2010; 01/2010
-
ABSTRACT: Thymine is the only nucleotide base which is changed to uracil upon transcription, leaving mRNA less hydrophobic compared to its DNA counterpart. All the 16 codons that contain uracil (or thymine in the gene) as the second nucleotide code for the five large hydrophobic residues (LHRs), namely phenylalanine, isoleucine, leucine, methionine and valine. Thymine content (i.e. the fraction of XTX codons, where X = A, C, G, or T) in PINK1 mRNA sequences and its relationship with protein stability and function are the focus of this work. This analysis will shed light on PINK1's stability, and may thus provide a clue for understanding the mitochondrial dysfunction and the failure of oxidative stress control frequently observed in Parkinson's disease. We obtained the complete PINK1 mRNA sequences of 8 different species. The distributions of XTX codons in different frames are calculated. We observed that the thymine content reached its highest level in coding frame 1 of the PINK1 mRNA sequence of Bos taurus (Bt), peaking at 27%. Low thymine content in coding frame 1 leads to a reduction in LHRs in the corresponding proteins. Therefore, we conjecture that proteins from the other organisms, including Homo sapiens, lost some of their hydrophobicity and became susceptible to dysfunction. Genes such as PINK1 have lost thymine in the evolutionary process, potentially making their protein products susceptible to instability and causing disease. Adding more hydrophobic residues (thymine) at appropriate places might help conserve important biological functions.
Bioinformation 01/2010; 4(10):452-5.
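The thymine content measure defined in the abstract above (the fraction of XTX codons in a given reading frame) can be computed with a short sketch. The function name and the example sequence are hypothetical; frames are given as 0-based offsets here, so the abstract's "coding frame 1" would correspond to offset 0 under this assumption.

```python
def xtx_fraction(seq, frame):
    """Fraction of complete codons whose second base is T (U in mRNA).

    `frame` is a 0-based offset (0, 1 or 2) into the sequence.
    """
    seq = seq.upper().replace("U", "T")
    # Step through the sequence in triplets, skipping any incomplete tail
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    if not codons:
        return 0.0
    return sum(codon[1] == "T" for codon in codons) / len(codons)


# Hypothetical 12-base sequence, not a real PINK1 fragment
example = "ATGCTTGTAGCC"
for offset in range(3):
    print(offset, xtx_fraction(example, offset))
```

For the example sequence, offset 0 yields the codons ATG, CTT, GTA and GCC, three of which have T in the second position.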
-
ABSTRACT: A protein binding hot spot is a small cluster of residues tightly packed at the center of the interface between two interacting proteins. Though a hot spot constitutes a small fraction of the interface, it is vital to the stability of protein complexes. Recently, a series of hypotheses has been proposed to characterize binding hot spots, including the pioneering O-ring theory, the insightful 'coupling' and 'hot region' principle, and our 'double water exclusion' (DWE) hypothesis. As the perspective changes from the O-ring theory to the DWE hypothesis, we examine the physicochemical properties of the binding hot spots under the new hypothesis and compare them with those under the O-ring theory.
The requirements for a cluster of residues to form a hot spot under the DWE hypothesis can be mathematically satisfied by a biclique subgraph if a vertex is used to represent a residue, an edge to indicate a close distance between two residues, and a bipartite graph to represent a pair of interacting proteins. We term these hot spots as DWE bicliques. We identified DWE bicliques from crystal packing contacts, obligate and non-obligate interactions. Our comparative study revealed that there are abundant unique bicliques to the biological interactions, indicating specific biological binding behaviors in contrast to crystal packing. The two sub-types of biological interactions also have their own signature bicliques. In our analysis on residue compositions and residue pairing preferences in DWE bicliques, the focus was on interaction-preferred residues (ipRs) and interaction-preferred residue pairs (ipRPs). It is observed that hydrophobic residues are heavily involved in the ipRs and ipRPs of the obligate interactions; and that aromatic residues are in favor in the ipRs and ipRPs of the biological interactions, especially in those of the non-obligate interactions. In contrast, the ipRs and ipRPs in crystal packing are dominated by hydrophilic residues, and most of the anti-ipRs of crystal packing are the ipRs of the obligate or non-obligate interactions.
These ipRs and ipRPs in our DWE bicliques describe diverse binding features among the three types of interactions. They also highlight the specific binding behaviors of the biological interactions, which differ sharply from the artifact interfaces in crystal packing. DWE bicliques, especially the unique bicliques, can capture deep insights into the binding characteristics of protein interfaces.
BMC Bioinformatics 01/2010; 11:244. · 2.75 Impact Factor
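The biclique condition described in the abstract above (every residue on one side of the bipartite graph is in close contact with every residue on the other side) can be checked with a minimal sketch. The function name, the residue identifiers and the contact set are hypothetical; how "close distance" is measured is not specified in the abstract, so the contacts are assumed to be given.

```python
def is_biclique(left, right, contacts):
    """Return True when every residue in `left` contacts every residue in `right`.

    `contacts` is a set of (residue_on_protein_A, residue_on_protein_B)
    pairs, i.e. the edges of the bipartite interface graph.
    """
    if not left or not right:
        return False  # a biclique needs at least one vertex on each side
    return all((a, b) in contacts for a in left for b in right)


# Hypothetical inter-protein contacts between chains A and B
contacts = {
    ("A:L45", "B:F12"), ("A:L45", "B:W30"),
    ("A:I47", "B:F12"), ("A:I47", "B:W30"),
    ("A:K50", "B:F12"),
}
print(is_biclique({"A:L45", "A:I47"}, {"B:F12", "B:W30"}, contacts))
print(is_biclique({"A:L45", "A:I47", "A:K50"}, {"B:F12", "B:W30"}, contacts))
```

The second call fails the check because A:K50 has no contact with B:W30. This only verifies a candidate group; enumerating all maximal bicliques in an interface graph is a separate and harder problem.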
-
ABSTRACT: Predicting B-cell epitopes is very important for designing vaccines and drugs to fight against the infectious agents. However, due to the high complexity of this problem, previous prediction methods that focus on linear and conformational epitope prediction are both unsatisfactory. In addition, antigen interacting with antibody is context dependent and the coarse binary classification of antigen residues into epitope and non-epitope without the corresponding antibody may not reveal the biological reality. Therefore, we take a novel way to identify epitopes by using associations between antibodies and antigens.
Given a pair of antibody-antigen sequences, the epitope residues can be identified by two types of associations: paratope-epitope interacting bicliques and co-occurrence patterns of interacting residue pairs. As the association itself does not include the neighborhood information on the primary sequence, residues' cooperativity and relative composition are then used to enhance our method. Evaluation carried out on a benchmark data set shows that the proposed method produces very good performance in terms of accuracy. Comparison with two other structure-based B-cell epitope prediction methods shows that the proposed method is competitive with, and sometimes even better than, the structure-based methods, which have a much smaller scope of applicability.
The proposed method leads to a new way of identifying B-cell epitopes. Besides, this antibody-specified epitope prediction can provide more precise and helpful information for wet-lab experiments.
BMC Structural Biology 01/2010; 10 Suppl 1:S6. · 2.48 Impact Factor
-
Computational Intelligence. 01/2010; 26:282-317.