PIER: Protein interface recognition for structural proteomics

Scripps Research Institute, La Jolla, California 92037, USA.
Proteins: Structure, Function, and Bioinformatics (Impact Factor: 2.63). 05/2007; 67(2):400-17. DOI: 10.1002/prot.21233
Source: PubMed


Recent advances in structural proteomics call for the development of fast and reliable automatic methods for predicting the functional surfaces of proteins of known three-dimensional structure, including binding sites for known and unknown protein partners as well as oligomerization interfaces. Despite significant progress, the problem is still far from solved. Most existing methods rely, at least partially, on evolutionary information from multiple sequence alignments projected onto the protein surface. The common drawbacks of such methods are their limited applicability to proteins with few sequence homologs and their inability to detect interfaces in evolutionarily variable regions. In this study, the authors developed an improved method for predicting interfaces from a single protein structure, based on local statistical properties of the protein surface derived at the level of atomic groups. The proposed Protein IntErface Recognition (PIER) method achieved an overall precision of 60% at a recall threshold of 50% at the residue level on a diverse benchmark of 490 homodimeric, 62 heterodimeric, and 196 transient interfaces (compared with 25% precision at 50% recall expected from random assignment of residue function). For 70% of proteins in the benchmark, the binding patch residues were detected with precision exceeding 50% at 50% recall. The calculation took only seconds for an average 300-residue protein. The authors demonstrated that adding the evolutionary conservation signal only marginally influenced the overall prediction performance on the benchmark; moreover, for certain classes of proteins, using this signal actually degraded the prediction. Thorough benchmarking on other datasets from the literature showed that PIER outperformed several alignment-free and alignment-dependent prediction methods. 
The accuracy, efficiency, and dependence on structure alone make PIER a suitable tool for automated high-throughput annotation of protein structures emerging from structural proteomics projects.
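The headline figures above are residue-level precision at a fixed recall. As a hedged illustration of one common way to read such a number off a ranked list (toy scores, not PIER output), the sketch below ranks residues by a hypothetical per-residue interface score, stops once the requested fraction of true interface residues has been recovered, and reports the precision at that cut-off. It also shows why random assignment gives a flat baseline: a random ranking's expected precision is roughly the fraction of residues that are truly interfacial, whatever the recall, so a 25% random baseline corresponds to roughly a quarter of the residues considered being interface residues. The function names are illustrative only.

```python
import math


def precision_at_recall(scores, labels, recall_threshold=0.5):
    """Precision at the point where `recall_threshold` of the true interface
    residues have been recovered, scanning residues from highest score down.

    scores: hypothetical per-residue interface scores (higher = more likely)
    labels: 1 for a true interface residue, 0 otherwise
    """
    if not 0 < recall_threshold <= 1:
        raise ValueError("recall_threshold must be in (0, 1]")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("labels contain no interface residues")
    # True positives needed; the small epsilon guards against float round-up.
    needed = max(1, math.ceil(recall_threshold * n_pos - 1e-9))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = 0
    for k, (_, is_interface) in enumerate(ranked, start=1):
        tp += is_interface
        if tp >= needed:
            return tp / k  # precision among the top-k residues
    return tp / len(ranked)  # not reached when labels contain a positive


def random_baseline_precision(labels):
    """Approximate expected precision of a random ranking: the fraction of
    residues that are true interface residues, independent of recall."""
    return sum(labels) / len(labels)


# Toy example: 8 residues, 4 of them at the interface.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0, 1, 0, 1]
print(precision_at_recall(scores, labels, 0.5))  # 2 of the top 3 are interface
print(random_baseline_precision([1] * 5 + [0] * 15))  # 0.25
```

Reading precision at a fixed recall, rather than at a fixed number of predictions, keeps the comparison fair across proteins whose interfaces differ in size.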

Citing articles:
    • "An interface propensity is calculated for each feature. The combined score is the product of propensity scores from different properties, which is further smoothed by considering structural neighbors.
      PIER [41] | Structure | PIER/ | PIER predicts each surface patch as interfacial or not, using PLS (partial least squares) regression on the solvent accessibility values of 12 significantly over- and under-represented atomic groups at the interface.
      Cons-PPISP [7] | Structure | … "
    ABSTRACT: Reliably pinpointing which specific amino acid residues form the interface(s) between a protein and its binding partner(s) is critical for understanding the structural and physicochemical determinants of protein recognition and binding affinity, and has wide applications in modeling and validating protein interactions predicted by high-throughput methods, in engineering proteins, and in prioritizing drug targets. Here, we review the basic concepts, principles and recent advances in computational approaches to the analysis and prediction of protein-protein interfaces. We point out caveats for objectively evaluating interface predictors, and discuss various applications of data-driven interface predictors for improving energy model-driven protein-protein docking. Finally, we stress the importance of exploiting binding partner information in reliably predicting interfaces and highlight recent advances in this emerging direction.
    FEBS letters 10/2015; DOI:10.1016/j.febslet.2015.10.003 · 3.17 Impact Factor
    • "Possible interface residues in each structure were predicted using the CPORT (Consensus Prediction Of interface Residues in Transient complexes) facility (De Vries and Bonvin, 2011). PINUP (Liang et al., 2006), PIER (Kufareva et al., 2007), WHISCY (De Vries et al., 2006), ProMate (Neuvirth et al., 2004), SPPIDER (Porollo and Meller, 2007) and cons-PPISP (Chen and Zhou, 2005) are six interface residue prediction algorithms that are combined in CPORT to provide reliable predictions of interface residues, which can be passed to the HADDOCK web server as active and passive residues. The Visual Molecular Dynamics (VMD) software (Humphrey et al., 1996) was used for protonation and partial charge assignment of the structures, whereas the Molecular Operating Environment (MOE) software (Chemical Computing Group Inc., version 2013) was used to calculate electrostatic charges. "
    ABSTRACT: Dengue virus (DENV) belongs to the family Flaviviridae and can cause major health problems worldwide, including dengue fever and dengue shock syndrome. DENV replicon in human cells inhibits interferon α and β with the help of its non-structural proteins. Non-structural protein 5 (NS5) of DENV is responsible for the proteasome-mediated degradation of signal transducer and activator of transcription (STAT) 2 protein, which has been implicated in the development of resistance against interferon-mediated antiviral effect. This degradation of STAT2 primarily occurs with the help of E3 ubiquitin ligases. Seven in absentia homologue (SIAH) 2 is a host protein that can mediate the ubiquitination of proteins and is known for its interaction with NS5. In this study, comprehensive computational analysis was performed to characterize the protein-protein interactions between NS5, SIAH2, and STAT2 to gain insight into the residues and sites of interaction between these proteins. The objective of the study was to structurally characterize the NS5-STAT2, SIAH2-STAT2, and NS5-SIAH2 interactions along with the determination of the possible reaction pattern for the degradation of STAT2. Docking and physicochemical studies indicated that DENV NS5 may first interact with the host SIAH2, which can then proceed towards binding with STAT2 from the side of SIAH2. These implications are reported for the first time and require validation by wet-lab studies.
    Genetics and molecular research: GMR 04/2015; 14(2):4215-4237. DOI:10.4238/2015.April.28.4 · 0.78 Impact Factor
    • "This was the only protein that was removed by hand after discovering its unusual number of interacting residues and highly unusual shape. When tested on the 10 worst proteins that were removed, ProMate [18], cons-PPISP [51], PINUP [52], and PIER [28] achieved average MCC scores of 0.007, 0.050, 0.042, and 0.11, respectively (data not shown), substantially lower than their scores on the full test set (Section ‘A new MLPIP: RAD-T (Residues on Alternating Decision-Trees)’), suggesting that this is not a weakness of the predictor used but a property of the proteins themselves. "
    ABSTRACT: Transient protein-protein interactions (PPIs), which underlie most biological processes, are a prime target for therapeutic development. Immense progress has been made towards computational prediction of PPIs using methods such as protein docking and sequence analysis. However, docking generally requires high-resolution structures of both binding partners, and sequence analysis requires that a significant number of recurrent patterns exist for the identification of a potential binding site. Researchers have turned to machine learning to overcome some of the other methods' restrictions by generalising interface sites with sets of descriptive features. Best practices for dataset generation, features, and learning algorithms have not yet been identified or agreed upon, and an analysis of the overall efficacy of machine-learning-based PPI predictors is due, in order to highlight potential areas for improvement. The presence of unknown interaction sites as a result of limited knowledge about protein interactions in the testing set dramatically reduces prediction accuracy. Greater accuracy in labelling the data by enforcing higher interface site rates per domain resulted in an average 44% improvement across multiple machine learning algorithms. A set of 10 biologically unrelated proteins that were consistently predicted with high accuracy emerged through our analysis. We identify seven features with the most predictive power over multiple datasets and machine learning algorithms. Through our analysis, we created a new predictor, RAD-T, that outperforms existing non-structurally specializing machine learning protein interface predictors, with an average 59% increase in MCC score on a dataset with a high number of interactions. Current methods of evaluating machine-learning-based PPI predictors tend to undervalue their performance, which may be artificially decreased by the presence of unidentified interaction sites. 
Changes to predictors' training sets will be integral to the future progress of interface prediction by machine learning methods. We reveal the need for a larger test set of well studied proteins or domain-specific scoring algorithms to compensate for poor interaction site identification on proteins in general.
    BMC Bioinformatics 03/2014; 15(1):82. DOI:10.1186/1471-2105-15-82 · 2.58 Impact Factor
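The FEBS Letters excerpt describes PIER as scoring each surface patch by partial least squares (PLS) regression on the solvent accessibility of 12 atomic groups. The sketch below is a minimal, pure-Python PLS1 (single-response PLS, fitted with the NIPALS algorithm) to show what that regression step involves. It is the generic technique only: the feature layout (one column per atomic group's accessibility within a patch), the 0/1 patch targets, the number of components, and the names `fit_pls1` and `predict_pls1` are all hypothetical, not PIER's actual model or weights.

```python
def _dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def fit_pls1(X, y, n_components):
    """Fit single-response PLS (PLS1) with the NIPALS algorithm.

    X: list of samples, each a list of p features (hypothetically, the solvent
       accessibility of each atomic group within one surface patch)
    y: list of targets (hypothetically, 1 = interface patch, 0 = not)
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center predictors and response.
    Xk = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yk = [v - y_mean for v in y]
    W, P, Q = [], [], []
    for _ in range(n_components):
        # Weight vector: direction of maximal covariance with the residual y.
        w = [sum(Xk[i][j] * yk[i] for i in range(n)) for j in range(p)]
        norm = _dot(w, w) ** 0.5
        if norm < 1e-12:
            break  # residual response is (numerically) fully explained
        w = [v / norm for v in w]
        t = [_dot(row, w) for row in Xk]  # latent scores for this component
        tt = _dot(t, t)
        loading = [sum(Xk[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = _dot(yk, t) / tt
        # Deflate: remove what this component explains from X and y.
        Xk = [[Xk[i][j] - t[i] * loading[j] for j in range(p)] for i in range(n)]
        yk = [yk[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(loading)
        Q.append(q)
    return {"x_mean": x_mean, "y_mean": y_mean, "W": W, "P": P, "Q": Q}


def predict_pls1(model, x):
    """Score one sample by replaying the fitted deflation steps on it."""
    xr = [v - m for v, m in zip(x, model["x_mean"])]
    y_hat = model["y_mean"]
    for w, loading, q in zip(model["W"], model["P"], model["Q"]):
        t = _dot(xr, w)
        y_hat += q * t
        xr = [xr[j] - t * loading[j] for j in range(len(xr))]
    return y_hat


# Toy check with 3 made-up features and an exactly linear response.
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1], [2, 1, 3]]
y = [2 * a - b + 0.5 * c + 1 for a, b, c in X]
model = fit_pls1(X, y, n_components=3)
print(round(predict_pls1(model, [1, 2, 3]), 6))  # 2.5
```

With as many components as features, PLS1 reproduces ordinary least squares, which is why the toy check is exact; with fewer components it trades fit for robustness to correlated inputs, a plausible reason to prefer PLS when accessibilities of related atomic groups co-vary. Fitting on 0/1 patch labels yields a continuous patch score, and thresholding it (say at 0.5) is one simple way to make the yes/no call; the excerpt does not state PIER's exact decision rule.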