Protein interaction prediction for mouse PDZ domains using dipeptide composition features
ABSTRACT PDZ domains form one of the largest families of protein domains and are involved in targeting and routing specific proteins in signaling pathways. They mediate protein-protein interactions by binding the C-terminal peptides of their target proteins. Using a dipeptide feature encoding, we develop a support vector machine predictor of PDZ domain interactions that achieves an accuracy of 82.49%. Since most of the dipeptide compositions are redundant or irrelevant, we propose a new hybrid feature selection technique that selects only the subset of compositions useful for interaction prediction. Our experimental results show that only approximately 25% of the dipeptide features are needed and that our method increases the accuracy by 3%. The selected dipeptide features are analyzed and shown to play important roles in the specificity patterns of PDZ domains.
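The dipeptide encoding mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common definition of dipeptide composition (the frequency of each of the 400 ordered pairs of standard amino acids among the adjacent residue pairs of a sequence); the abstract does not spell out the exact normalisation, nor how the domain and peptide vectors are combined, so the function name `dipeptide_composition` and the example sequence are purely illustrative.

```python
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered dipeptides, in a fixed order that defines the feature layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(sequence):
    """Return a 400-element list of dipeptide frequencies for `sequence`.

    Assumption: each frequency is the count of a dipeptide divided by the
    number of adjacent residue pairs made of standard amino acids. Pairs
    containing any other character (e.g. 'X') are skipped.
    """
    seq = sequence.upper()
    counts = [0] * len(DIPEPTIDES)
    total = 0
    for i in range(len(seq) - 1):
        idx = INDEX.get(seq[i:i + 2])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        # Sequence too short or without valid pairs: all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [c / total for c in counts]


if __name__ == "__main__":
    # Hypothetical four-residue C-terminal peptide, used only as an example.
    features = dipeptide_composition("ETSV")
    nonzero = {DIPEPTIDES[i]: round(v, 3) for i, v in enumerate(features) if v > 0}
    print(len(features), nonzero)
```

A vector of this kind (or a selected subset of its 400 entries) is the sort of input a support vector machine classifier would take; the feature selection step described in the abstract would then keep roughly a quarter of the entries.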
- Nature Biotechnology 11/2008; 26(10):1193. DOI:10.1038/nbt1008-1193c · 39.08 Impact Factor
- ABSTRACT: Protein-protein interactions (PPIs) are central to most biological processes. Although efforts have been devoted to the development of methodology for predicting PPIs and protein interaction networks, the application of most existing methods is limited because they need information about protein homology or the interaction marks of the protein partners. In the present work, we propose a method for PPI prediction using only the information of protein sequences. This method was developed based on a learning algorithm, the support vector machine, combined with a kernel function and a conjoint triad feature for describing amino acids. More than 16,000 diverse PPI pairs were used to construct the universal model. The prediction ability of our approach is better than that of other sequence-based PPI prediction methods because it is able to predict PPI networks. Different types of PPI networks have been effectively mapped with our method, suggesting that, even with only sequence information, this method could be applied to the exploration of networks for any newly discovered protein with unknown biological relativity. In addition, such supplementary experimental information can enhance the prediction ability of the method.
  Proceedings of the National Academy of Sciences 04/2007; 104(11):4337-41. DOI:10.1073/pnas.0607879104 · 9.81 Impact Factor
- ABSTRACT: Based on pseudo amino acid (PseAA) composition and a novel hybrid feature selection framework, this paper presents a computational system to predict protein-protein interactions (PPIs) using 8796 protein pairs. These pairs are coded by PseAA composition, resulting in 114 features. A hybrid feature selection system, mRMR-KNNs-wrapper, is applied to obtain an optimized feature set by excluding poorly performing and/or redundant features, leaving 103 features. Using the optimized 103-feature subset, a prediction model is trained and tested in the k-nearest neighbors (KNNs) learning system. This prediction model achieves an overall prediction accuracy of 76.18%, evaluated by 10-fold cross-validation, which is 1.46% higher than with the initial 114 features and 6.51% higher than with the 20 features coded by amino acid composition. The PPI predictor developed for this research is available for public use at http://chemdata.shu.edu.cn/ppi.
  Biochemical and Biophysical Research Communications 02/2009; 380(2):318-22. DOI:10.1016/j.bbrc.2009.01.077 · 2.28 Impact Factor