ABSTRACT: Prediction of subcellular localization of proteins is important for genome annotation, protein function prediction, and drug discovery. We present a prediction method for Gram-negative bacteria that uses ten one-versus-one support vector machine (SVM) classifiers, where compartment-specific biological features are selected as input to each SVM classifier. The final prediction of localization sites is determined by integrating the results from the ten binary classifiers using a combination of majority votes and a probabilistic method. The overall accuracy reaches 91.4%, which is 1.6% better than the state-of-the-art system, in a ten-fold cross-validation evaluation on a benchmark data set. We demonstrate that feature selection guided by biological knowledge and insights in one-versus-one SVM classifiers can lead to a significant improvement in prediction performance. Our model also achieves 92.8% overall accuracy on proteins with dual localizations.
Computational systems bioinformatics / Life Sciences Society. Computational Systems Bioinformatics Conference 02/2006; DOI:10.1142/1860947573_0041
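The integration step described in the abstract above (ten pairwise classifiers combined by majority vote plus a probabilistic method) can be illustrated with a minimal sketch. Several things here are assumptions, not details from the paper: the five compartment names (the abstract only implies five classes, since five classes yield ten pairs), the function name `combine_pairwise`, and the rule that summed pairwise probabilities break voting ties. The per-pair probabilities are taken as given; training the SVMs is out of scope.

```python
from itertools import combinations

# Assumed compartment labels; the abstract only implies five classes (ten pairs).
CLASSES = ["cytoplasm", "inner_membrane", "periplasm",
           "outer_membrane", "extracellular"]


def combine_pairwise(pair_probs, classes=CLASSES):
    """Pick one class from one-versus-one outputs.

    pair_probs maps each pair (a, b), in the order produced by
    itertools.combinations, to the probability that class a beats class b
    according to the binary classifier trained on that pair.
    """
    votes = {c: 0 for c in classes}
    prob_sum = {c: 0.0 for c in classes}
    for a, b in combinations(classes, 2):
        p = pair_probs[(a, b)]
        winner = a if p >= 0.5 else b
        votes[winner] += 1
        # Accumulate the probability mass each class received across its pairs.
        prob_sum[a] += p
        prob_sum[b] += 1.0 - p
    # Majority vote decides; summed probabilities break ties
    # (hypothetical tie-break rule, not taken from the paper).
    return max(classes, key=lambda c: (votes[c], prob_sum[c]))
```

With five classes, `combinations(CLASSES, 2)` yields exactly ten pairs, matching the ten binary classifiers mentioned in the abstract.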
ABSTRACT: We present a novel method to address multi-labeled protein subcellular localization prediction in Gram-negative bacteria using support vector machines (SVMs) as classifiers. For a given protein sequence that may have more than one label, features are extracted from amino acid composition and molecular-function-related terms in the Gene Ontology (GO) as input to the SVMs. We apply one-against-others SVMs to proteins of Gram-negative bacteria in a 5-fold cross-validation. The results of the multi-labeled predictions are evaluated based on two criteria: class number and class category. For the first criterion, our method predicts the number of classes (class number) for each protein at an accuracy rate of 94.1%. For the second criterion, we compare the categories of the actual classes with the predicted classes proportionate to ranks, and obtain an accuracy of 83.2%. Our method is the first approach to predict and evaluate multi-labeled protein subcellular localization for prokaryotic bacteria, and we demonstrate that it has good predictive power.
Computational Systems Bioinformatics Conference, 2005. Workshops and Poster Abstracts. IEEE; 09/2005
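As a rough illustration of the multi-label setting in the abstract above, the sketch below ranks classes by one-against-others decision scores and computes the class-number criterion (the fraction of proteins whose predicted label count equals the true count). The abstract does not say how the number of labels is chosen, so the zero threshold, the at-least-one-label fallback, and the names `predict_labels` and `class_number_accuracy` are all assumptions made for illustration.

```python
def predict_labels(scores, threshold=0.0):
    """Return predicted labels ranked by decision score, highest first.

    scores maps class name -> decision value of that class's
    one-against-others classifier. The threshold is an assumed cut-off.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    chosen = [c for c in ranked if scores[c] > threshold]
    # Assumed fallback: every protein gets at least one location.
    return chosen or ranked[:1]


def class_number_accuracy(true_sets, predicted_lists):
    """Fraction of proteins whose predicted label count matches the true count."""
    hits = sum(len(t) == len(p) for t, p in zip(true_sets, predicted_lists))
    return hits / len(true_sets)
```

For example, a protein scored `{"cytoplasm": 1.2, "inner_membrane": 0.3, "periplasm": -0.8}` would receive two labels under this assumed rule, cytoplasm first.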