-
ABSTRACT: Predicting the short protein fragments capable of forming amyloid-like fibril motifs is important for understanding the cause of amyloid illnesses and aids the discovery of sequence-targeted anti-aggregation drugs. Because molecular techniques for identifying such fragments are limited, computational tools that provide affordable in silico predictions are highly desirable. In this research article, we study, from a machine learning perspective, the performance of several classifiers that use heterogeneous features based on the biochemical and biophysical properties of amino acids to discriminate between amyloidogenic and non-amyloidogenic regions in peptides. Four conventional machine learning classifiers, namely Support Vector Machine (SVM), Neural Network, Decision Tree and Random Forest, were trained and tested to find the classifier best suited to the problem domain. Prior to classification, novel implementations of two biologically inspired feature optimization techniques, based on evolutionary algorithms and on methodologies that mimic social life, and a multivariate projection-based method are used to remove unimportant and uninformative features. Among the dimensionality reduction algorithms considered in the study, the prediction results show that those based on evolutionary computation are the most effective. Among the classifiers considered, SVM best suits the problem domain. The best classifier is also compared with an online predictor to show that the proposed classifier maintains a balance between true positive and false positive rates. This exploratory study suggests that these methods are promising for amyloidogenicity prediction and may be extended to large-scale proteomic studies.
Protein and Peptide Letters 04/2012; 19(9):917-23. · 1.94 Impact Factor
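The abstract above refers to features derived from the biochemical and biophysical properties of amino acids without listing them. As a minimal sketch, assuming property-averaged features, a peptide can be mapped to a fixed-length vector as below. The property choices (the Kyte-Doolittle hydropathy scale, a simplified side-chain charge count, and a hydrophobic-residue fraction with a hypothetical cutoff of 2.0) and the function name `encode_peptide` are illustrative assumptions, not the property set used in the study.

```python
from __future__ import annotations

# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Hypothetical simplification: +1 for Lys/Arg, -1 for Asp/Glu, histidine treated as neutral.
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}

HYDROPHOBIC_CUTOFF = 2.0  # hypothetical threshold on the hydropathy scale


def encode_peptide(peptide: str) -> list[float]:
    """Return [mean hydropathy, net charge, fraction of strongly hydrophobic residues]."""
    seq = peptide.upper()
    if not seq or any(aa not in KYTE_DOOLITTLE for aa in seq):
        raise ValueError(f"empty peptide or non-standard residue in {peptide!r}")
    hydropathy = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(hydropathy) / len(seq),
        sum(CHARGE.get(aa, 0.0) for aa in seq),
        sum(1 for h in hydropathy if h > HYDROPHOBIC_CUTOFF) / len(seq),
    ]


print(encode_peptide("STVIIE"))
```

For the hexapeptide STVIIE this gives a mean hydropathy of about 1.37, a net charge of -1 and a hydrophobic fraction of 0.5. Averaging over the sequence keeps the vector length fixed regardless of peptide length, which is what a standard classifier expects.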
-
Indo-US Workshop on Biocomputing (ISB 2011), NIT Calicut, Kerala; 09/2011
-
ABSTRACT: Identifying amyloidogenic regions in protein sequences helps in understanding the underlying cause of several human diseases and in finding potential therapeutic targets. Given how laborious it is to validate experimentally which segments are most prone to form fibrils, it has become essential to develop computational approaches that produce reliable, affordable and testable in silico predictions. In this paper, we present and assess some of the recently developed computational tools for predicting amyloid fibril-forming motifs, which remain one of the key means of deciphering the role of such regions in disease diagnosis, prognosis and drug discovery.
International Journal of Computer Applications. 02/2011; 4(5):155-157.
-
ABSTRACT: Amyloidogenic regions in polypeptide chains are associated with a number of diseases. Experimental evidence strongly supports the hypothesis that small segments of proteins are responsible for their amyloidogenic behavior. Identifying these short peptides is therefore critical for understanding diseases associated with protein misfolding and for developing sequence-targeted anti-aggregation drugs. In silico approaches that use phenomenological models based on the biophysicochemical properties of amino acids suffer from the “curse of dimensionality”, which must be addressed before standard classification algorithms are adopted to predict such fibril motifs. The present study evaluates the performance of filter, wrapper and embedded feature selection models in conjunction with a Support Vector Machine classifier. We also propose a novel integrated feature selection strategy based on a Genetic Algorithm and a Support Vector Machine to obtain an optimal number of features for predicting amyloid fibril-forming short stretches of peptides. In addition, we investigate the performance of the feature selection models, which yielded new and complementary sets of properties, and conclude that the proposed integrated dimensionality reduction technique outperforms all other methods, achieving the highest sensitivity and specificity of 86% and 82%, respectively.
International Journal of Computer Applications. 10/2010; 8(2).
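The 10/2010 abstract above proposes an integrated feature selection strategy combining a Genetic Algorithm (GA) with a Support Vector Machine, but does not describe the encoding or operators. The following is a minimal sketch of a generic GA wrapper under stated assumptions: chromosomes are bitmasks over features, fitness is the validation accuracy of a linear SVM minus a small per-feature penalty, and the GA uses tournament selection, single-point crossover, bit-flip mutation and elitism. To stay dependency-free, the SVM here is a linear one trained with Pegasos-style sub-gradient steps, a stand-in for a library SVM rather than the authors' classifier. All function names, hyperparameters and the synthetic data are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM fitted with Pegasos-style sub-gradient steps on the hinge loss.

    Labels must be +1/-1. A constant 1.0 feature is appended to every row so
    the bias is learned (and regularised) as an ordinary weight.
    """
    rng = random.Random(seed)
    data = [(list(row) + [1.0], label) for row, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing Pegasos step size
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage
            if margin < 1.0:  # hinge loss is active for this example
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def predict(w, row):
    score = sum(wj * xj for wj, xj in zip(w, list(row) + [1.0]))
    return 1 if score >= 0 else -1


def select(rows, mask):
    """Keep only the columns whose mask bit is 1."""
    return [[v for v, keep in zip(row, mask) if keep] for row in rows]


def fitness(mask, train, valid, penalty=0.01):
    """Validation accuracy on the selected features minus a per-feature penalty."""
    if not any(mask):
        return 0.0
    (X_train, y_train), (X_valid, y_valid) = train, valid
    w = train_linear_svm(select(X_train, mask), y_train)
    hits = sum(predict(w, row) == label
               for row, label in zip(select(X_valid, mask), y_valid))
    return hits / len(y_valid) - penalty * sum(mask)


def ga_feature_selection(train, valid, n_features, pop_size=8, generations=10,
                         mutation_rate=0.1, seed=0):
    """Search feature bitmasks with a simple GA; needs n_features >= 2."""
    rng = random.Random(seed)
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, train, valid)
        return cache[key]

    def tournament(population):
        a, b = rng.sample(population, 2)
        return a if score(a) >= score(b) else b

    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(population, key=score)[:]]  # elitism: best mask survives
        while len(children) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            cut = rng.randint(1, n_features - 1)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    best = max(population, key=score)
    return best, score(best)


# Hypothetical synthetic data: feature 0 separates the classes, features 1-2 are noise.
def make_synthetic(n, rng):
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        X.append([label * rng.uniform(1.5, 2.5), rng.uniform(-1, 1), rng.uniform(-1, 1)])
        y.append(label)
    return X, y


data_rng = random.Random(42)
train, valid = make_synthetic(40, data_rng), make_synthetic(20, data_rng)
mask, best_fitness = ga_feature_selection(train, valid, n_features=3)
print(mask, best_fitness)
```

The per-feature penalty makes the search prefer the smallest subset that keeps validation accuracy high; on this synthetic data only the first feature carries class information, so the GA should keep it and tend to drop the noise features. Caching fitness by mask avoids retraining the SVM for chromosomes that reappear across generations.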
-
ABSTRACT: Amyloidogenic regions in polypeptide chains are associated with a number of pathologies, including neurodegenerative diseases. Recent studies have shown that small regions of proteins are responsible for their amyloidogenic behavior. Identifying these short peptides is therefore critical for understanding diseases associated with protein aggregation. Owing to the limitations of molecular techniques for identifying fibril-forming targets, it became apparent that computational techniques might enable their discovery in silico. We propose a machine learning method that uses a Support Vector Machine to predict amyloid fibril-forming short stretches of peptides. Its features are based on the physicochemical properties of amino acids. To obtain an optimal number of properties, feature selection based on a Genetic Algorithm is performed. The presented algorithm achieves a balanced prediction performance, in terms of true positive and false positive rates, in predicting whether a peptide is amyloidogenic or non-amyloidogenic, a balance not reflected in existing methods.
9th Annual International Conference on Computational Systems Bioinformatics (ISB 2010), Stanford University; 08/2010
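The abstract above reports performance as a balance between true positive and false positive rates, while the 10/2010 abstract quotes sensitivity and specificity. A minimal sketch of how these rates relate, assuming a binary amyloidogenic/non-amyloidogenic labelling with hypothetical string labels, is shown below. The `summarize` function and `RateSummary` type are illustrative, and balanced accuracy is included as one common single-number summary of the trade-off, not a metric named in the abstracts.

```python
from dataclasses import dataclass


@dataclass
class RateSummary:
    sensitivity: float          # true positive rate: TP / (TP + FN)
    specificity: float          # true negative rate: TN / (TN + FP)
    false_positive_rate: float  # FP / (FP + TN), i.e. 1 - specificity
    balanced_accuracy: float    # mean of sensitivity and specificity


def summarize(y_true, y_pred, positive="amyloid"):
    """Confusion-matrix rates for a binary amyloidogenic / non-amyloidogenic labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return RateSummary(
        sensitivity=sensitivity,
        specificity=specificity,
        false_positive_rate=fp / (fp + tn),
        balanced_accuracy=(sensitivity + specificity) / 2,
    )


y_true = ["amyloid"] * 4 + ["non-amyloid"] * 6
y_pred = ["amyloid"] * 3 + ["non-amyloid", "amyloid"] + ["non-amyloid"] * 5
print(summarize(y_true, y_pred))
```

Here 4 amyloidogenic and 6 non-amyloidogenic peptides, with 3 and 5 of them predicted correctly, give a sensitivity of 0.75, a specificity of about 0.83 and a false positive rate of about 0.17. A predictor that labels everything amyloidogenic would reach a sensitivity of 1.0 but a false positive rate of 1.0, which is why the two rates are reported together.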
-
1st International Conference on Bioinformatics and Systems Biology (ICBSB 2010), Annamalai University; 02/2010