Article

A similarity distance of diversity measure for discriminating mesophilic and thermophilic proteins.

School of Physical Science and Technology, Inner Mongolia University, Hohhot, 010021, China.
Amino Acids (impact factor: 3.25). 08/2012; DOI:10.1007/s00726-012-1374-z
Source: PubMed

ABSTRACT The successful prediction of thermophilic proteins is useful for designing stable enzymes that remain functional at high temperature. We used the increment of diversity (ID), a novel amino acid composition-based similarity distance, in a 2-class K-nearest neighbor (KNN) classifier to discriminate thermophilic from mesophilic proteins; we refer to the resulting classifier as KNN-ID. Instead of extracting features from protein sequences, as was done previously, our approach is based on a diversity measure of symbol sequences. The similarity distance between each pair of protein sequences is first calculated to quantify how similar one sequence is to the other. The class of the query protein is then determined using the K-nearest neighbor algorithm. Comparisons with several recently published methods showed that KNN-ID outperforms them. The improved predictive performance indicates that it is a simple and effective classifier for discriminating thermophilic and mesophilic proteins. Finally, the influence of protein length and protein identity on prediction accuracy is discussed. The prediction model and dataset used in this article can be freely downloaded from http://wlxy.imu.edu.cn/college/biostation/fuwu/KNN-ID/index.htm .
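The two steps described in the abstract (a pairwise similarity distance, then a nearest-neighbor vote) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not give the formula, so the sketch assumes the definition of the increment of diversity commonly used in the literature: for a count vector X with total N, D(X) = N ln N - Σ n_i ln n_i, and ID(X, Y) = D(X + Y) - D(X) - D(Y). The function names, the default k, the majority-vote rule, and the toy sequences are all hypothetical.

```python
import math
from collections import Counter

# The 20 standard amino acids; other symbols are ignored in this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the amino acid counts of a sequence as a 20-element list."""
    counts = Counter(seq.upper())
    return [counts.get(a, 0) for a in AMINO_ACIDS]


def diversity(counts):
    """Diversity measure D(X) = N ln N - sum(n_i ln n_i), assumed definition."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); smaller means more similar."""
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def knn_id_predict(query, training, k=3):
    """Label a query sequence by majority vote among its k nearest neighbors.

    `training` is a list of (sequence, label) pairs. The value of k and the
    tie-breaking behavior are assumptions, not taken from the article.
    """
    q = composition(query)
    scored = sorted(
        (increment_of_diversity(q, composition(seq)), label)
        for seq, label in training
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy, made-up sequences purely to exercise the code (not real proteins).
toy_training = [
    ("AAAA", "thermophilic"),
    ("AAAG", "thermophilic"),
    ("WWWW", "mesophilic"),
    ("WWWY", "mesophilic"),
]
prediction = knn_id_predict("AAAA", toy_training, k=3)
```

Under this definition, two sequences with identical composition have an ID of zero, and the value grows as their compositions diverge, which is what lets it act as a distance in the nearest-neighbor step.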


Keywords

2-class K-nearest neighbor classifier
 
classify thermophilic
 
given sequence
 
improved predictive performance
 
K-nearest neighbor algorithm
 
KNN-ID classifier
 
mesophilic proteins
 
novel amino acid composition-based similarity distance
 
protein identity
 
protein length
 
protein sequences
 
quantitatively measure
 
query protein
 
similarity distance
 
similarity level
 
stable enzymes
 
study outperforms
 
successful prediction
 
symbol sequences
 
thermophilic proteins