Rong Liu

Huazhong Agricultural University, Wu-han-shih, Hubei, China

Are you Rong Liu?

Claim your profile

Publications (4)12.92 Total impact

  • Rong Liu, Jianjun Hu
    [Show abstract] [Hide abstract]
    ABSTRACT: Accurate prediction of DNA-binding residues has become a problem of increasing importance in structural bioinformatics. Here we presented DNABind, a novel hybrid algorithm for identifying these crucial residues by exploiting the complementarity between machine learning and template-based methods. Our machine learning-based method was based on the probabilistic combination of a structure-based and a sequence-based predictor, both of which were implemented using Support Vector Machines algorithms. The former included our well-designed structural features, such as solvent accessibility, local geometry, topological features, and relative positions, which can effectively quantify the difference between DNA-binding and non-binding residues. The latter combined evolutionary conservation features with three other sequence attributes. Our template-based method depended on structural alignment and utilized the template structure from known protein-DNA complexes to infer DNA-binding residues. We showed that the template method had excellent performance when reliable templates were found for the query proteins, but tended to be strongly influenced by the template quality as well as the conformational changes upon DNA binding. In contrast, the machine learning approach yielded better performance when high quality templates were not available (about 1/3 cases in our dataset) or the query protein was subject to intensive transformation changes upon DNA binding. Our extensive experiments indicated that the hybrid approach can distinctly improve the performance of the individual methods for both bound and unbound structures. DNABind also significantly outperformed the-state-of-art algorithms by around 10% in terms of Matthews's correlation coefficient. The proposed methodology could also have wide application in various protein functional site annotations. DNABind is freely available at http://mleg.cse.sc.edu/DNABind/. © Proteins 2013;. © 2013 Wiley Periodicals, Inc.
    Proteins Structure Function and Bioinformatics 06/2013; · 3.34 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Computational prediction of protein subcellular localization can greatly help to elucidate its functions. Despite the existence of dozens of protein localization prediction algorithms, the prediction accuracy and coverage are still low. Several ensemble algorithms have been proposed to improve the prediction performance, which usually include as many as 10 or more individual localization algorithms. However, their performance is still limited by the running complexity and redundancy among individual prediction algorithms. This paper proposed a novel method for rational design of minimalist ensemble algorithms for practical genome-wide protein subcellular localization prediction. The algorithm is based on combining a feature selection based filter and a logistic regression classifier. Using a novel concept of contribution scores, we analyzed issues of algorithm redundancy, consensus mistakes, and algorithm complementarity in designing ensemble algorithms. We applied the proposed minimalist logistic regression (LR) ensemble algorithm to two genome-wide datasets of Yeast and Human and compared its performance with current ensemble algorithms. Experimental results showed that the minimalist ensemble algorithm can achieve high prediction accuracy with only 1/3 to 1/2 of individual predictors of current ensemble algorithms, which greatly reduces computational complexity and running time. It was found that the high performance ensemble algorithms are usually composed of the predictors that together cover most of available features. Compared to the best individual predictor, our ensemble algorithm improved the prediction accuracy from AUC score of 0.558 to 0.707 for the Yeast dataset and from 0.628 to 0.646 for the Human dataset. Compared with popular weighted voting based ensemble algorithms, our classifier-based ensemble algorithms achieved much better performance without suffering from inclusion of too many individual predictors. We proposed a method for rational design of minimalist ensemble algorithms using feature selection and classifiers. The proposed minimalist ensemble algorithm based on logistic regression can achieve equal or better prediction performance while using only half or one-third of individual predictors compared to other ensemble algorithms. The results also suggested that meta-predictors that take advantage of a variety of features by combining individual predictors tend to achieve the best performance. The LR ensemble server and related benchmark datasets are available at http://mleg.cse.sc.edu/LRensemble/cgi-bin/predict.cgi.
    BMC Bioinformatics 07/2012; 13:157. · 3.02 Impact Factor
  • Source
    Rong Liu, Jianjun Hu
    [Show abstract] [Hide abstract]
    ABSTRACT: Accurate prediction of binding residues involved in the interactions between proteins and small ligands is one of the major challenges in structural bioinformatics. Heme is an essential and commonly used ligand that plays critical roles in electron transfer, catalysis, signal transduction and gene expression. Although much effort has been devoted to the development of various generic algorithms for ligand binding site prediction over the last decade, no algorithm has been specifically designed to complement experimental techniques for identification of heme binding residues. Consequently, an urgent need is to develop a computational method for recognizing these important residues. Here we introduced an efficient algorithm HemeBIND for predicting heme binding residues by integrating structural and sequence information. We systematically investigated the characteristics of binding interfaces based on a non-redundant dataset of heme-protein complexes. It was found that several sequence and structural attributes such as evolutionary conservation, solvent accessibility, depth and protrusion clearly illustrate the differences between heme binding and non-binding residues. These features can then be separately used or combined to build the structure-based classifiers using support vector machine (SVM). The results showed that the information contained in these features is largely complementary and their combination achieved the best performance. To further improve the performance, an attempt has been made to develop a post-processing procedure to reduce the number of false positives. In addition, we built a sequence-based classifier based on SVM and sequence profile as an alternative when only sequence information can be used. Finally, we employed a voting method to combine the outputs of structure-based and sequence-based classifiers, which demonstrated remarkably better performance than the individual classifier alone. HemeBIND is the first specialized algorithm used to predict binding residues in protein structures for heme ligands. Extensive experiments indicated that both the structure-based and sequence-based methods have effectively identified heme binding residues while the complementary relationship between them can result in a significant improvement in prediction performance. The value of our method is highlighted through the development of HemeBIND web server that is freely accessible at http://mleg.cse.sc.edu/hemeBIND/.
    BMC Bioinformatics 05/2011; 12:207. · 3.02 Impact Factor
  • Source
    Rong Liu, Jianjun Hu
    [Show abstract] [Hide abstract]
    ABSTRACT: Computational identification of heme-binding residues is beneficial for predicting and designing novel heme proteins. Here we proposed a novel method for heme-binding residue prediction by exploiting topological properties of these residues in the residue interaction networks derived from three-dimensional structures. Comprehensive analysis showed that key residues located in heme-binding regions are generally associated with the nodes with higher degree, closeness and betweenness, but lower clustering coefficient in the network. HemeNet, a support vector machine (SVM) based predictor, was developed to identify heme-binding residues by combining topological features with existing sequence and structural features. The results showed that incorporation of network-based features significantly improved the prediction performance. We also compared the residue interaction networks of heme proteins before and after heme binding and found that the topological features can well characterize the heme-binding sites of apo structures as well as those of holo structures, which led to reliable performance improvement as we applied HemeNet to predicting the binding residues of proteins in the heme-free state. HemeNet web server is freely accessible at http://mleg.cse.sc.edu/hemeNet/.
    PLoS ONE 01/2011; 6(10):e25560. · 3.53 Impact Factor

Publication Stats

20 Citations
12.92 Total Impact Points

Institutions

  • 2013
    • Huazhong Agricultural University
      Wu-han-shih, Hubei, China
  • 2011–2013
    • University of South Carolina
      • Department of Computer Science & Engineering
      Columbia, South Carolina, United States