Jing Hu

Franklin and Marshall College, Lancaster, PA, USA

Publications (10) · 11.29 Total impact

  • Article: A comparative analysis of protein interfaces.
    Jing Hu, Changhui Yan
    ABSTRACT: Proteins perform various functions by interacting with other molecules. Analyzing the characteristics of residues at the interaction interfaces provides insights into the mechanisms of these interactions. In this study, we analyze the characteristics of five different interfaces: protein-protein, protein-DNA, protein-RNA, protein-carbohydrate, and protein-ligand interfaces. The analysis reveals that these interfaces differ in residue composition, and these differences reflect differences in the mechanisms that facilitate the various types of interaction. Despite these differences in residue composition, all five types of interface are more conserved than the non-interface protein surface. Our results also show that it is important to account for solvent accessibility when investigating residues' propensities for different parts of a protein.
    Protein and Peptide Letters 11/2010; 17(11):1450-8. · 1.94 Impact Factor
  • Article: A tool for calculating binding-site residues on proteins from PDB structures.
    Jing Hu, Changhui Yan
    ABSTRACT: In research on protein functional sites, researchers often need to identify the binding-site residues on a protein. A commonly used strategy is to find a complex structure in the Protein Data Bank (PDB) that contains the protein of interest and its interacting partner(s), and to calculate binding-site residues from that complex structure. However, because a protein may participate in multiple interactions, the binding-site residues calculated from a single complex structure usually do not reveal all binding sites on the protein. Researchers therefore need to find all PDB complexes that contain the protein of interest and combine the binding-site information gleaned from them. This process is very time-consuming; in particular, combining binding-site information obtained from different PDB structures requires tedious alignment of protein sequences. The process becomes overwhelmingly difficult when researchers have a large set of proteins to analyze, which is usually the case in practice. In this study, we have developed a tool for calculating binding-site residues on proteins, TCBRP (http://yanbioinformatics.cs.usu.edu:8080/ppbindingsubmit). For an input protein, TCBRP quickly finds all binding-site residues by automatically combining the information obtained from all PDB structures that contain the protein of interest. TCBRP also groups the binding-site residues into categories according to interaction type, and it allows researchers to set their own definition of a binding-site residue. The tool is useful for research on protein binding-site analysis and prediction.
    BMC Structural Biology 09/2009; 9:52. · 2.48 Impact Factor
  • Article: A method for discovering transmembrane beta-barrel proteins in Gram-negative bacterial proteomes.
    Jing Hu, Changhui Yan
    ABSTRACT: Transmembrane beta-barrel (TMB) proteins play pivotal roles in many aspects of bacterial function. This paper presents a k-nearest neighbor (k-NN) method for discriminating TMB from non-TMB proteins. We start with a method that makes predictions based on a distance computed from residue composition, and gradually improve prediction performance by including homologous sequences and by searching for a set of residues and di-peptides to use in calculating the distance. The final method achieves an accuracy of 97.1%, with a Matthews correlation coefficient (MCC) of 0.876, 86.4% sensitivity and 98.8% specificity. A web server based on the proposed method is available at http://yanbioinformatics.cs.usu.edu:8080/TMBKNNsubmit.
    Computational Biology and Chemistry 09/2008; 32(4):298-301. · 1.37 Impact Factor
  • Article: Identification of deleterious non-synonymous single nucleotide polymorphisms using sequence-derived information.
    Jing Hu, Changhui Yan
    ABSTRACT: As the number of non-synonymous single nucleotide polymorphisms (nsSNPs), also known as single amino acid polymorphisms (SAPs), increases rapidly, computational methods that can distinguish disease-causing SAPs from neutral SAPs are needed. Many methods have been developed to distinguish disease-causing SAPs based on both structural and sequence features of the mutation point. One limitation of these methods is that they cannot be applied when the protein structure is not available. In this study, we explore the feasibility of classifying SAPs into disease-causing and neutral mutations using only information derived from protein sequence. We compiled a set of 686 features derived from protein sequence. For each feature, the distance between the wild-type residue and the mutant residue was computed. A greedy approach was then used to select the features useful for classifying SAPs, and 10 features were selected. Using the selected features, a decision tree method achieves 82.6% overall accuracy with a Matthews correlation coefficient (MCC) of 0.607 in cross-validation. When tested on an independent set that was not seen during training and feature selection, the decision tree method achieves 82.6% overall accuracy with 0.604 MCC. We also evaluated the proposed method on all SAPs obtained from Swiss-Prot, where it achieves 0.42 MCC with 73.2% overall accuracy. This method allows users to make reliable predictions when protein structures are not available. Unlike previous studies, in which only a small set of features was arbitrarily chosen and considered, we used an automated method to systematically discover useful features from a large set of features well annotated in public databases. The proposed method is a useful tool for classifying SAPs, especially when the structure of the protein is not available.
    BMC Bioinformatics 02/2008; 9:297. · 2.75 Impact Factor
  • Article: Discrimination of outer membrane proteins with improved performance.
    ABSTRACT: Outer membrane proteins (OMPs) perform diverse functional roles in Gram-negative bacteria, and identifying them is an important task. This paper presents a method for distinguishing OMPs from non-OMPs (that is, globular proteins and inner membrane proteins (IMPs)). First, we calculated the average residue compositions of OMPs, globular proteins and IMPs separately using a training set. Then, for each protein in the test set, its distances to the three groups were calculated from residue composition using a weighted Euclidean distance (WED) approach, and the protein was classified as OMP or non-OMP according to the smallest distance. This method distinguishes OMPs from non-OMPs with 91.0% accuracy and a Matthews correlation coefficient (MCC) of 0.639. We then improved the method by including homologous sequences in the calculation of residue composition and by using a feature-selection method to choose the single residues and di-peptides that are useful for OMP prediction. The final method achieves an accuracy of 96.8% with 0.859 MCC. In direct comparisons, the proposed method outperforms previously published methods, and it will be very helpful for discovering OMPs on a genome scale.
    BMC Bioinformatics 02/2008; 9:47. · 2.75 Impact Factor
  • Article: Protein subcellular localisation prediction with improved performance.
    Jing Hu, Changhui Yan
    International Journal of Functional Informatics and Personalised Medicine 01/2008; 1:321-328.
  • Conference Proceeding: Mining sequence features for DNA-binding site prediction.
    Jing Hu, Changhui Yan
    Proceedings of the 2008 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2008, Sun Valley Resort, Sun Valley, Idaho, September 15-17, 2008; 01/2008
  • Article: HMM_RA: an improved method for alpha-helical transmembrane protein topology prediction.
    Jing Hu, Changhui Yan
    ABSTRACT: Alpha-helical transmembrane (TM) proteins play important and diverse functional roles in cells. The ability to predict the topology of these proteins is important for identifying functional sites and inferring the function of membrane proteins. This paper presents a hidden Markov model (referred to as HMM_RA) that predicts the topology of alpha-helical transmembrane proteins with improved performance. HMM_RA adopts the same structure as the HMMTOP method, which has five modules: inside loop, inside helix tail, membrane helix, outside helix tail and outside loop. Each module consists of one or more states. HMM_RA allows protein sequences to be encoded with reduced alphabets; thus, each state of HMM_RA is associated with n emission probabilities, where n is the size of the reduced alphabet. Direct comparisons on two standard data sets show that HMM_RA consistently outperforms HMMTOP and TMHMM in topology prediction. Specifically, on a high-quality data set of 83 proteins, HMM_RA outperforms HMMTOP by up to 7.6% in topology accuracy and 6.4% in alpha-helix location accuracy. On the same data set, HMM_RA outperforms TMHMM by up to 6.4% in topology accuracy and 2.9% in location accuracy. Comparisons also show that HMM_RA achieves performance comparable to that of Phobius, a recently published method.
    Bioinformatics and Biology Insights 01/2008; 2:67-74.
  • Conference Proceeding: Predicting Protein Subcellular Localizations Using Weighted Euclidean Distance.
    Jing Hu, Changhui Yan
    Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering, BIBE 2007, October 14-17, 2007, Harvard Medical School, Boston, MA, USA; 01/2007
  • Conference Proceeding: A Hidden Markov Model Approach to Identifying HTH Motifs Using Protein Sequence and Predicted Solvent Accessibility.
    Changhui Yan, Jing Hu
    Proceedings of the 2006 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB 2006, Renaissance Hotel Downtown, Toronto, Ontario, Canada, September 28-29, 2006; 01/2006
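Several of the abstracts above report accuracy, sensitivity, specificity and the Matthews correlation coefficient (MCC). As a reading aid, here is a minimal Python sketch of the standard definitions computed from binary confusion-matrix counts; the function names are illustrative and are not taken from the papers.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # An empty row or column makes MCC undefined; 0.0 is the usual convention.
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Fraction of actual positives that are predicted positive (recall)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of actual negatives that are predicted negative."""
    return tn / (tn + fp)


# Hypothetical counts, for illustration only.
print(round(mcc(40, 50, 5, 5), 3), accuracy(40, 50, 5, 5))
```

With these hypothetical counts, MCC is about 0.798 while accuracy is 0.9. Because MCC uses all four cells of the confusion matrix, it stays informative when one class dominates, which is why it is reported alongside accuracy for imbalanced tasks such as TMB versus non-TMB discrimination.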
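The first stage of the outer membrane protein method (an average residue composition per class, then a weighted Euclidean distance from a query to each class average, with the nearest class winning) can be sketched as follows. This is an illustration under stated assumptions: the per-residue weights are a hypothetical parameter (uniform by default) because the abstract does not give them, and the homologous-sequence and di-peptide refinements are omitted.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Relative frequency of each standard amino acid (other letters ignored)."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def mean_composition(seqs):
    """Average residue composition of a group of training sequences."""
    comps = [composition(s) for s in seqs]
    return [sum(col) / len(comps) for col in zip(*comps)]


def weighted_euclidean(x, y, w):
    """sqrt(sum_i w_i * (x_i - y_i)^2)."""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))


def classify(seq, centroids, weights=None):
    """Return the label whose average composition is nearest to seq."""
    x = composition(seq)
    # Hypothetical default: uniform weights (the real weights are not given here).
    w = weights if weights is not None else [1.0] * len(AMINO_ACIDS)
    return min(centroids, key=lambda label: weighted_euclidean(x, centroids[label], w))
```

Given training lists named, say, `omp_train`, `glob_train` and `imp_train` (hypothetical names), one would build `centroids = {"OMP": mean_composition(omp_train), "globular": mean_composition(glob_train), "IMP": mean_composition(imp_train)}` and treat any label other than `"OMP"` as non-OMP, mirroring the three-group comparison in the abstract.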
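The k-NN idea in the transmembrane beta-barrel paper can also be sketched: each sequence is represented by the frequencies of a chosen set of residues and di-peptides, and a query takes the majority label of its k nearest training sequences. The feature list, plain Euclidean distance and k=3 below are assumptions for illustration; the published method searched for its feature set and also used homologous sequences.

```python
import math
from collections import Counter


def feature_freqs(seq, features):
    """Frequency of each feature (a single residue or a di-peptide) among the
    overlapping windows of the same length in seq."""
    seq = seq.upper()
    freqs = []
    for f in features:
        n_windows = len(seq) - len(f) + 1
        if n_windows <= 0:
            freqs.append(0.0)
            continue
        hits = sum(1 for i in range(n_windows) if seq[i:i + len(f)] == f)
        freqs.append(hits / n_windows)
    return freqs


def knn_predict(query, training, features, k=3):
    """Majority label among the k training sequences nearest to query.

    training is a list of (sequence, label) pairs; distance is plain
    Euclidean distance between feature-frequency vectors (an assumption).
    """
    q = feature_freqs(query, features)
    neighbours = sorted(
        training, key=lambda item: math.dist(q, feature_freqs(item[0], features))
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

An odd k avoids ties in a two-class vote; `math.dist` requires Python 3.8 or later.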
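The SAP abstract computes, for each feature, a distance between the wild-type and mutant residue, then selects features greedily. A minimal sketch under assumptions: the distance is taken to be the absolute difference on a per-residue property scale, and `score` is a caller-supplied function (hypothetical here; in practice it might return the cross-validated MCC of a decision tree trained on the chosen features). Selection stops when no remaining feature improves the score.

```python
def substitution_features(wild, mutant, scales):
    """One value per property scale: |scale[mutant] - scale[wild]|.

    `scales` maps a scale name to a {residue: value} dict (hypothetical input).
    """
    return {name: abs(s[mutant] - s[wild]) for name, s in scales.items()}


def greedy_forward_selection(candidates, score, max_features=None):
    """Repeatedly add the candidate that most improves score(selected).

    `score` takes a list of feature names and returns a number to maximize.
    Stops when no remaining candidate improves the current best score.
    """
    selected, remaining = [], list(candidates)
    best = float("-inf")
    while remaining and (max_features is None or len(selected) < max_features):
        top_score, top_f = max(
            ((score(selected + [f]), f) for f in remaining), key=lambda t: t[0]
        )
        if top_score <= best:
            break
        selected.append(top_f)
        remaining.remove(top_f)
        best = top_score
    return selected, best
```

Forward selection evaluates on the order of m^2 candidate subsets for m features, far fewer than the 2^m subsets an exhaustive search would need, which is what makes a pool of hundreds of features tractable.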
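For the TCBRP entry, one common way to define binding-site residues from a single PDB complex, assumed here for illustration, is to call a residue a binding-site residue if any of its atoms lies within a distance cutoff (5 Å below, a hypothetical default) of any atom of the partner chain(s). The sketch parses ATOM/HETATM records using the fixed PDB column layout and handles one structure only. TCBRP additionally merges results across all PDB entries containing the protein (which requires sequence alignment), categorizes residues by interaction type, and lets users choose the definition; none of that is reproduced here.

```python
import math


def parse_atoms(pdb_text):
    """Yield (chain, (res_seq, insertion_code), res_name, (x, y, z)) for each
    ATOM/HETATM record, using the fixed PDB column layout."""
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            chain = line[21]
            res_id = (int(line[22:26]), line[26].strip())
            res_name = line[17:20].strip()
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            yield chain, res_id, res_name, xyz


def binding_site_residues(pdb_text, chain, partner_chains, cutoff=5.0):
    """Residues of `chain` with any atom within `cutoff` angstroms of any atom
    in `partner_chains`. Brute-force pairwise search, fine for a sketch."""
    atoms = list(parse_atoms(pdb_text))
    partner_xyz = [a[3] for a in atoms if a[0] in partner_chains]
    sites = set()
    for ch, res_id, res_name, xyz in atoms:
        if ch == chain and any(math.dist(xyz, p) <= cutoff for p in partner_xyz):
            sites.add((res_id, res_name))
    return sorted(sites)
```

Passing the partner chains as a set (for example `{"B", "C"}`) lets the same call cover a protein bound to several partners in one complex.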
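For HMM_RA, the abstract notes that with a reduced alphabet each state is associated with n emission probabilities rather than 20. The sketch below shows what that means for a single state's emission table. The four-group alphabet is hypothetical and is not the grouping used in the paper; emission probabilities are estimated from example segments with an add-one pseudocount.

```python
from collections import Counter

# Hypothetical 4-symbol reduced alphabet (NOT necessarily the grouping used by HMM_RA).
GROUPS = {
    "h": "AVLIMFWC",  # hydrophobic
    "p": "STNQGPYH",  # polar / other
    "+": "KR",        # positively charged
    "-": "DE",        # negatively charged
}
REDUCE = {aa: sym for sym, members in GROUPS.items() for aa in members}


def encode(seq):
    """Map a protein sequence to the reduced alphabet (KeyError on non-standard letters)."""
    return "".join(REDUCE[aa] for aa in seq.upper())


def emission_probs(segments, pseudocount=1.0):
    """Emission distribution over the n reduced symbols for one HMM state,
    estimated from sequence segments assigned to that state."""
    counts = Counter()
    for seg in segments:
        counts.update(encode(seg))
    total = sum(counts.values()) + pseudocount * len(GROUPS)
    return {sym: (counts[sym] + pseudocount) / total for sym in GROUPS}
```

With n = 4, each state has only 4 emission parameters to estimate instead of 20, which is one plausible motivation for reduced alphabets when labeled training data are limited.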

Institutions

  • 2010
    • Franklin and Marshall College
      Lancaster, PA, USA
  • 2008–2009
    • Utah State University
      • Department of Computer Science
      Logan, UT, USA