Prediction of FAD interacting residues in a protein from its primary sequence using evolutionary information

Institute of Microbial Technology, Sector 39A, Chandigarh, India.
BMC Bioinformatics (Impact Factor: 2.58). 01/2010; 11 Suppl 1(Suppl 1):S48. DOI: 10.1186/1471-2105-11-S1-S48
Source: DOAJ

ABSTRACT Flavin binding proteins (FBPs) play a critical role in several biological functions, such as the electron transport system (ETS). These flavoproteins contain very tightly bound, sometimes covalently bound, flavin adenine dinucleotide (FAD) or flavin mononucleotide (FMN). The interaction between the flavin nucleotide and the amino acids of a flavoprotein is essential for its functionality. Thus, identification of FAD interacting residues in an FBP is an important step toward understanding its function and mechanism.
In this study, we describe models developed for predicting FAD interacting residues using window patterns of 15, 17 and 19 residues. Support vector machine (SVM) based models were developed using the binary pattern of the protein's amino acid sequence and achieved a maximum accuracy of 69.65% with a Matthews correlation coefficient (MCC) of 0.39 and an area under the curve (AUC) of 0.773. Performance improved significantly, from 69.65% to 82.86% accuracy with an MCC of 0.66 and an AUC of 0.904, when evolutionary information was used as input to the SVM. The evolutionary information was generated in the form of a position-specific scoring matrix (PSSM) profile using PSI-BLAST at an e-value of 0.001. All models were developed on 198 non-redundant FAD binding protein chains containing 5172 FAD interacting residues and evaluated using five-fold cross-validation.
This study suggests that evolutionary information for 17-residue patterns performs best for predicting FAD interacting residues. We also developed a web server for predicting FAD interacting residues in a protein, which is freely available to academic users.
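The window-based binary encoding and the MCC metric mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the padding symbol `X`, the 21-letter alphabet, and the function names `binary_window_features` and `mcc` are assumptions made for illustration, and the abstract does not say how chain termini were handled.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # assumed padding symbol for window positions beyond the chain termini
ALPHABET = AMINO_ACIDS + PAD


def binary_window_features(sequence, window=17):
    """Return one binary (one-hot) feature vector per residue.

    Each vector encodes the window centred on that residue, so its
    length is window * len(ALPHABET), e.g. 17 * 21 = 357.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it has a central residue")
    half = window // 2
    padded = PAD * half + sequence.upper() + PAD * half
    features = []
    for i in range(len(sequence)):
        vector = []
        for residue in padded[i:i + window]:
            one_hot = [0] * len(ALPHABET)
            index = ALPHABET.find(residue)
            # Letters outside the 20 standard amino acids are treated as padding.
            one_hot[index if index >= 0 else len(ALPHABET) - 1] = 1
            vector.extend(one_hot)
        features.append(vector)
    return features


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    feats = binary_window_features("MKTAYIAKQR", window=17)
    print(len(feats), len(feats[0]))
```

Each vector would be paired with a label (interacting or non-interacting) for the central residue and passed to a classifier such as an SVM. A PSSM-based variant would replace each one-hot column with that position's row of PSI-BLAST profile scores.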

Available from: Gajendra Pal Singh Raghava, Sep 28, 2015
    • "A computational method for predicting the FMN-binding residues on proteins would greatly facilitate defining FMN-binding sites on protein structures. Computational methods have been developed to predict FMN (Wang et al., 2012), flavin adenine dinucleotide (FAD) (Mishra and Raghava, 2010) and nicotinamide adenine dinucleotide (NAD) (Ansari and Raghava, 2010) binding "
    ABSTRACT: Flavin mononucleotide (FMN) is a cofactor involved in many biological reactions. Insights into protein-FMN interactions aid protein functional annotation and also facilitate drug design. In this study, we have established a new method, making use of an encoding scheme of the three-dimensional probability density maps that describe the distributions of 40 non-covalent interacting atom types around protein surfaces, to predict FMN-binding sites on protein surfaces. One machine learning model was trained for each of the 30 protein atom types to predict tentative FMN-binding sites on protein structures. The method's capability was evaluated by five-fold cross-validation on a dataset containing 81 non-redundant FMN-binding protein structures and further tested on independent datasets of 30 and 15 non-redundant protein structures respectively. These predictions achieved an accuracy of 0.94, 0.94 and 0.96 with Matthews correlation coefficient (MCC) of 0.53, 0.53 and 0.65 respectively for the three protein structure sets. The prediction capability is superior to the existing method. This is the first structure-based approach that does not rely on evolutionary information for predicting FMN-interacting residues. The webserver for the prediction is available at
    Journal of Theoretical Biology 11/2013; 192. DOI:10.1016/j.jtbi.2013.10.020 · 2.12 Impact Factor
    • "Several prediction methods have been developed for carbohydrates [32,33], lipids [34,35], DNA [29,36-39] and RNA [30,38,40] interacting residues in protein sequence. Some methods have been developed for specific ligands such as ATP [41,42], GTP [43], NAD [44], FAD [45] and mannose [46]. "
    ABSTRACT: Background: Vitamins are important cofactors in various enzymatic reactions. In the past, many inhibitors have been designed against vitamin binding pockets in order to inhibit vitamin-protein interactions. Thus, it is important to identify vitamin interacting residues in a protein. It is possible to detect vitamin-binding pockets on a protein if its tertiary structure is known. Unfortunately, tertiary structures are available for only a limited number of proteins. Therefore, it is important to develop in-silico models for predicting vitamin interacting residues in a protein from its primary structure. Results: In this study, we first compared protein-interacting residues of vitamins with other ligands using Two Sample Logo (TSL). It was observed that ATP, GTP, NAD, FAD and mannose preferred {G,R,K,S,H}, {G,K,T,S,D,N}, {T,G,Y}, {G,Y,W} and {Y,D,W,N,E} residues respectively, whereas vitamins preferred {Y,F,S,W,T,G,H} residues for the interaction with proteins. Furthermore, compositional information of preferred and non-preferred residues along with pattern specificity was also observed within different vitamin classes. Vitamins A, B and B6 preferred {F,I,W,Y,L,V}, {S,Y,G,T,H,W,N,E} and {S,T,G,H,Y,N} interacting residues respectively. This suggested that protein-binding patterns of vitamins are different from other ligands, and motivated us to develop separate predictors for vitamins and their sub-classes. Four different prediction modules, (i) vitamin interacting residues (VIRs), (ii) vitamin-A interacting residues (VAIRs), (iii) vitamin-B interacting residues (VBIRs) and (iv) pyridoxal-5-phosphate (vitamin B6) interacting residues (PLPIRs), have been developed. We applied various classifiers, including SVM, BayesNet, NaiveBayes, ComplementNaiveBayes, NaiveBayesMultinomial, RandomForest and IBk, as machine learning techniques, using binary and Position-Specific Scoring Matrix (PSSM) features of protein sequences. Finally, we selected the best performing SVM modules and obtained highest MCC of 0.53, 0.48, 0.61, 0.81 for VIRs, VAIRs, VBIRs, PLPIRs respectively, using PSSM-based evolutionary information. All the modules developed in this study have been trained and tested on non-redundant datasets and evaluated using five-fold cross-validation. The performances were also evaluated on the balanced and different independent datasets. Conclusions: This study demonstrates that it is possible to predict VIRs, VAIRs, VBIRs and PLPIRs from evolutionary information of protein sequence. In order to provide service to the scientific community, we have developed web-server and standalone software VitaPred (
    BMC Bioinformatics 02/2013; 14(1):44. DOI:10.1186/1471-2105-14-44 · 2.58 Impact Factor
    • "Following previous studies [3,13-16], we used the Ligand Protein Contact server to assign heme binding and nonbinding residues for the protein chains in the datasets. Among the training set of PHeme-75, we obtained 18584 residues with atomic coordinates, of which about 13.5% are heme binding sites. "
    ABSTRACT: Heme-protein interactions are essential for various biological processes such as electron transfer, catalysis, signal transduction and the control of gene expression. Knowledge of heme binding residues can provide crucial clues for understanding these activities and aid in functional annotation; however, insufficient work has been done on identifying heme binding residues from protein sequence information. We propose a sequence-based approach for accurate prediction of heme binding residues by a novel integrative sequence profile coupling position specific scoring matrices with heme specific physicochemical properties. In order to select the informative physicochemical properties, we design an intuitive feature selection scheme by combining a greedy strategy with correlation analysis. Our integrative sequence profile approach for prediction of heme binding residues outperforms the conventional methods using amino acid and evolutionary information on the 5-fold cross validation and the independent tests. The novel feature of an integrative sequence profile achieves good performance using a reduced set of feature vector elements.
    Proteome Science 06/2012; 10 Suppl 1(Suppl 1):S20. DOI:10.1186/1477-5956-10-S1-S20 · 1.73 Impact Factor