iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition

Department of Physics, School of Sciences, Center for Genomics and Computational Biology, Hebei United University, Tangshan, China, Bioinformatics and Computer-Aided Drug Discovery, Gordon Life Science Institute, San Diego, CA, USA, School of Public Health, Hebei United University, Tangshan 063000, China and Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China.
Nucleic Acids Research (Impact Factor: 9.11). 01/2013; 41(6). DOI: 10.1093/nar/gks1450
Source: PubMed


Meiotic recombination is an important biological process. As a main driving force of evolution, recombination provides natural new combinations of genetic variations. Rather than occurring randomly across a genome, meiotic recombination takes place at higher frequencies in some genomic regions (the so-called 'hotspots') and at lower frequencies in others (the so-called 'coldspots'). Information about hotspots and coldspots would therefore provide useful insights into both the mechanism of recombination and the process of genome evolution. So far, recombination regions have mainly been determined by experiments, which are both expensive and time-consuming. With the avalanche of genome sequences generated in the postgenomic age, automated methods for rapidly and effectively identifying recombination regions are highly desirable. In this study, a predictor called 'iRSpot-PseDNC' was developed for identifying recombination hotspots and coldspots. In the new predictor, DNA sequence samples are formulated as a novel feature vector, the so-called 'pseudo dinucleotide composition' (PseDNC), into which six local DNA structural properties are incorporated: three angular parameters (twist, tilt and roll) and three translational parameters (shift, slide and rise). In a rigorous jackknife test, the overall success rate achieved by iRSpot-PseDNC was >82% in identifying recombination spots in Saccharomyces cerevisiae, indicating that the new predictor is promising, or at least may become a complementary tool to the existing methods in this area. Although the benchmark data set used to train and test the current method was from S. cerevisiae, the basic approach can also be extended to other genomes. In particular, it has not escaped our notice that the PseDNC approach can also be used to study many other DNA-related problems.
As a user-friendly web-server, iRSpot-PseDNC is freely accessible at
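
To make the feature construction described in the abstract more concrete, the following is a minimal Python sketch of a PseDNC-style vector: 16 normalized dinucleotide frequencies followed by a few sequence-order correlation terms computed from per-dinucleotide structural properties. It is an illustration under stated assumptions, not the authors' implementation. The function name `pse_dnc`, the parameters `lam` (number of correlation tiers) and `w` (weight factor), their default values, and the `TOY_PROPS` table are all hypothetical; in particular, the property values are made-up placeholders rather than measured twist, tilt, roll, shift, slide or rise values.

```python
from itertools import product
from typing import Dict, List, Sequence

# The 16 possible dinucleotides in a fixed order: AA, AC, ..., TT.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]

# Placeholder property table (NOT real data): two made-up numbers per
# dinucleotide. Real use would need the six measured structural properties.
TOY_PROPS: Dict[str, List[float]] = {
    d: [float(k % 3), float(k % 5)] for k, d in enumerate(DINUCLEOTIDES)
}


def pse_dnc(seq: str, props: Dict[str, Sequence[float]],
            lam: int = 3, w: float = 0.05) -> List[float]:
    """Return a (16 + lam)-dimensional PseDNC-style feature vector."""
    seq = seq.upper()
    n_di = len(seq) - 1  # number of overlapping dinucleotides
    if lam < 0 or n_di - lam < 1:
        raise ValueError("sequence too short for the requested lam")
    dinucs = [seq[i:i + 2] for i in range(n_di)]
    if any(d not in props for d in dinucs):
        raise ValueError("sequence contains characters other than A, C, G, T")

    # Local composition: normalized occurrence frequency of each dinucleotide.
    freqs = [dinucs.count(d) / n_di for d in DINUCLEOTIDES]

    # Sequence-order terms: for each tier j, average over all dinucleotide
    # pairs that are j positions apart of the mean squared property difference.
    thetas = []
    for j in range(1, lam + 1):
        pairs = n_di - j
        total = 0.0
        for i in range(pairs):
            a, b = props[dinucs[i]], props[dinucs[i + j]]
            total += sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
        thetas.append(total / pairs)

    # Joint normalization so that all components sum to 1.
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


vec = pse_dnc("ACGTACGTAGCTAGCT", TOY_PROPS, lam=2, w=0.05)
print(len(vec))  # 16 composition terms plus 2 correlation terms
```

The first 16 components capture local composition, while the trailing `lam` components carry longer-range order information through the property table; a classifier would then be trained on such vectors for hotspot and coldspot sequences.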



Available from: Hao Lin, Jun 17, 2014
    • "As demonstrated in a series of recent publications (see, e.g. Chen et al., 2013; Jia et al., 2015; Lin et al., 2014; Liu et al., 2015; Xu et al., 2013) and emphasized in a recent review (Chou, 2015), user-friendly and publicly accessible web-servers represent the future direction for developing practically more useful models, simulated methods, predictors, or demonstrating new and novel structures, and we shall make efforts in our future work to provide a web-server for the findings presented in this paper. Next steps in the structural studies of CIP2A would be to crystallize CIP2A-ArmRP in complex with a possible interaction partner to deduce the exact peptide-binding site and molecular interactions."
    ABSTRACT: Cancerous Inhibitor of Protein Phosphatase 2A (CIP2A) is a human oncoprotein, which exerts its cancer-promoting function through interaction with other proteins, for example Protein Phosphatase 2A (PP2A) and MYC. The lack of structural information for CIP2A significantly prevents the design of anti-cancer therapeutics targeting this protein. In an attempt to counteract this fact, we modeled the three-dimensional structure of the N-terminal domain (CIP2A-ArmRP), analyzed key areas and amino acids, and coupled the results to the existing literature. The model reliably shows a stable armadillo repeat fold with a positively charged groove. The fact that this conserved groove highly likely binds peptides is corroborated by the presence of a conserved polar ladder, which is essential for the proper peptide-binding mode of armadillo repeat proteins and, according to our results, several known CIP2A interaction partners appropriately possess an ArmRP-binding consensus motif. Moreover, we show that Arg229Gln, which has been linked to the development of cancer, causes a significant change in charge and surface properties of CIP2A-ArmRP. In conclusion, our results reveal that CIP2A-ArmRP shares the typical fold, protein-protein interaction site and interaction patterns with other natural armadillo proteins and that, presumably, several interaction partners bind into the central groove of the modeled CIP2A-ArmRP. By providing essential structural characteristics of CIP2A, the present study significantly increases our knowledge on how CIP2A interacts with other proteins in cancer progression and how to develop new therapeutics targeting CIP2A.
    Journal of Theoretical Biology 09/2015; 386. DOI:10.1016/j.jtbi.2015.09.010 · 2.12 Impact Factor
    • "We consider these properties owing to the observation that hotspots centers are characterized by a depletion of nucleosomes (Pan et al., 2011) and DNA flexibility plays an important role in nucleosome positioning (Richmond and Davey, 2003; Tolstorukov et al., 2007). The performance of the dinucleotide structure parameters in hot/cold spots prediction was previously demonstrated (Chen et al., 2013). The third one is thermodynamic properties including dinucleotide free energy, entropy and enthalpy (Ignatova et al., 2008). "
    ABSTRACT: Characterization and accurate prediction of recombination hotspots and coldspots have crucial implications for the mechanism of recombination. Several models have predicted recombination hot/cold spots successfully, but there is still much room for improvement. We present a novel classifier in which k-mer frequency, physical and thermodynamic properties of DNA sequences are incorporated in the form of weighted features. Applying the classifier to recombination hot/cold ORFs in Saccharomyces cerevisiae, we achieved an accuracy of 90%, which is ∼5% higher than existing methods, such as iRSpot-PseDNC, IDQD and Random Forest. The model also predicted non-ORF recombination hot/cold spots sequences in Saccharomyces cerevisiae with high accuracy. A broad applicability of the model in the field of classification is expected. Copyright © 2015. Published by Elsevier Ltd.
    Journal of Theoretical Biology 06/2015; 382. DOI:10.1016/j.jtbi.2015.06.030 · 2.12 Impact Factor
    • "d their physical properties, secondary structure components and range of ASA values. Finally, we applied the predicted ASA values to improve the accuracy of the energy function, 3DIGARS, which actually resulted in outperforming all the state-of-the-art energy functions significantly. As demonstrated by a series of recent publications (Chen et al., 2013); (Lin et al., 2014, 2015); (Ding et al., 2014); (Xu et al., 2014); (Jia et al., 2015), to establish a really useful sequence-based statistical predictor for a biological system, we aligned the outline of our paper accordingly towards the steps of Chou's 5-step rule (Chou, 2011) for the two different par"
    ABSTRACT: Accurate prediction of real-value accessible surface area (ASA) from protein sequence alone has wide application in bioinformatics and computational biology. ASA has been helpful in understanding the 3-dimensional structure and function of a protein, acting as a high-impact feature in secondary structure prediction, disorder prediction, binding region identification and fold recognition applications. To enhance and support broad applications of ASA, we have attempted to improve the prediction accuracy of absolute accessible surface area by developing a new predictor paradigm, namely REGAd3p, for real-value prediction through classical Exact Regression with Regularization and a polynomial kernel of degree 3, which was further optimized using a Genetic Algorithm. Because ASA assists in building an effective energy function, we were motivated to enhance the accuracy of predicted ASA for better energy function application. Our ASA prediction paradigm was trained and tested using a new benchmark dataset, proposed in this work, consisting of 1001 and 298 protein chains, respectively. We achieved a maximum Pearson Correlation Coefficient (PCC) of 0.76, a 1.45% improvement in PCC over the existing top-performing predictor, SPINE-X, in ASA prediction on the independent test set. Furthermore, we modeled the error between actual and predicted ASA in terms of energy and combined this energy linearly with the energy function 3DIGARS, which resulted in an effective energy function, namely 3DIGARS2.0, outperforming all the state-of-the-art energy functions. On the Rosetta and Tasser decoy sets, 3DIGARS2.0 achieved improvements of 80.78%, 73.77%, 141.24%, 16.52% and 32.32% over DFIRE, RWplus, dDFIRE, GOAP and 3DIGARS, respectively.
    Journal of Theoretical Biology 06/2015; 380:380-391. DOI:10.1016/j.jtbi.2015.06.012 · 2.12 Impact Factor