iRSpot-PseDNC: Identify recombination spots with pseudo dinucleotide composition

Department of Physics, School of Sciences, Center for Genomics and Computational Biology, Hebei United University, Tangshan, China, Bioinformatics and Computer-Aided Drug Discovery, Gordon Life Science Institute, San Diego, CA, USA, School of Public Health, Hebei United University, Tangshan 063000, China and Key Laboratory for Neuro-Information of Ministry of Education, Center of Bioinformatics, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China.
Nucleic Acids Research (Impact Factor: 9.11). 01/2013; 41(6). DOI: 10.1093/nar/gks1450
Source: PubMed


Meiotic recombination is an important biological process. As a main driving force of evolution, recombination provides natural new combinations of genetic variations. Rather than randomly occurring across a genome, meiotic recombination takes place in some genomic regions (the so-called 'hotspots') with higher frequencies, and in the other regions (the so-called 'coldspots') with lower frequencies. Therefore, the information of the hotspots and coldspots would provide useful insights for in-depth studying of the mechanism of recombination and the genome evolution process as well. So far, the recombination regions have been mainly determined by experiments, which are both expensive and time-consuming. With the avalanche of genome sequences generated in the postgenomic age, it is highly desired to develop automated methods for rapidly and effectively identifying the recombination regions. In this study, a predictor, called 'iRSpot-PseDNC', was developed for identifying the recombination hotspots and coldspots. In the new predictor, the samples of DNA sequences are formulated by a novel feature vector, the so-called 'pseudo dinucleotide composition' (PseDNC), into which six local DNA structural properties, i.e. three angular parameters (twist, tilt and roll) and three translational parameters (shift, slide and rise), are incorporated. It was observed by the rigorous jackknife test that the overall success rate achieved by iRSpot-PseDNC was >82% in identifying recombination spots in Saccharomyces cerevisiae, indicating the new predictor is promising or at least may become a complementary tool to the existing methods in this area. Although the benchmark data set used to train and test the current method was from S. cerevisiae, the basic approaches can also be extended to deal with all the other genomes. Particularly, it has not escaped our notice that the PseDNC approach can be also used to study many other DNA-related problems. 
As a user-friendly web-server, iRSpot-PseDNC is freely accessible at
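
The abstract describes the PseDNC feature vector only in words. The Python sketch below is one plausible reading of it, not the authors' implementation: it assumes the vector is made of 16 normalized dinucleotide frequencies followed by `lam` correlation terms, each being the mean squared difference of the structural properties between dinucleotides `j` steps apart, scaled by a weight `w`. The function name `pse_dnc`, the parameters `lam` and `w` and their defaults, and the table `TOY_PROPS` are all hypothetical; `TOY_PROPS` holds placeholder numbers, not measured twist, tilt, roll, shift, slide or rise values.

```python
from itertools import product

# The 16 possible dinucleotides, in a fixed order.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]

# PLACEHOLDER property table: six made-up numbers per dinucleotide.
# Real use would need standardized experimental values for the six
# structural properties; these values are for illustration only.
TOY_PROPS = {
    d: [((i + 1) * (k + 1)) % 7 / 7.0 for k in range(6)]
    for i, d in enumerate(DINUCLEOTIDES)
}


def pse_dnc(seq, props, lam=3, w=0.05):
    """Return a (16 + lam)-dimensional pseudo dinucleotide composition.

    seq   -- DNA string over A, C, G, T
    props -- dict mapping each dinucleotide to a list of property values
    lam   -- number of correlation tiers (assumed parameter)
    w     -- weight of the correlation terms (assumed parameter)
    """
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence may contain only A, C, G and T")
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if lam < 0 or lam >= len(pairs):
        raise ValueError("lam must satisfy 0 <= lam < len(seq) - 1")

    # Local information: normalized occurrence frequency of each dinucleotide.
    freqs = [pairs.count(d) / len(pairs) for d in DINUCLEOTIDES]

    # Longer-range information: tier-j correlation between dinucleotides
    # that are j positions apart, averaged over the whole sequence.
    thetas = []
    for j in range(1, lam + 1):
        n = len(pairs) - j
        total = 0.0
        for i in range(n):
            a, b = props[pairs[i]], props[pairs[i + j]]
            total += sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
        thetas.append(total / n)

    # Joint normalization so that all components sum to 1.
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


vector = pse_dnc("ACGTACGTTGCA", TOY_PROPS, lam=2, w=0.1)
print(len(vector))
```

Under these assumptions, the first 16 components capture local composition and the remaining `lam` components carry sequence-order information through the structural properties, so sequences of different lengths map to vectors of the same dimension.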



Available from: Hao Lin, Jun 17, 2014
  • Source
    • "Unfortunately, the conventional formulations for the four metrics are not quite intuitive for most experimental scientists, particularly the one for MCC. Interestingly, by using the symbols and derivation as used in [103] for studying signal peptides, the aforementioned four metrics can be formulated by a set of equations given below [14,30,60,61,104]"

    Full-text · Article · Jan 2016 · Molecules
  • Source
    • "As demonstrated in a series of recent publications (see, e.g. Chen et al., 2013; Jia et al., 2015; Lin et al., 2014; Liu et al., 2015; Xu et al., 2013) and emphasized in a recent review (Chou, 2015), user-friendly and publicly accessible web-servers represent the future direction for developing practically more useful models, simulated methods, predictors, or demonstrating new and novel structures, and we shall make efforts in our future work to provide a web-server for the findings presented in this paper. Next steps in the structural studies of CIP2A would be to crystallize CIP2A-ArmRP in complex with a possible interaction partner to deduce the exact peptide-binding site and molecular interactions."
    ABSTRACT: Cancerous Inhibitor of Protein Phosphatase 2A (CIP2A) is a human oncoprotein, which exerts its cancer-promoting function through interaction with other proteins, for example Protein Phosphatase 2A (PP2A) and MYC. The lack of structural information for CIP2A significantly prevents the design of anti-cancer therapeutics targeting this protein. In an attempt to counteract this fact, we modeled the three-dimensional structure of the N-terminal domain (CIP2A-ArmRP), analyzed key areas and amino acids, and coupled the results to the existing literature. The model reliably shows a stable armadillo repeat fold with a positively charged groove. The fact that this conserved groove highly likely binds peptides is corroborated by the presence of a conserved polar ladder, which is essential for the proper peptide-binding mode of armadillo repeat proteins and, according to our results, several known CIP2A interaction partners appropriately possess an ArmRP-binding consensus motif. Moreover, we show that Arg229Gln, which has been linked to the development of cancer, causes a significant change in charge and surface properties of CIP2A-ArmRP. In conclusion, our results reveal that CIP2A-ArmRP shares the typical fold, protein-protein interaction site and interaction patterns with other natural armadillo proteins and that, presumably, several interaction partners bind into the central groove of the modeled CIP2A-ArmRP. By providing essential structural characteristics of CIP2A, the present study significantly increases our knowledge on how CIP2A interacts with other proteins in cancer progression and how to develop new therapeutics targeting CIP2A.
    Full-text · Article · Sep 2015 · Journal of Theoretical Biology
  • Source
    • "The benchmark dataset S as well as its subsets S + and S − , along with the corresponding detailed sequences are given in Supporting information S1. As pointed in Chou (2011) and concurred in a series of recent publications (see, e.g., Chen et al., 2012; Min and Xiao, 2013; Xiao et al., 2013a, 2015; Xu et al., 2013b, 2014b; Liu et al., 2014a, 2015a; Qiu et al., 2014, 2015; Jia et al., 2015), one of the keys in successfully developing a sequence-based statistical predictor is how to effectively formulate the sequence samples concerned with an effective mathematical expression that can truly capture their intrinsic correlation with the target to be predicted. Below we are to address this problem. "
    ABSTRACT: The microRNA (miRNA), a small non-coding RNA molecule, plays an important role in transcriptional and post-transcriptional regulation of gene expression. Its abnormal expression, however, has been observed in many cancers and other disease states, implying that the miRNA molecules are also deeply involved in these diseases, particularly in carcinogenesis. Therefore, it is important for both basic research and miRNA-based therapy to discriminate the real pre-miRNAs from the false ones (such as hairpin sequences with similar stem-loops). Most existing methods in this regard were based on the strategy in which RNA samples were formulated by a vector formed by their Kmer components. But the length of Kmers must be very short; otherwise, the vector's dimension would be extremely large, leading to the "high-dimension disaster" or overfitting problem. Inspired by the concept of "degenerate energy levels" in quantum mechanics, we introduced the "degenerate Kmer" (deKmer) to represent RNA samples. By doing so, not only we can accommodate long-range coupling effects but also we can avoid the high-dimension problem. Rigorous jackknife tests and cross-species experiments indicated that our approach is very promising. It has not escaped our notice that the deKmer approach can also be applied to many other areas of computational biology. A user-friendly web-server for the new predictor has been established at, by which users can easily get their desired results.
    Full-text · Article · Sep 2015 · Journal of Theoretical Biology