PRIDB: a Protein-RNA Interface Database.

Bioinformatics and Computational Biology Program, Iowa State University, Department of Genetics, Development and Cell Biology, Iowa State University, Department of Computer Science, Iowa State University, Ames, IA 50011, Department of Biology, Elon University, Elon, NC 27244 and Computational Systems Biology Summer Institute, Iowa State University, Ames, IA 50011, USA.
Nucleic Acids Research (Impact Factor: 8.81). 11/2010; 39(Database issue):D277-82. DOI: 10.1093/nar/gkq1108
Source: PubMed

ABSTRACT The Protein-RNA Interface Database (PRIDB) is a comprehensive database of protein-RNA interfaces extracted from complexes in the Protein Data Bank (PDB). It is designed to facilitate detailed analyses of individual protein-RNA complexes and their interfaces, in addition to automated generation of user-defined data sets of protein-RNA interfaces for statistical analyses and machine learning applications. For any chosen PDB complex or list of complexes, PRIDB rapidly displays interfacial amino acids and ribonucleotides within the primary sequences of the interacting protein and RNA chains. PRIDB also identifies ProSite motifs in protein chains and FR3D motifs in RNA chains and provides links to these external databases, as well as to structure files in the PDB. An integrated JMol applet is provided for visualization of interacting atoms and residues in the context of the 3D complex structures. The current version of PRIDB contains structural information regarding 926 protein-RNA complexes available in the PDB (as of 10 October 2010). Atomic- and residue-level contact information for the entire data set can be downloaded in a simple machine-readable format. Also, several non-redundant benchmark data sets of protein-RNA complexes are provided. The PRIDB database is freely available online at

  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Protein-RNA complexes provide a wide range of essential functions in the cell. Their atomic experimental structure solving, despite essential to the understanding of these functions, is often difficult and expensive. Docking approaches that have been developed for proteins are often challenging to adapt for RNA because of its inherent flexibility and the structural data available being relatively scarce. In this study we adapted the RosettaDock protocol for protein-RNA complexes both at the nucleotide and atomic levels. Using a genetic algorithm-based strategy, and a non-redundant protein-RNA dataset, we derived a RosettaDock scoring scheme able not only to discriminate but also score efficiently docking decoys. The approach proved to be both efficient and robust for generating and identifying suitable structures when applied to two protein-RNA docking benchmarks in both bound and unbound settings. It also compares well to existing strategies. This is the first approach that currently offers a multi-level optimized scoring approach integrated in a full docking suite, leading the way to adaptive fully flexible strategies.
    PLoS ONE 09/2014; 9(9):e108928. · 3.53 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Interaction of proteins with other molecules plays an important role in many biological activities. As many structures of protein-DNA complexes and protein-RNA complexes have been determined in the past years, several databases have been constructed to provide structure data of the complexes. However, the information on the binding sites between proteins and nucleic acids is not readily available from the structure data since the data consists mostly of the three-dimensional coordinates of the atoms in the complexes. We analyzed the huge amount of structure data for the hydrogen bonding interactions between proteins and nucleic acids and developed a database called DBBP (DataBase of Binding Pairs in protein-nucleic acid interactions, DBBP contains 44,955 hydrogen bonds (H-bonds) of protein-DNA interactions and 77,947 H-bonds of protein-RNA interactions. Analysis of the huge amount of structure data of protein-nucleic acid complexes is labor-intensive, yet provides useful information for studying protein-nucleic acid interactions. DBBP provides the detailed information of hydrogen-bonding interactions between proteins and nucleic acids at various levels from the atomic level to the residue level. The binding information can be used as a valuable resource for developing a computational method aiming at predicting new binding sites in proteins or nucleic acids.
    BMC Bioinformatics 01/2014; 15 Suppl 15:S5. · 2.67 Impact Factor
  • [Show abstract] [Hide abstract]
    ABSTRACT: Protein–RNA interaction plays a very crucial role in many biological processes, such as protein synthesis, transcription and post-transcription of gene expression and pathogenesis of disease. Especially RNAs always function through binding to proteins. Identification of binding interface region is especially useful for cellular pathways analysis and drug design. In this study, we proposed a novel approach for binding sites identification in proteins, which not only integrates local features and global features from protein sequence directly, but also constructed a balanced training dataset using sub-sampling based on submodularity subset selection. Firstly we extracted local features and global features from protein sequence, such as evolution information and molecule weight. Secondly, the number of non-interaction sites is much more than interaction sites, which leads to a sample imbalance problem, and hence biased machine learning model with preference to non-interaction sites. To better resolve this problem, instead of previous randomly sub-sampling over-represented non-interaction sites, a novel sampling approach based on submodularity subset selection was employed, which can select more representative data subset. Finally random forest were trained on optimally selected training subsets to predict interaction sites. Our result showed that our proposed method is very promising for predicting protein–RNA interaction residues, it achieved an accuracy of 0.863, which is better than other state-of-the-art methods. Furthermore, it also indicated the extracted global features have very strong discriminate ability for identifying interaction residues from random forest feature importance analysis.
    Computational Biology and Chemistry 11/2014; · 1.60 Impact Factor

Full-text (2 Sources)

Available from
May 26, 2014