PRIDB: a Protein-RNA Interface Database.

Bioinformatics and Computational Biology Program, Iowa State University, Department of Genetics, Development and Cell Biology, Iowa State University, Department of Computer Science, Iowa State University, Ames, IA 50011, Department of Biology, Elon University, Elon, NC 27244 and Computational Systems Biology Summer Institute, Iowa State University, Ames, IA 50011, USA.
Nucleic Acids Research (Impact Factor: 8.81). 11/2010; 39(Database issue):D277-82. DOI: 10.1093/nar/gkq1108
Source: PubMed

ABSTRACT The Protein-RNA Interface Database (PRIDB) is a comprehensive database of protein-RNA interfaces extracted from complexes in the Protein Data Bank (PDB). It is designed to facilitate detailed analyses of individual protein-RNA complexes and their interfaces, in addition to automated generation of user-defined data sets of protein-RNA interfaces for statistical analyses and machine learning applications. For any chosen PDB complex or list of complexes, PRIDB rapidly displays interfacial amino acids and ribonucleotides within the primary sequences of the interacting protein and RNA chains. PRIDB also identifies ProSite motifs in protein chains and FR3D motifs in RNA chains and provides links to these external databases, as well as to structure files in the PDB. An integrated JMol applet is provided for visualization of interacting atoms and residues in the context of the 3D complex structures. The current version of PRIDB contains structural information regarding 926 protein-RNA complexes available in the PDB (as of 10 October 2010). Atomic- and residue-level contact information for the entire data set can be downloaded in a simple machine-readable format. Also, several non-redundant benchmark data sets of protein-RNA complexes are provided. The PRIDB database is freely available online at

  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Protein-RNA complexes provide a wide range of essential functions in the cell. Their atomic experimental structure solving, despite essential to the understanding of these functions, is often difficult and expensive. Docking approaches that have been developed for proteins are often challenging to adapt for RNA because of its inherent flexibility and the structural data available being relatively scarce. In this study we adapted the RosettaDock protocol for protein-RNA complexes both at the nucleotide and atomic levels. Using a genetic algorithm-based strategy, and a non-redundant protein-RNA dataset, we derived a RosettaDock scoring scheme able not only to discriminate but also score efficiently docking decoys. The approach proved to be both efficient and robust for generating and identifying suitable structures when applied to two protein-RNA docking benchmarks in both bound and unbound settings. It also compares well to existing strategies. This is the first approach that currently offers a multi-level optimized scoring approach integrated in a full docking suite, leading the way to adaptive fully flexible strategies.
    PLoS ONE 01/2014; 9(9):e108928. · 3.53 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Increasing awareness of the importance of protein-RNA interactions has motivated many approaches to predict residue-level RNA binding sites in proteins based on sequence or structural characteristics. Sequence-based predictors are usually high in sensitivity but low in specificity; conversely structure-based predictors tend to have high specificity, but lower sensitivity. Here we quantified the contribution of both sequence- and structure-based features as indicators of RNA-binding propensity using a machine-learning approach. In order to capture structural information for proteins without a known structure, we used homology modeling to extract the relevant structural features. Several novel and modified features enhanced the accuracy of residue-level RNA-binding propensity beyond what has been reported previously, including by meta-prediction servers. These features include: hidden Markov model-based evolutionary conservation, surface deformations based on the Laplacian norm formalism, and relative solvent accessibility partitioned into backbone and side chain contributions. We constructed a web server called aaRNA that implements the proposed method and demonstrate its use in identifying putative RNA binding sites.
    Nucleic Acids Research 07/2014; · 8.81 Impact Factor
  • [Show abstract] [Hide abstract]
    ABSTRACT: Protein–RNA interaction plays a very crucial role in many biological processes, such as protein synthesis, transcription and post-transcription of gene expression and pathogenesis of disease. Especially RNAs always function through binding to proteins. Identification of binding interface region is especially useful for cellular pathways analysis and drug design. In this study, we proposed a novel approach for binding sites identification in proteins, which not only integrates local features and global features from protein sequence directly, but also constructed a balanced training dataset using sub-sampling based on submodularity subset selection. Firstly we extracted local features and global features from protein sequence, such as evolution information and molecule weight. Secondly, the number of non-interaction sites is much more than interaction sites, which leads to a sample imbalance problem, and hence biased machine learning model with preference to non-interaction sites. To better resolve this problem, instead of previous randomly sub-sampling over-represented non-interaction sites, a novel sampling approach based on submodularity subset selection was employed, which can select more representative data subset. Finally random forest were trained on optimally selected training subsets to predict interaction sites. Our result showed that our proposed method is very promising for predicting protein–RNA interaction residues, it achieved an accuracy of 0.863, which is better than other state-of-the-art methods. Furthermore, it also indicated the extracted global features have very strong discriminate ability for identifying interaction residues from random forest feature importance analysis.
    Computational Biology and Chemistry. 11/2014;

Full-text (2 Sources)

Available from
May 26, 2014