Article

RNAcentral: A vision for an international database of RNA sequences

Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SA, United Kingdom.
RNA (Impact Factor: 4.62). 10/2011; 17:1941-1946. DOI: 10.1261/rna.2750811
Source: PubMed

ABSTRACT During the last decade there has been a great increase in the number of noncoding RNA genes identified, including new classes such as microRNAs and piRNAs. There is also a large growth in the amount of experimental characterization of these RNA components. Despite this growth in information, it is still difficult for researchers to access RNA data, because key data resources for noncoding RNAs have not yet been created. The most pressing omission is the lack of a comprehensive RNA sequence database, much like UniProt, which provides a comprehensive set of protein knowledge. In this article we propose the creation of a new open public resource that we term RNAcentral, which will contain a comprehensive collection of RNA sequences and fill an important gap in the provision of biomedical databases. We envision RNA researchers from all over the world joining a federated RNAcentral network, contributing specialized knowledge and databases. RNAcentral would centralize key data that are currently held across a variety of databases, allowing researchers instant access to a single, unified resource. This resource would facilitate the next generation of RNA research and help drive further discoveries, including those that improve food production and human and animal health. We encourage additional RNA database resources and research groups to join this effort. We aim to obtain international network funding to further this endeavor.

Download full-text

Full-text

Available from: Simon Moxon, Jun 30, 2015
0 Followers
 · 
148 Views
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: Small RNAs, including microRNA and short-interfering RNAs, play important roles in plants. In recent years, developments in sequencing technology have enabled the large-scale discovery of sRNAs in various cells, tissues and developmental stages and in response to various stresses. This review describes the bioinformatics challenges to analysing these large datasets of short-RNA sequences and some of the solutions to those challenges.
    Briefings in functional genomics 12/2011; 11(1):71-85. DOI:10.1093/bfgp/elr039 · 3.43 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases.
    RNA 12/2011; 18(2):193-212. DOI:10.1261/rna.030049.111 · 4.62 Impact Factor
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: The identification of orthologs-genes pairs descended from a common ancestor through speciation, rather than duplication-has emerged as an essential component of many bioinformatics applications, ranging from the annotation of new genomes to experimental target prioritization. Yet, the development and application of orthology inference methods is hampered by the lack of consensus on source proteomes, file formats and benchmarks. The second 'Quest for Orthologs' meeting brought together stakeholders from various communities to address these challenges. We report on achievements and outcomes of this meeting, focusing on topics of particular relevance to the research community at large. The Quest for Orthologs consortium is an open community that welcomes contributions from all researchers interested in orthology research and applications.
    Bioinformatics 02/2012; 28(6):900-4. DOI:10.1093/bioinformatics/bts050 · 4.62 Impact Factor