Identification and Properties of 1,119 Candidate LincRNA Loci in the Drosophila melanogaster Genome

MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, United Kingdom.
Genome Biology and Evolution (Impact Factor: 4.23). 03/2012; 4(4):427-42. DOI: 10.1093/gbe/evs020
Source: PubMed


The functional repertoire of long intergenic noncoding RNA (lincRNA) molecules has begun to be elucidated in mammals. Determining the biological relevance and potential gene regulatory mechanisms of these enigmatic molecules would be expedited in a more tractable model organism, such as Drosophila melanogaster. To this end, we defined a set of 1,119 putative lincRNA genes in D. melanogaster using modENCODE whole transcriptome (RNA-seq) data. A large majority (1.1 of 1.3 Mb; 85%) of these bases were not previously reported by modENCODE as being transcribed. Significant selective constraint on the sequences of these loci predicts that virtually all have sustained functionality across the Drosophila clade. We observe biases in lincRNA genomic locations and expression profiles that are consistent with some of these lincRNAs being involved in the regulation of neighboring protein-coding genes with developmental functions. We identify lincRNAs that may be important in the developing nervous system and in male-specific organs, such as the testes. LincRNA loci were also identified whose positions, relative to nearby protein-coding loci, are equivalent between D. melanogaster and mouse. This study predicts that the genomes of not only vertebrates, such as mammals, but also an invertebrate (fruit fly) harbor large numbers of lincRNA loci. Our findings now permit exploitation of Drosophila genetics for the investigation of lincRNA mechanisms, including lincRNAs with potential functional analogues in mammals.



Available from: Andrew Bassett,
  • Source
    • "Considering their genomic locations, lncRNAs can be mainly classified as i) intergenic lncRNAs (lincRNAs) (Guttman et al., 2009), ii) intronic lncRNAs (incRNAs) (Braconi et al., 2011), and iii) natural antisense transcripts (NATs, as cis-NATs and trans-NATs) with their sequences complementary (or partially complementary) to other transcripts at the same (or different) genomic locus (Faghihi and Wahlestedt, 2009). In the past years, genome-wide explorations, e.g., tiling array, chromatin signature, and RNA-sequencing approach, have detected the expression of lncRNAs in many organisms, such as Homo sapiens, Drosophila melanogaster, Mus musculus, and Danio rerio (Guttman et al., 2010; Cabili et al., 2011; Pauli et al., 2012; Young et al., 2012). To date, a large body of evidence has demonstrated that lncRNAs play a critical role in transcriptional interference, cell differentiation, epigenetic modification, genomic imprinting, and other important biological processes (Dinger et al., 2008; Yu et al., 2008; Gupta et al., 2010; Gibb et al., 2011; Guttman and Rinn, 2012)."
    ABSTRACT: Accumulating published reports have confirmed the critical biological role (e.g., cell differentiation, gene regulation, stress response) for plant long non-coding RNAs (lncRNAs). However, a literature-derived database with the aim of lncRNA curation, data deposit and further distribution remains still absent for this particular lncRNA clade. PLNlncRbase has been designed as an easy-to-use resource to provide detailed information for experimentally identified plant lncRNAs. In the current version, PLNlncRbase has manually collected data from nearly 200 published literature, covering a total of 1187 plant lncRNAs in 43 plant species. The user can retrieve plant lncRNA entries from a well-organized interface through a keyword search by using the name of plant species or a lncRNA identifier. Each entry upon a query will be returned with detailed information for a specific plant lncRNA, including the species name, a lncRNA identifier, a brief description of the potential biological role, the lncRNA sequence, the lncRNA classification, an expression pattern of the lncRNA, the tissue/developmental stage/condition for lncRNA expression, the detection method for lncRNA expression, a reference literature, and the potential target gene(s) of the lncRNA extracted from the original reference. This database will be regularly updated to greatly facilitate future investigations of plant lncRNAs pertaining to their biological significance. The PLNlncRbase database is now freely available. Copyright © 2015. Published by Elsevier B.V.
    Gene 07/2015; 573(2). DOI:10.1016/j.gene.2015.07.069 · 2.14 Impact Factor
  • Source
    • "From R5.24 to R6.03, 2313 new candidate non-coding genes were annotated (including those flagged as antisense, see below). We assessed the proposed lncRNAs described in the published literature (Tupy et al. 2005; Inagaki et al. 2005; Hiller et al. 2009; Young et al. 2012) and annotated many, but not all, of the lncRNAs proposed. Unless we had independent evidence that the region is transcribed (for example, RNA-Seq coverage data), we did not annotate the predicted lncRNA gene."
    ABSTRACT: We report the current status of the FlyBase annotated gene set for Drosophila melanogaster and highlight improvements based on high-throughput data. The FlyBase annotated gene set consists entirely of manually annotated gene models, with the exception of some classes of small non-coding RNAs. All gene models have been reviewed using evidence from high-throughput datasets, primarily from the modENCODE project. These datasets include RNA-Seq coverage data, RNA-Seq junction data, transcription start site profiles, and translation stop-codon read-through predictions. New annotation guidelines were developed to take into account the use of the high-throughput data. We describe how this flood of new data was incorporated into thousands of new and revised annotations. FlyBase has adopted a philosophy of excluding low confidence and low frequency data from gene model annotations; we also do not attempt to represent all possible permutations for complex and modularly organized genes. This has allowed us to produce a high-confidence, manageable gene annotation dataset that is available at FlyBase. Interesting aspects of new annotations include new genes (coding, non-coding, and antisense), many genes with alternative transcripts with very long 3' UTRs (up to 15-18kb), and a stunning mismatch in the number of male-specific genes (approximately 13 percent of all annotated gene models) vs. female-specific genes (fewer than 1 percent). The number of identified pseudogenes and mutations in the sequenced strain also increased significantly. We discuss remaining challenges, for instance, identification of functional small polypeptides and detection of alternative translation starts. Copyright © 2015 Author et al.
    G3-Genes Genomes Genetics 06/2015; 5(8). DOI:10.1534/g3.115.018929 · 3.20 Impact Factor
  • Source
    • "We defined a comprehensive yet conservative set of 2,935 single and multiexonic noncoding RNA transcripts, which includes lincRNAs, intronic lncRNAs, antisense overlapping lncRNAs, and precursors for sRNAs. This conservative estimate of A. queenslandica lncRNAs—the first lncRNAs catalog in an early-branching metazoan—shares many of the characteristics of their bilaterian counterparts (Guttman et al. 2009, 2010, 2011; Cabili et al. 2011; Nam and Bartel 2012; Pauli et al. 2012; Young et al. 2012; Brown et al. 2014; Zhou et al. 2014). Specifically, they are relatively short in length, have a low number of exons, display temporally restricted expression profiles throughout development, and have low sequence conservation in comparison to protein-coding genes."
    ABSTRACT: Long non-coding RNAs (lncRNAs) are important developmental regulators in bilaterian animals. A correlation has been claimed between the lncRNA repertoire expansion and morphological complexity in vertebrate evolution. However, this claim has not been tested by examining morphologically simple animals. Here, we undertake a systematic investigation of lncRNAs in the demosponge Amphimedon queenslandica, a morphologically-simple, early-branching metazoan. We combine RNA-Seq data across multiple developmental stages of Amphimedon with a filtering pipeline to conservatively predict 2,935 lncRNAs. These include intronic overlapping lncRNAs, exonic antisense overlapping lncRNAs, long intergenic ncRNAs and precursors for small RNAs. Sponge lncRNAs are remarkably similar to their bilaterian counterparts in being relatively short with few exons and having low primary sequence conservation relative to protein-coding genes. As in bilaterians, a majority of sponge lncRNAs exhibit typical hallmarks of regulatory molecules, including high temporal specificity and dynamic developmental expression. Specific lncRNA expression profiles correlate tightly with conserved protein-coding genes likely involved in a range of developmental and physiological processes, such as the Wnt signaling pathway. Although the majority of Amphimedon lncRNAs appear to be taxonomically-restricted with no identifiable orthologues, we find a few cases of conservation between demosponges in lncRNAs that are antisense to coding sequences. Based on the high similarity in the structure, organisation and dynamic expression of sponge lncRNAs to their bilaterian counterparts, we propose that these non-coding RNAs are an ancient feature of the metazoan genome. These results are consistent with lncRNAs regulating the development of animals, regardless of their level of morphological complexity.
    Molecular Biology and Evolution 05/2015; DOI:10.1093/molbev/msv117 · 9.11 Impact Factor