
A computational and experimental approach to validating annotations and gene predictions in the Drosophila melanogaster genome.

Howard Hughes Medical Institute and Department of Molecular and Cell Biology, University of California, Life Sciences Addition, Berkeley, CA 94720-3200, USA.
Proceedings of the National Academy of Sciences, 03/2005; 102(5):1566-71. DOI: 10.1073/pnas.0409421102

ABSTRACT: Five years after the completion of the sequence of the Drosophila melanogaster genome, the number of protein-coding genes it contains remains a matter of debate; the number of computational gene predictions greatly exceeds the number of validated gene annotations. We have assembled a collection of >10,000 gene predictions that do not overlap existing gene annotations and have developed a validation process that lets us efficiently prioritize predictions from various sources and experimentally validate them by sequencing RT-PCR products to confirm gene structures. Our data provide experimental evidence for 122 protein-coding genes. Our analyses suggest that the entire collection of predictions contains only approximately 700 additional protein-coding genes. Although we cannot rule out the discovery of genes with unusual features that make them refractory to existing methods, our results suggest that the D. melanogaster genome contains approximately 14,000 protein-coding genes.
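The first step described above, collecting gene predictions that do not overlap existing gene annotations, amounts to an interval-overlap filter. The sketch below illustrates that idea only; it is not the authors' pipeline. The `Feature` type, feature names, chromosome labels, and coordinates are hypothetical, intervals are assumed to be 0-based and half-open on the same genome assembly, and strand is ignored.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    """A hypothetical genomic interval: 0-based, half-open [start, end)."""
    name: str
    chrom: str
    start: int
    end: int


def overlaps(a: Feature, b: Feature) -> bool:
    # Half-open intervals on the same chromosome overlap when each one
    # starts before the other ends; strand is ignored (an assumption).
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def novel_predictions(predictions: List[Feature],
                      annotations: List[Feature]) -> List[Feature]:
    """Keep only predictions that overlap no existing annotation.

    Naive O(n * m) scan; a genome-scale version would sort intervals
    per chromosome or use an interval tree instead.
    """
    return [p for p in predictions
            if not any(overlaps(p, a) for a in annotations)]


# Hypothetical example data.
annotations = [Feature("annotA", "2L", 100, 500),
               Feature("annotB", "2L", 900, 1200)]
predictions = [Feature("pred1", "2L", 450, 700),   # overlaps annotA
               Feature("pred2", "2L", 550, 850),   # falls between annotations
               Feature("pred3", "3R", 100, 500)]   # different chromosome

novel = novel_predictions(predictions, annotations)
print([p.name for p in novel])  # ['pred2', 'pred3']
```

Only the predictions that survive a filter of this kind would then go on to prioritization and RT-PCR validation.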

