Recent de novo origin of human protein-coding genes

Smurfit Institute of Genetics, University of Dublin, Trinity College, Ireland.
Genome Research (Impact Factor: 13.85). 10/2009; 19(10):1752-9. DOI: 10.1101/gr.095026.109
Source: PubMed

ABSTRACT The origin of new genes is extremely important to evolutionary innovation. Most new genes arise from existing genes through duplication or recombination. The origin of new genes from noncoding DNA is extremely rare, and very few eukaryotic examples are known. We present evidence for the de novo origin of at least three human protein-coding genes since the divergence with chimp. Each of these genes has no protein-coding homologs in any other genome, but is supported by evidence from expression and, importantly, proteomics data. The absence of these genes in chimp and macaque cannot be explained by sequencing gaps or annotation error. High-quality sequence data indicate that these loci are noncoding DNA in other primates. Furthermore, chimp, gorilla, gibbon, and macaque share the same disabling sequence difference, supporting the inference that the ancestral sequence was noncoding over the alternative possibility of parallel gene inactivation in multiple primate lineages. The genes are not well characterized, but interestingly, one of them was first identified as an up-regulated gene in chronic lymphocytic leukemia. This is the first evidence for entirely novel human-specific protein-coding genes originating from ancestrally noncoding sequences. We estimate that 0.075% of human genes may have originated through this mechanism leading to a total expectation of 18 such cases in a genome of 24,000 protein-coding genes.
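
The closing estimate in the abstract is a simple proportion, and it can be checked with a short sketch. The snippet below is illustrative only and is not taken from the paper: the variable names are hypothetical, and the only inputs are the two figures quoted in the abstract (0.075% and 24,000 genes).

```python
from fractions import Fraction

# Figures quoted in the abstract; the variable names are illustrative assumptions.
de_novo_percent = Fraction("0.075")  # estimated share of human genes, in percent
protein_coding_genes = 24_000        # genome size used in the abstract

# Expected count = fraction of genes x total number of genes.
# Fraction keeps the arithmetic exact, avoiding floating-point rounding.
expected_de_novo_genes = de_novo_percent / 100 * protein_coding_genes

print(expected_de_novo_genes)  # prints 18
```

Under these inputs the product is exactly 18, which matches the total expectation stated in the abstract.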

    • "Many "novel" protein-coding sequences are rapidly diverging copies of older protein-coding sequences, following either duplication within a species or duplication associated with horizontal transfer from a different species (Ohno 1970; Long et al. 2003). However, some protein-coding genes are novel in a more fundamental way, being derived from noncoding sequences (Levine et al. 2006; Begun et al. 2007; Chen et al. 2007; Cai et al. 2008; Zhou et al. 2008; Knowles and McLysaght 2009; Siepel 2009; Tay et al. 2009; Toll-Riera et al. 2009; Xiao et al. 2009; Li, Dong, et al. 2010; Li, Zhang, et al. 2010; Donoghue et al. 2011; Tautz and Domazet-Lošo 2011; Wilson and Masel 2011; Wu et al. 2011; Yang and Huang 2011; Ding et al. 2012; Murphy and McLysaght 2012; Xie et al. 2012; Long et al. 2013; Reinhardt et al. 2013; Suenaga et al. 2014; Zhao et al. 2014). Because de novo gene evolution is hard to detect, known cases may be the tip of the iceberg, and noncoding sequences may be a common source of orphan genes, that is, genes that lack detectable homology to known proteins outside a given lineage (Tautz and Domazet-Lošo 2011; Wu et al. 2011; Ruiz-Orera et al. 2014). This hypothesis is supported by the statistical tendency for young genes as a whole to show characteristics that are better explained by de novo origination than by gene duplication-divergence, including short length, fewer exons, and fewer domains (Neme and Tautz 2013)."
    ABSTRACT: Protein-coding sequences can arise either from duplication and divergence of existing sequences, or de novo from non-coding DNA. Unfortunately, recently evolved de novo genes can be hard to distinguish from false positives, making their study difficult. Here we study a more tractable version of the process of conversion of non-coding sequence into coding: the co-option of short segments of non-coding sequence into the C-termini of existing proteins via the loss of a stop codon. Because we study recent additions to potentially old genes, we are able to apply a variety of stringent quality filters to our annotations of what is a true protein coding gene, discarding the putative proteins of unknown function that are typical of recent fully de novo genes. We identify 54 examples of C-terminal extensions in Saccharomyces and 28 in Drosophila, all of them recent enough to still be polymorphic. We find one putative gene fusion that turns out, on close inspection, to be the product of replicated assembly errors, further highlighting the issue of false positives in the study of rare events. Four of the Saccharomyces C-terminal extensions (to ADH1, ARP8, TPM2 and PIS1) that survived our quality filters are predicted to lead to significant modification of a protein domain structure. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
    Genome Biology and Evolution 05/2015; DOI:10.1093/gbe/evv098 · 4.53 Impact Factor
    • "RNA-seq is not dependent on prior information of the genomic sequence of the target species, which has been widely applied for transcriptome-related studies in many Brassicaceae plant species (Paritosh et al., 2013; Wang et al., 2013b; Kim et al., 2014; Mudalkar et al., 2014). The de novo assembly of sequencing reads is an important step to obtain genome information, such as novel gene discovery, transcription factor (TF) discovery, Simple Sequence Repeat (SSR) mining, and gene expression profile analysis (Powell et al., 1996; Bouché et al., 2002; Heim et al., 2003; Jiao et al., 2003; Knowles and McLysaght, 2009; Zhang et al., 2012b). For example, it has been reported that 30 TF families containing approximately 1500 potential TFs were identified after the completion of the Arabidopsis thaliana genome sequencing project (Riechmann et al., 2000; Mitsuda and Ohme-Takagi, 2009)."
    ABSTRACT: Raphanus sativus is an important Brassicaceae plant and also an edible vegetable with great economic value. However, currently there is not enough transcriptome information of R. sativus tissues, which impedes further functional genomics research on R. sativus. In this study, RNA-seq technology was employed to characterize the transcriptome of leaf tissues. Approximately 70 million clean pair-end reads were obtained and used for de novo assembly by Trinity program, which generated 68,086 unigenes with an average length of 576 bp. All the unigenes were annotated against GO and KEGG databases. In the meanwhile, we merged leaf sequencing data with existing root sequencing data and obtained better de novo assembly of R. sativus using Oases program. Accordingly, potential simple sequence repeats (SSRs), transcription factors (TFs) and enzyme codes were identified in R. sativus. Additionally, we detected a total of 3563 significantly differentially expressed genes (DEGs, P = 0.05) and tissue-specific biological processes between leaf and root tissues. Furthermore, a TFs-based regulation network was constructed using Cytoscape software. Taken together, these results not only provide a comprehensive genomic resource of R. sativus but also shed light on functional genomic and proteomic research on R. sativus in the future.
    Frontiers in Plant Science 03/2015; DOI:10.3389/fpls.2015.00198 · 3.95 Impact Factor
    • "The most stringent criterion for involvement of de novo processes in explaining orphan genes requires that syntenic blocks spanning an orphan gene are present in outgroup organisms as noncoding sequences that are not transcribed (Cai et al., 2008; Knowles and McLysaght, 2009). To further support de novo gene birth of the four selected genes, we performed synteny analysis with phylogenetically closely related fungal species, including Gaeumannomyces graminis (a member of the Magnaporthaceae family) (Table S2)."
    ABSTRACT: Genomes contain a large number of unique genes which have not been found in other species. Although the origin of such "orphan" genes remains unclear, they are thought to be involved in species-specific adaptive processes. Here, we analyzed seven orphan genes (MoSPC1 to MoSPC7) prioritized based on in planta expressed sequence tag data in the rice blast fungus, Magnaporthe oryzae. Expression analysis using qRT-PCR confirmed the expression of four genes (MoSPC1, MoSPC2, MoSPC3 and MoSPC7) during plant infection. However, individual deletion mutants of these four genes did not differ from the wild-type strain for all phenotypes examined, including pathogenicity. The length, GC contents, codon adaptation index and expression during mycelial growth of the four genes suggest that these genes formed during the evolutionary history of M. oryzae. Synteny analyses using closely related fungal species corroborated the notion that these genes evolved de novo in the M. oryzae genome. In this report, we discuss our inability to detect phenotypic changes in the four deletion mutants. Based on these results, the four orphan genes may be products of de novo gene birth processes, and their adaptive potential is in the course of being tested for retention or extinction through natural selection.
    The plant pathology journal 12/2014; 30(4):367-74. DOI:10.5423/PPJ.OA.08.2014.0072 · 0.76 Impact Factor