Recurrent duplication-driven transposition of DNA during hominoid evolution

Baylor College of Medicine, Houston, Texas, United States
Proceedings of the National Academy of Sciences (Impact Factor: 9.67). 12/2006; 103(47):17626-31. DOI: 10.1073/pnas.0605426103
Source: PubMed


The underlying mechanism by which the interspersed pattern of human segmental duplications has evolved is unknown. Based on a comparative analysis of primate genomes, we show that a particular segmental duplication (LCR16a) has been the source locus for the formation of the majority of intrachromosomal duplication blocks on human chromosome 16. We provide evidence that this particular segment has been active independently in each great ape and human lineage at different points during evolution. Euchromatic sequences that flank sites of LCR16a integration are frequently lineage-specific duplications. This process has mobilized duplication blocks (15-200 kb in size) to new genomic locations in each species. Breakpoint analysis of lineage-specific insertions suggests coordinated deletion of repeat-rich DNA at the target site, in some cases deleting genes in that species. Our data support a model of duplication in which the probability that a segment of DNA becomes duplicated is determined by its proximity to core duplicons, such as LCR16a.



Available from: Mario Ventura, May 29, 2015
    • "To investigate possible role by repetitive elements (REs) in mediating such dispersed duplication with a clue from previous studies [20,26-29], we performed repeat masking on the putative duplication source locus and the other 33 duplicons and observed a preponderance of flanking REs, especially of non-LTR class prominent of which were the S. japonicum RTE (retrotransposable element)-like retrotransposon (SjR2) and the Perere class of retrotransposons (SjR1) (Additional file 5). An almost full copy of SjR2 was found upstream of the coding region of the putative source locus in addition to other six albeit partial copies of SjR2. "
    ABSTRACT: Evolution of novel protein-coding genes is the bedrock of adaptive evolution. Recently, we identified six protein-coding genes with a similar signal sequence from Schistosoma japonicum egg stage mRNA using signal sequence trap (SST). To find the mechanism underlying the origination of these genes with similar core promoter regions and signal sequence, we adopted an integrated approach utilizing whole genome, transcriptome and proteome database BLAST queries, other bioinformatics tools, and molecular analyses. Our data, in combination with database analyses, showed evidence of expression of these genes both at the mRNA and protein levels exclusively in all developmental stages of S. japonicum. The signal sequence motif was identified in 27 distinct S. japonicum UniGene entries with multiple mRNA transcripts, and in 34 genome contigs distributed within 18 scaffolds with evidence of genome-wide dispersion. No homolog of these genes or similar domain was found in deposited data from any other organism. We observed a preponderance of flanking repetitive elements (REs), albeit partial copies, especially of the RTE-like and Perere class at either side of the duplication source locus. The role of REs as major mediators of DNA-level recombination leading to dispersive duplication is discussed with evidence from our analyses. We also identified a stepwise pathway towards functional selection in evolving genes by alternative splicing. Equally, the possible transcription models of some protein-coding representatives of the duplicons are presented with evidence of expression in vitro. Our findings contribute to the accumulating evidence of the role of REs in the generation of evolutionary novelties in organisms' genomes.
    Full-text · Article · Jun 2012 · BMC Genomics
    • "A comparison between two forms of genome assembly, that is, hierarchical sequencing of large insert clones and whole-genome shotgun sequence assembly (WGSA) of reads, revealed that the WGSA method yields a 20-Mb shorter sequence than the clone-based assembly (Marques-Bonet et al. 2009). Length discrepancy is caused by the failure of many whole-genome shotgun reads to map to a locus containing a highly duplicated and rapidly evolving gene family (Johnson et al. 2006). This problem will be further aggravated when significantly shorter NGS reads are used (Marques-Bonet et al. 2009). "
    ABSTRACT: Genome sequencing of closely related individuals has yielded valuable insights that link genome evolution to phenotypic variations. However, advancement in sequencing technology has also led to an escalation in the number of poor quality-drafted genomes assembled based on reference genomes that can have highly divergent or haplotypic regions. The self-fertilizing nature of Arabidopsis thaliana poses an advantage to sequencing projects because its genome is mostly homozygous. To determine the accuracy of an Arabidopsis drafted genome in less conserved regions, we performed a resequencing experiment on a ∼371-kb genomic interval in the Landsberg erecta (Ler-0) accession. We identified novel structural variations (SVs) between Ler-0 and the reference accession Col-0 using a long-range polymerase chain reaction approach to generate an Illumina data set that has positional information, that is, a data set with reads that map to a known location. Positional information is important for accurate genome assembly and the resolution of SVs particularly in highly duplicated or repetitive regions. Sixty-one regions with misassembly signatures were identified from the Ler-0 draft, suggesting the presence of novel SVs that are not represented in the draft sequence. Sixty of those were resolved by iterative mapping using our data set. Fifteen large indels (>100 bp) identified from this study were found to be located either within protein-coding regions or upstream regulatory regions, suggesting the formation of novel alleles or altered regulation of existing genes in Ler-0. We propose future genome-sequencing experiments to follow a clone-based approach that incorporates positional information to ultimately reveal haplotype-specific differences between accessions.
    Full-text · Article · May 2011 · Genome Biology and Evolution
    • "We found that Alu elements, especially young AluY, were enriched in the immediate adjacent regions of frequently duplicated sequences (subunits, duplication loci, and CNVs). Our results thus extend previous findings that have shown the presence of Alu elements at the endpoints of SDs at a higher frequency than expected by chance (24% vs 10%) [15,16] and more specifically a three-fold enrichment of Alu in the junctions of LCR16 [37]. In addition, we have previously shown that both simple and complex Alu-mediated duplications stimulated by crossovers at the ends of Alu elements may have contributed to the formation of unprocessed pseudogenes from the four LCR22 genes [14,26]. "
    ABSTRACT: Segmental duplications (SDs) on 22q11.2 (LCR22) serve as substrates for meiotic non-allelic homologous recombination (NAHR) events resulting in several clinically significant genomic disorders. To understand the duplication activity leading to the complicated SD structure of this region, we have applied the A-Bruijn graph algorithm to decompose the 22q11.2 SDs to 523 fundamental duplication sequences, termed subunits. Cross-species syntenic analysis of primate genomes demonstrates that many of these LCR22 subunits emerged very recently, especially those implicated in human genomic disorders. Some subunits have expanded more actively than others, and young Alu SINEs are associated much more frequently with duplicated sequences that have undergone active expansion, confirming their role in mediating recombination events. Many copy number variations (CNVs) exist on 22q11.2, some flanked by SDs. Interestingly, two chromosome breakpoints for 13 CNVs (mean length 65 kb) are located in paralogous subunits, providing direct evidence that SD subunits could contribute to CNV formation. Sequence analysis of PACs or BACs identified extra CNVs, specifically, 10 insertions and 18 deletions within 22q11.2; four were more than 10 kb in size and most contained young AluYs at their breakpoints. Our study indicates that AluYs are implicated in the past and current duplication events, and moreover suggests that DNA rearrangements in 22q11.2 genomic disorders perhaps do not occur randomly but involve both actively expanded duplication subunits and Alu elements.
    Full-text · Article · Jan 2011 · BMC Genomics