Article

Recurrent duplication-driven transposition of DNA during hominoid evolution.

Department of Genome Sciences and the Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA.
Proceedings of the National Academy of Sciences (Impact Factor: 9.81). 12/2006; 103(47):17626-31. DOI: 10.1073/pnas.0605426103
Source: PubMed

ABSTRACT The underlying mechanism by which the interspersed pattern of human segmental duplications has evolved is unknown. Based on a comparative analysis of primate genomes, we show that a particular segmental duplication (LCR16a) has been the source locus for the formation of the majority of intrachromosomal duplication blocks on human chromosome 16. We provide evidence that this particular segment has been active independently in each great ape and human lineage at different points during evolution. Euchromatic sequences that flank sites of LCR16a integration are frequently lineage-specific duplications. This process has mobilized duplication blocks (15-200 kb in size) to new genomic locations in each species. Breakpoint analysis of lineage-specific insertions suggests coordinated deletion of repeat-rich DNA at the target site, in some cases deleting genes in that species. Our data support a model of duplication where the probability that a segment of DNA becomes duplicated is determined by its proximity to core duplicons, such as LCR16a.

Available from: Mario Ventura, May 29, 2015
  • Source
    ABSTRACT: Genome sequencing of closely related individuals has yielded valuable insights that link genome evolution to phenotypic variations. However, advancement in sequencing technology has also led to an escalation in the number of poor-quality draft genomes assembled based on reference genomes that can have highly divergent or haplotypic regions. The self-fertilizing nature of Arabidopsis thaliana poses an advantage to sequencing projects because its genome is mostly homozygous. To determine the accuracy of an Arabidopsis draft genome in less conserved regions, we performed a resequencing experiment on a ∼371-kb genomic interval in the Landsberg erecta (Ler-0) accession. We identified novel structural variations (SVs) between Ler-0 and the reference accession Col-0 using a long-range polymerase chain reaction approach to generate an Illumina data set that has positional information, that is, a data set with reads that map to a known location. Positional information is important for accurate genome assembly and the resolution of SVs, particularly in highly duplicated or repetitive regions. Sixty-one regions with misassembly signatures were identified from the Ler-0 draft, suggesting the presence of novel SVs that are not represented in the draft sequence. Sixty of those were resolved by iterative mapping using our data set. Fifteen large indels (>100 bp) identified from this study were found to be located either within protein-coding regions or upstream regulatory regions, suggesting the formation of novel alleles or altered regulation of existing genes in Ler-0. We propose that future genome-sequencing experiments follow a clone-based approach that incorporates positional information to ultimately reveal haplotype-specific differences between accessions.
    Genome Biology and Evolution 05/2011; 3:627-40. DOI:10.1093/gbe/evr038 · 4.53 Impact Factor
  • Source
    Evolutionary Biology Symposium 2009, October 2009, Perth, Australia; 10/2009
  • Source
    ABSTRACT: The mammalian sex chromosomes evolved from an ordinary pair of autosomes during evolution. Unlike the X chromosome, which is highly conserved, the Y chromosome is poorly conserved among mammalian lineages. Several special features set the Y chromosome apart from the rest of the genome: male-limited transmission, absence of recombination, abundance of Y-specific repetitive sequences, degeneration of Y-linked genes during evolution, acquisition of autosomal genes, and accumulation and functional cluster of "testis genes" for maleness and reproduction. Since the degeneration process is lineage-dependent, different lineages retain different subsets of genes from the ancestral proto-Y chromosome, resulting in a diverse and lineage-specific Y chromosome gene content. During bovine evolution, a lineage-specific 'autosome-to-Y' transposition event resulted in three bovid-specific Y chromosome gene families, PRAMEY, ZNF280BY and ZNF280AY. Together, the male-specific region (MSY) of the bovine Y chromosome (BTAY) contains ~1200 protein-coding genes that can be classified into 12 single-copy and 16 multiple-copy protein families. The copy number (CN) of these Y-linked gene families varies from 13 for PRAMEY to 236 for ZNF280BY, with significant differences between the taurine and indicine Y lineages. In addition, 367 non-coding RNA families (ncRNAs) were also identified on BTAY. Transcriptome analysis revealed that 95% of the BTAY genes/ncRNAs are expressed predominantly in testis and may be involved in spermatogenesis and male fertility. Though the functional role for the majority of the Y-linked genes needs to be determined, the preliminary data on PRAMEY clearly indicated a role in spermiogenesis. Furthermore, copy number variations (CNVs) of PRAMEY, ZNF280BY, TSPY and HSFY were found to be associated with testis size, sperm quality and fertility in dairy bulls. The authors discuss several challenges that influence male fertility selection associated with the bovine Y chromosome.
    Reproduction in Domestic Ruminants VIII, First edited by JL Juengel, A Miyamoto, C Price, LP Reynolds, MF Smith, R Webb, 01/2014: pages 239-255; Context Products Ltd., ISBN: 9781899043637