Genome-wide mapping and assembly of structural variant breakpoints in the mouse genome

Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, Virginia 22908, USA.
Genome Research (Impact Factor: 14.63). 03/2010; 20(5):623-35. DOI: 10.1101/gr.102970.109
Source: PubMed


Structural variation (SV) is a rich source of genetic diversity in mammals, but due to the challenges associated with mapping SV in complex genomes, basic questions regarding their genomic distribution and mechanistic origins remain unanswered. We have developed an algorithm (HYDRA) to localize SV breakpoints by paired-end mapping, and a general approach for the genome-wide assembly and interpretation of breakpoint sequences. We applied these methods to two inbred mouse strains: C57BL/6J and DBA/2J. We demonstrate that HYDRA accurately maps diverse classes of SV, including those involving repetitive elements such as transposons and segmental duplications; however, our analysis of the C57BL/6J reference strain shows that incomplete reference genome assemblies are a major source of noise. We report 7196 SVs between the two strains, more than two-thirds of which are due to transposon insertions. Of the remainder, 59% are deletions (relative to the reference), 26% are insertions of unlinked DNA, 9% are tandem duplications, and 6% are inversions. To investigate the origins of SV, we characterized 3316 breakpoint sequences at single-nucleotide resolution. We find that approximately 16% of non-transposon SVs have complex breakpoint patterns consistent with template switching during DNA replication or repair, and that this process appears to preferentially generate certain classes of complex variants. Moreover, we find that SVs are significantly enriched in regions of segmental duplication, but that this effect is largely independent of DNA sequence homology and thus cannot be explained by non-allelic homologous recombination (NAHR) alone. This result suggests that the genetic instability of such regions is often the cause rather than the consequence of duplicated genomic architecture.
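The paired-end mapping idea mentioned in the abstract can be illustrated with a minimal sketch. This is not HYDRA's implementation. It assumes a standard forward/reverse paired-end library, and it uses the conventional signatures for discordant pairs: a span that is too long suggests a deletion, one that is too short suggests an insertion, same-strand ends suggest an inversion, and everted ends suggest a tandem duplication. The names `ReadPair`, `classify_pair`, and `cluster_pairs`, and the thresholds `n_mads` and `max_gap`, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """One aligned read pair; field names are illustrative, not HYDRA's."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(pair, median_insert, mad, n_mads=10):
    """Label a pair by the SV class its mapping pattern conventionally suggests.

    median_insert and mad describe the library's insert-size distribution;
    n_mads is an arbitrary cutoff assumed for this sketch.
    """
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    # Order the two ends by coordinate so strands can be read left to right.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    span = right_pos - left_pos
    if left_strand == right_strand:
        return "inversion"
    if left_strand == "-":
        # Everted orientation (-/+) instead of the expected +/-.
        return "tandem_duplication"
    if span > median_insert + n_mads * mad:
        return "deletion"
    if span < median_insert - n_mads * mad:
        return "insertion"
    return "concordant"


def cluster_pairs(pairs, max_gap):
    """Greedily group pairs of one class whose two ends both map close together.

    Assumes each pair is stored with pos1 <= pos2. Each new pair is compared
    only with the last pair added to the current cluster.
    """
    clusters = []
    for pair in sorted(pairs, key=lambda p: (p.chrom1, p.pos1)):
        last = clusters[-1][-1] if clusters else None
        if (
            last is not None
            and last.chrom1 == pair.chrom1
            and last.chrom2 == pair.chrom2
            and pair.pos1 - last.pos1 <= max_gap
            and abs(pair.pos2 - last.pos2) <= max_gap
        ):
            clusters[-1].append(pair)
        else:
            clusters.append([pair])
    return clusters


if __name__ == "__main__":
    pairs = [
        ReadPair("chr1", 1000, "+", "chr1", 6000, "-"),
        ReadPair("chr1", 1050, "+", "chr1", 6040, "-"),
        ReadPair("chr1", 90000, "+", "chr1", 95000, "-"),
    ]
    deletions = [p for p in pairs if classify_pair(p, 400, 20) == "deletion"]
    for cluster in cluster_pairs(deletions, max_gap=500):
        print(len(cluster), "pair(s) supporting a candidate deletion")
```

A cluster supported by several independent pairs is a more credible breakpoint candidate than a single discordant pair. A real caller would also need to handle reads with multiple mappings, which matters for the repetitive elements the abstract highlights and which this sketch ignores.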



Available from: Joshua Chang Mell, Apr 17, 2014
    • "We attempted to confirm the breakpoints of the rescuing deletions by short-read sequencing. We identified the breakpoints of Df(3L)BSC27 using both Hydra and Delly, which utilize paired-end read analysis (Quinlan et al. 2010; Rausch et al. 2012). The breakpoints were identified to be within 200 bp of those reported previously (Table 3). "
    ABSTRACT: Hybrid sons between Drosophila melanogaster females and Drosophila simulans males die as 3rd instar larvae. Two genes, D. melanogaster Hybrid male rescue (Hmr) on the X chromosome, and D. simulans Lethal hybrid rescue (Lhr) on chromosome II, interact to cause this lethality. Loss-of-function mutations in either gene suppress lethality, but several pieces of evidence suggest that additional factors are required for hybrid lethality. Here we screen the D. melanogaster autosomal genome using the Bloomington Stock Center Deficiency kit to search for additional regions that can rescue hybrid male lethality. Our screen is designed to identify putative hybrid incompatibility (HI) genes similar to Hmr and Lhr which, when removed, are dominant suppressors of lethality. After screening 89% of the autosomal genome, we found no regions that rescue males to the adult stage. We did, though, identify several regions which rescue up to 13% of males to the pharate adult stage. This weak rescue suggests the presence of multiple minor-effect HI loci, but we were unable to map these loci to high resolution, presumably because weak rescue can be masked by genetic background effects. We attempted to test one candidate, the dosage compensation gene male specific lethal-3 (msl-3), using RNA interference with shmiR constructs targeted specifically against D. simulans msl-3 but failed to achieve knockdown, in part due to off-target effects. We conclude that the D. melanogaster autosomal genome likely does not contain additional major-effect HI loci. We also show that Hmr is insufficient to fully account for the lethality associated with the D. melanogaster X chromosome, suggesting that additional X-linked genes contribute to hybrid lethality.
    G3-Genes Genomes Genetics 10/2014; 4(12). DOI:10.1534/g3.114.014076 · 3.20 Impact Factor
    • "A general step in SV identification is to cluster discordant reads into clusters [3] [4] [10]. The determination of discordant read is based on insert size distribution and alignment orientation between paired reads. "
    ABSTRACT: High coverage whole genome DNA-sequencing enables identification of somatic structural variation (SSV) more evident in paired tumor and normal samples. Recent studies show that simultaneous analysis of paired samples provides a better resolution of SSV detection than subtracting shared SVs. However, available tools can neither identify all types of SSVs nor provide any rank information regarding their somatic features. In this paper, we have developed a Bayesian framework, by integrating read alignment information from both tumor and normal samples, called BSSV, to calculate the significance of each SSV. Tested by simulated data, the precision of BSSV is comparable to that of available tools and the false negative rate is significantly lowered. We have also applied this approach to The Cancer Genome Atlas breast cancer data for SSV detection. Many known breast cancer specific mutated genes like RAD51, BRIP1, ER, PGR and PTPRD have been successfully identified.
    • "Maps repetitive elements such as transposons and SD hydra-sv Quinlan et al., 2010 inGAP-sv Scheme that uses abnormally mapped read pairs. Possible to distinguish HOM and HET variants. "
    ABSTRACT: Structural variation is variation in structure of DNA regions affecting DNA sequence length and/or orientation. It generally includes deletions, insertions, copy-number gains, inversions, and transposable elements. Traditionally, the identification of structural variation in genomes has been challenging. However, with the recent advances in high-throughput DNA sequencing and paired-end mapping (PEM) methods, the ability to identify structural variation and their respective association to human diseases has improved considerably. In this review, we describe our current knowledge of structural variation in the mouse, one of the prime model systems for studying human diseases and mammalian biology. We further present the evolutionary implications of structural variation on transposable elements. We conclude with future directions on the study of structural variation in mouse genomes that will increase our understanding of molecular architecture and functional consequences of structural variation.
    Frontiers in Genetics 07/2014; 5:192. DOI:10.3389/fgene.2014.00192