Genome-wide mapping and assembly of structural variant breakpoints in the mouse genome

Department of Biochemistry and Molecular Genetics, University of Virginia School of Medicine, Charlottesville, Virginia 22908, USA.
Genome Research (Impact Factor: 14.63). 03/2010; 20(5):623-35. DOI: 10.1101/gr.102970.109
Source: PubMed


Structural variation (SV) is a rich source of genetic diversity in mammals, but due to the challenges associated with mapping SV in complex genomes, basic questions regarding their genomic distribution and mechanistic origins remain unanswered. We have developed an algorithm (HYDRA) to localize SV breakpoints by paired-end mapping, and a general approach for the genome-wide assembly and interpretation of breakpoint sequences. We applied these methods to two inbred mouse strains: C57BL/6J and DBA/2J. We demonstrate that HYDRA accurately maps diverse classes of SV, including those involving repetitive elements such as transposons and segmental duplications; however, our analysis of the C57BL/6J reference strain shows that incomplete reference genome assemblies are a major source of noise. We report 7196 SVs between the two strains, more than two-thirds of which are due to transposon insertions. Of the remainder, 59% are deletions (relative to the reference), 26% are insertions of unlinked DNA, 9% are tandem duplications, and 6% are inversions. To investigate the origins of SV, we characterized 3316 breakpoint sequences at single-nucleotide resolution. We find that approximately 16% of non-transposon SVs have complex breakpoint patterns consistent with template switching during DNA replication or repair, and that this process appears to preferentially generate certain classes of complex variants. Moreover, we find that SVs are significantly enriched in regions of segmental duplication, but that this effect is largely independent of DNA sequence homology and thus cannot be explained by non-allelic homologous recombination (NAHR) alone. This result suggests that the genetic instability of such regions is often the cause rather than the consequence of duplicated genomic architecture.
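
The paired-end mapping idea mentioned in the abstract can be illustrated with a small sketch. This is not HYDRA's implementation. It is a hypothetical classifier that assumes a forward/reverse paired-end library, a known median insert size with a median absolute deviation, and a simple cutoff of a fixed number of deviations. The names `ReadPair` and `classify_pair`, and all default values, are assumptions made for illustration only. A pair is flagged as discordant when its mapped span or its strand orientation departs from what the library predicts, and the type of departure hints at the class of variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """One mapped read pair (hypothetical minimal record)."""
    chrom1: str
    start1: int    # leftmost mapped coordinate of mate 1
    strand1: str   # "+" or "-"
    chrom2: str
    start2: int    # leftmost mapped coordinate of mate 2
    strand2: str   # "+" or "-"
    read_len: int = 100


def classify_pair(pair, median_insert=500, mad=30, n_mads=10):
    """Label a pair as concordant or with the SV class its signature suggests.

    Assumes a forward/reverse library: in a concordant pair the leftmost
    mate maps to "+" and the rightmost mate maps to "-".
    """
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    if pair.strand1 == pair.strand2:
        # Both mates on the same strand suggests an inverted segment.
        return "inversion"

    # Order the mates by coordinate so orientation can be checked.
    (left_start, left_strand), (right_start, _) = sorted(
        [(pair.start1, pair.strand1), (pair.start2, pair.strand2)]
    )
    if left_strand == "-":
        # Mates point away from each other: tandem duplication signature.
        return "tandem_duplication"

    # Outer span of the pair on the reference.
    span = right_start + pair.read_len - left_start
    if span > median_insert + n_mads * mad:
        return "deletion"    # mates map farther apart than expected
    if span < median_insert - n_mads * mad:
        return "insertion"   # mates map closer together than expected
    return "concordant"


if __name__ == "__main__":
    example = ReadPair("chr1", 1000, "+", "chr1", 5000, "-")
    print(classify_pair(example))
```

In practice a single discordant pair is weak evidence. Tools of this kind cluster pairs that support the same breakpoint before reporting a call, and that step is omitted here.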



Available from: Joshua Chang Mell, Apr 17, 2014
  • Source
    • "For probes which align equally-well to multiple positions, a position was chosen at random. Markers whose probe sequence did not align to the reference genome were assigned a missing value for chromosome and a position of 0. Markers coincident with known SNPs from the Sanger Mouse Genomes Project were identified using bedtools intersect v2.22.1 (Quinlan and Hall 2010) and annotated with an rsID if available. "
    ABSTRACT: Genotyping microarrays are an important resource for genetic mapping, population genetics and monitoring of the genetic integrity of laboratory stocks. We have developed the third generation of the Mouse Universal Genotyping Array (MUGA) series, GigaMUGA, a 143,259-probe Illumina Infinium II array for the house mouse (Mus musculus). The bulk of the content of GigaMUGA is optimized for genetic mapping in the Collaborative Cross and Diversity Outbred populations and for substrain-level identification of laboratory mice. In addition to 141,090 SNP probes, GigaMUGA contains 2,006 probes for copy number concentrated in structurally polymorphic regions of the mouse genome. The performance of the array is characterized in a set of 500 high-quality reference samples spanning laboratory inbred strains, recombinant inbred lines, outbred stocks, and wild-caught mice. GigaMUGA is highly informative across a wide range of genetically-diverse samples, from laboratory substrains to other Mus species. In addition to describing the content and performance of the array, we provide detailed probe-level annotation and recommendations for quality control.
    Full-text · Article · Dec 2015 · G3-Genes Genomes Genetics
  • Source
    • "We attempted to confirm the breakpoints of the rescuing deletions by short-read sequencing. We identified the breakpoints of Df(3L)BSC27 using both Hydra and Delly, which utilize paired-end read analysis (Quinlan et al. 2010; Rausch et al. 2012). The breakpoints were identified to be within 200 bp of those reported previously (Table 3). "
    ABSTRACT: Hybrid sons between Drosophila melanogaster females and Drosophila simulans males die as 3rd instar larvae. Two genes, D. melanogaster Hybrid male rescue (Hmr) on the X chromosome, and D. simulans Lethal hybrid rescue (Lhr) on chromosome II, interact to cause this lethality. Loss-of-function mutations in either gene suppress lethality, but several pieces of evidence suggest that additional factors are required for hybrid lethality. Here we screen the D. melanogaster autosomal genome using the Bloomington Stock Center Deficiency kit to search for additional regions that can rescue hybrid male lethality. Our screen is designed to identify putative hybrid incompatibility (HI) genes similar to Hmr and Lhr which, when removed, are dominant suppressors of lethality. After screening 89% of the autosomal genome, we found no regions that rescue males to the adult stage. We did, though, identify several regions which rescue up to 13% of males to the pharate adult stage. This weak rescue suggests the presence of multiple minor-effect HI loci, but we were unable to map these loci to high resolution, presumably because weak rescue can be masked by genetic background effects. We attempted to test one candidate, the dosage compensation gene male specific lethal-3 (msl-3), using RNA interference with shmiR constructs targeted specifically against D. simulans msl-3 but failed to achieve knockdown, in part due to off-target effects. We conclude that the D. melanogaster autosomal genome likely does not contain additional major-effect HI loci. We also show that Hmr is insufficient to fully account for the lethality associated with the D. melanogaster X chromosome, suggesting that additional X-linked genes contribute to hybrid lethality.
    Full-text · Article · Oct 2014 · G3-Genes Genomes Genetics
  • Source
    • "A general step in SV identification is to cluster discordant reads into clusters [3] [4] [10]. The determination of discordant read is based on insert size distribution and alignment orientation between paired reads. "
    ABSTRACT: High coverage whole genome DNA-sequencing enables identification of somatic structural variation (SSV) more evident in paired tumor and normal samples. Recent studies show that simultaneous analysis of paired samples provides a better resolution of SSV detection than subtracting shared SVs. However, available tools can neither identify all types of SSVs nor provide any rank information regarding their somatic features. In this paper, we have developed a Bayesian framework, by integrating read alignment information from both tumor and normal samples, called BSSV, to calculate the significance of each SSV. Tested by simulated data, the precision of BSSV is comparable to that of available tools and the false negative rate is significantly lowered. We have also applied this approach to The Cancer Genome Atlas breast cancer data for SSV detection. Many known breast cancer specific mutated genes like RAD51, BRIP1, ER, PGR and PTPRD have been successfully identified.
    Full-text · Article · Aug 2014