High-Throughput Multiplex Sequencing to Discover Copy Number Variants in Drosophila

Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA.
Genetics (Impact Factor: 5.96). 07/2009; 182(4):935-41. DOI: 10.1534/genetics.109.103218
Source: PubMed


Copy number variation (CNV) contributes in phenotypically relevant ways to the genetic variability of many organisms. Cost-effective genome-wide methods for identifying copy number variation are necessary to elucidate the contribution that these structural variants make to the genomes of model organisms. We have developed a novel approach for the identification of copy number variation by next-generation sequencing. As a proof of concept, our method has been applied to map the deletions of three Drosophila deficiency strains. We demonstrate that low sequence coverage is sufficient for identifying and mapping large deletions at kilobase resolution, suggesting that data generated from high-throughput sequencing experiments are sufficient for simultaneously analyzing many strains. Genomic DNA from two Drosophila deficiency stocks was barcoded and sequenced in multiplex, and the breakpoints associated with each deletion were successfully identified. The approach we describe is immediately applicable to the systematic exploration of copy number variation in model organisms and humans.
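The general read-depth idea behind mapping large deletions from low-coverage data can be illustrated with a minimal sketch. This is not the authors' published pipeline. It assumes you already have 0-based read start coordinates for a test strain and a control strain on one chromosome (with barcoded multiplexing, reads would first be split by barcode into per-strain sets). The sketch counts reads in fixed windows, normalizes by library size, and reports runs of windows whose test/control ratio falls below a cutoff. The function names `window_counts` and `call_deletions`, the 0.75 cutoff, and the two-window minimum are hypothetical choices for illustration. Under this simple model, a deletion carried on one of two homologs should show roughly half the control depth, and the window size sets the mapping resolution.

```python
from typing import List, Optional, Sequence, Tuple


def window_counts(starts: Sequence[int], chrom_len: int, window: int) -> List[int]:
    """Count 0-based read start positions in consecutive fixed-size windows."""
    n_windows = (chrom_len + window - 1) // window
    counts = [0] * n_windows
    for pos in starts:
        if 0 <= pos < chrom_len:  # ignore coordinates outside the chromosome
            counts[pos // window] += 1
    return counts


def call_deletions(
    test: Sequence[int],
    control: Sequence[int],
    window: int,
    ratio_cutoff: float = 0.75,  # hypothetical threshold between ~0.5 (one copy lost) and 1.0
    min_windows: int = 2,        # hypothetical minimum run length worth reporting
) -> List[Tuple[int, int]]:
    """Return [start, end) spans where normalized test/control depth stays below ratio_cutoff."""
    if len(test) != len(control):
        raise ValueError("test and control must use the same windows")
    total_test, total_control = sum(test), sum(control)
    if total_test == 0 or total_control == 0:
        raise ValueError("both libraries need at least one read")

    segments: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    # Iterate one step past the last window so a run reaching the chromosome end is closed.
    for i in range(len(test) + 1):
        low = False
        if i < len(test) and control[i] > 0:  # windows without control reads are uninformative
            # Library-size-normalized depth ratio for this window.
            ratio = (test[i] / total_test) / (control[i] / total_control)
            low = ratio < ratio_cutoff
        if low and run_start is None:
            run_start = i
        elif not low and run_start is not None:
            if i - run_start >= min_windows:
                segments.append((run_start * window, i * window))
            run_start = None
    return segments


if __name__ == "__main__":
    # Synthetic data: one read start every 50 bp on a 20 kb chromosome.
    control_starts = list(range(0, 20_000, 50))
    # The test strain loses half its reads in 8-12 kb, mimicking a one-copy deletion.
    test_starts = [p for p in control_starts if not (8_000 <= p < 12_000 and p % 100 == 50)]
    ctrl = window_counts(control_starts, 20_000, 1_000)
    tst = window_counts(test_starts, 20_000, 1_000)
    print(call_deletions(tst, ctrl, 1_000))  # [(8000, 12000)]
```

Normalizing by total reads lets libraries sequenced to different depths in one multiplexed run be compared directly; a real analysis would also need to handle mappability and GC bias, which this sketch ignores.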


Available from: Rui Chen, Oct 20, 2014
    • "In the case of longer reads, the exact breakpoints of all variant classes may be detected when the reads map discontinuously to the reference genome (split read method). The NGS approach has been proven effective for the discovery and mapping of structural variants at nucleotide-resolution in plants, animals and humans (Daines et al. 2009; Yoon et al. 2009; Mills et al. 2011; Cao et al. 2011; Bickhart et al. 2012). The main drawbacks of NGS are the following: difficulty with mapping short reads to DNA repeats (Treangen and Salzberg 2011) and platform-specific biases, which result in lower read coverage of some parts of the genome (for example, GC-rich regions) (Dohm et al. 2008). "
    ABSTRACT: Copy number variants (CNVs) are genomic rearrangements resulting from gains or losses of DNA segments. Typically, the term refers to rearrangements of sequences larger than 1 kb. This type of polymorphism has recently been shown to be a key contributor to intra-species genetic variation, along with single-nucleotide polymorphisms and short insertion-deletion polymorphisms. Over the last decade, a growing number of studies have highlighted the importance of copy number variation (CNV) as a factor affecting human phenotype and individual CNVs have been linked to risks for severe diseases. In plants, the exploration of the extent and role of CNV is still just beginning. Initial genomic analyses indicate that CNVs are prevalent in plants and have greatly affected plant genome evolution. Many CNV events have been observed in outcrossing and autogamous species. CNVs are usually found on all chromosomes, with CNV hotspots interspersed with regions of very low genetic variation. Although CNV is mainly associated with intergenic regions, many CNVs encompass protein-coding genes. The collected data suggest that CNV mainly affects the members of large families of functionally redundant genes. Thus, the effects of individual CNV events on phenotype are usually modest. Nevertheless, there are many cases in which CNVs for specific genes have been linked to important traits such as flowering time, plant height and resistance to biotic and abiotic stress. Recent reports suggest that CNVs may form rapidly in response to stress.
    Theoretical and Applied Genetics 08/2013; 127(1). DOI:10.1007/s00122-013-2177-7 · 3.79 Impact Factor
    • "Owing to rapidly declining costs, second-generation sequencing has become an affordable means to perform surveys of sequence variation on a genome-wide scale. Several complete genome sequences have already been obtained for humans (and recently also for Neanderthals) (Bentley et al. 2008; Wheeler et al. 2008; Abdulla et al. 2009; Ahn et al. 2009; McKernan et al. 2009; Pushkarev et al. 2009; Green et al. 2010; Schuster et al. 2010) and for model organisms (Doniger et al. 2008; Ossowski et al. 2008; Daines et al. 2009; Hillier et al. 2009). In addition, de novo whole-genome sequencing has become feasible for nonmodel organism species, thus extending the power of genomic approaches to species important for ecological studies or for conservation biology. "
    ABSTRACT: Second-generation sequencing technologies allow surveys of sequence variation on an unprecedented scale. However, despite the rapid decrease in sequencing costs, collecting whole-genome sequence data on a population scale is still prohibitive for many laboratories. We have implemented an inexpensive, reduced representation protocol for preparing resequencing targets, and we have developed the analytical tools necessary for making population genetic inferences. This approach can be applied to any species for which a draft or complete reference genome sequence is available. The new tools we have developed include methods for aligning reads, calling genotypes, and incorporating sample-specific sequencing error rates in the estimate of evolutionary parameters. When applied to 19 individuals from a total of 18 human populations, our approach allowed sampling regions that are largely overlapping across individuals and that are representative of the entire genome. The resequencing data were used to test the serial founder model of human dispersal and to estimate the time of the Out of Africa migration. Our results also represent the first attempt to provide a time frame for the colonization of Australia based on large-scale resequencing data.
    Genome Research 05/2011; 21(7):1087-98. DOI:10.1101/gr.119792.110 · 14.63 Impact Factor
    • "Structural variation within the genome, including insertions, duplications, deletions, and inversions of up to multiple kilobase pairs, has recently been described in a variety of species, including humans [1-3], mice [4], rats [5], silkworms [6], Drosophila [7], and dogs [8]. These genomic variations were recently found to be widespread, encompassing 5% of the human genome [9], and are thought to be involved in (co)determining complex phenotypes [10,11]. "
    ABSTRACT: Variation within individual genomes ranges from single nucleotide polymorphisms (SNPs) to kilobase, and even megabase, sized structural variants (SVs), such as deletions, insertions, inversions, and more complex rearrangements. Although much is known about the extent of SVs in humans and mice, species in which they exert significant effects on phenotypes, very little is known about the extent of SVs in the 2.5-times smaller and less repetitive genome of the chicken. We identified hundreds of shared and divergent SVs in four commercial chicken lines relative to the reference chicken genome. The majority of SVs were found in intronic and intergenic regions, and we also found SVs in the coding regions. To identify the SVs, we combined high-throughput short read paired-end sequencing of genomic reduced representation libraries (RRLs) of pooled samples from 25 individuals and computational mapping of DNA sequences from a reference genome. We provide a first glimpse of the high abundance of small structural genomic variations in the chicken. Extrapolating our results, we estimate that there are thousands of rearrangements in the chicken genome, the majority of which are located in non-coding regions. We observed that structural variation contributes to genetic differentiation among current domesticated chicken breeds and the Red Jungle Fowl. We expect that, because of their high abundance, SVs might explain phenotypic differences and play a role in the evolution of the chicken genome. Finally, our study exemplifies an efficient and cost-effective approach for identifying structural variation in sequenced genomes.
    BMC Genomics 02/2011; 12(1):94. DOI:10.1186/1471-2164-12-94 · 3.99 Impact Factor