Targeted enrichment beyond the consensus coding DNA sequence exome reveals exons with higher variant densities

Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA.
Genome Biology (Impact Factor: 10.47). 07/2011; 12(7):R68. DOI: 10.1186/gb-2011-12-7-r68
Source: PubMed

ABSTRACT Enrichment of loci by DNA hybridization-capture, followed by high-throughput sequencing, is an important tool in modern genetics. Currently, the most common targets for enrichment are the protein-coding exons represented by the consensus coding DNA sequence (CCDS). The CCDS, however, excludes many actual or computationally predicted coding exons present in other databases, such as RefSeq and Vega, as well as non-coding functional elements such as untranslated and regulatory regions. The number of variants per base pair (variant density) in regions outside the CCDS, and our ability to interrogate those regions, are consequently less well understood.
We examine capture sequence data from outside of the CCDS regions and find that extremes of GC content that are present in different subregions of the genome can reduce the local capture sequence coverage to less than 50% relative to the CCDS. This effect is due to biases inherent in both the Illumina and SOLiD sequencing platforms that are exacerbated by the capture process. Interestingly, for two subregion types, microRNA and predicted exons, the capture process yields higher-than-expected coverage when compared to whole-genome sequencing. Lastly, we examine the variation present in non-CCDS regions and find that predicted exons, as well as exonic regions specific to RefSeq and Vega, show much higher variant densities than the CCDS.
We show that regions outside of the CCDS perform less efficiently in capture sequence experiments. Further, we show that the variant density in computationally predicted exons is more than 2.5 times higher than that observed in the CCDS.
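The two quantities the abstract relies on, variant density (variants per base pair) and GC content, can be illustrated with a minimal Python sketch. The function names and the counts in the usage lines are hypothetical and chosen only for illustration; they are not taken from the paper's data or pipeline.

```python
def variant_density(n_variants: int, n_bases: int) -> float:
    """Variants per base pair across a set of target regions."""
    if n_bases <= 0:
        raise ValueError("n_bases must be positive")
    return n_variants / n_bases


def gc_fraction(seq: str) -> float:
    """Fraction of G or C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    return gc / (gc + at)


def fold_change(density: float, reference_density: float) -> float:
    """How many times higher a density is than a reference density."""
    if reference_density <= 0:
        raise ValueError("reference_density must be positive")
    return density / reference_density


# Hypothetical counts, for illustration only (not values from the paper).
ccds_density = variant_density(n_variants=150, n_bases=500_000)
predicted_density = variant_density(n_variants=90, n_bases=100_000)
print(fold_change(predicted_density, ccds_density))
print(gc_fraction("GGCCAATT"))
```

Normalizing by the number of targeted bases is what makes region classes of very different total size comparable, which is why the comparison is stated as a density rather than a raw variant count.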



Available from: Irene Newsham, Aug 27, 2015
  • Source
    • "A peripheral blood sample was submitted to the Baylor College of Medicine clinical exome sequencing service. This analysis was performed based on standard procedures within that laboratory, based on published methods [Bainbridge et al., 2011] and information at the Baylor Human Genome Sequencing Center: https:// Paired-End_Capture_Library_Preparation.pdf and analyzed by their clinical testing pipeline: "
    ABSTRACT: The TARP syndrome (Talipes equinovarus, Atrial septal defect, Robin sequence, and Persistent left superior vena cava) is an X-linked disorder that was determined to be caused by mutations in RBM10 in two families, and confirmed in a subsequent case report. The two original families were quite similar in phenotype, with uniform early lethality, although a confirmatory case report showed survival into childhood. Here we report on five affected individuals from three newly recognized families, including patients with atypical manifestations. None of the five patients had talipes and others also lacked cardinal TARP features of Robin sequence and atrial septal defect. All three families demonstrated de novo mutations, and one of the families had two recurrences, with demonstrable maternal mosaicism. © 2013 Wiley Periodicals, Inc.
    American Journal of Medical Genetics Part A 01/2014; 164(1). DOI:10.1002/ajmg.a.36212 · 2.05 Impact Factor
  • Source
    • "These results are close to the reports that the expected Ti/Tv ratio is around 3.0 for variants inside exons and about 2.0 elsewhere (Bainbridge et al., 2011). The median quality score of variants inside the target regions is 875.4, "
    ABSTRACT: Recent advances in next-generation sequencing technologies have transformed the genetic study of human diseases; this is an era of unprecedented productivity. Exome sequencing, the targeted sequencing of the protein-coding portion of the human genome, has been shown to be a powerful and cost-effective method for detection of disease variants underlying Mendelian disorders. Increasing effort has been devoted to identifying rare variants associated with complex traits in sequencing studies. Here we provide an overview of the application fields for exome sequencing in human diseases. We describe a general framework of computation and bioinformatics for handling sequencing data. We then demonstrate data quality and agreement between exome sequencing and exome microarray (chip) genotypes using data collected on the same set of subjects in a genetic study of panic disorder. Our results show that, in sequencing data, the data quality was generally higher for variants within the exonic target regions than for those outside the target regions, due to the target enrichment. We also compared genotype concordance for variant calls obtained by exome sequencing vs. exome genotyping microarrays. The overall consistency rate was >99.83% and the heterozygous consistency rate was >97.55%. The two platforms agree closely on low-frequency variants in the exonic regions, while exome sequencing provides much more information on variants not included on exome genotyping microarrays. The results demonstrate that exome sequencing data are of high quality and can be used to investigate the role of rare coding variants in human diseases.
    Frontiers in Genetics 08/2013; 4:160. DOI:10.3389/fgene.2013.00160
  • Source
    • "The study was approved by the Institutional Review Board at Baylor College of Medicine and was conducted in accordance with the Helsinki declaration. Five mg of DNA was sheared and hybridized to a custom probe set (VCR) as previously described [10]. The DNA was eluted and amplified for 7 PCR cycles prior to sequencing [11] "
    ABSTRACT: Corticobasal degeneration (CBD) is a neurodegenerative, sporadic disorder of unknown cause. Few familial cases have been described. We aim to characterize the clinical, imaging, pathological and genetic features of two familial cases of CBD. We describe two first cousins with CBD associated with atypical MRI findings. We performed exome sequencing in both subjects and in an unaffected first cousin of similar age. The cases include a 79-year-old woman and a 72-year-old man of Native American and British origin. The onset of the neurological manifestations was 74 and 68 years respectively. Both patients presented with a combination of asymmetric parkinsonism, apraxia, myoclonic tremor, cortical sensory syndrome, and gait disturbance. The female subject developed left side fixed dystonia. The manifestations were unresponsive to high doses of levodopa in both cases. Extensive bilateral T1-W hyperintensities and T2-W hypointensities in basal ganglia and thalamus were observed in the female patient; whereas these findings were more subtle in the male subject. Postmortem examination of both patients was consistent with corticobasal degeneration; the female patient had additional findings consistent with mild Alzheimer's disease. No Lewy bodies were found in either case. Exome sequencing showed mutations leading to possible structural changes in MRS2 and ZHX2 genes, which appear to have the same upstream regulator miR-4277. Corticobasal degeneration can have a familial presentation; the role of MRS2 and ZHX2 gene products in CBD should be further investigated.
    Parkinsonism & Related Disorders 07/2013; 19(11). DOI:10.1016/j.parkreldis.2013.06.016 · 4.13 Impact Factor