Article

Targeted enrichment beyond the consensus coding DNA sequence exome reveals exons with higher variant densities

Human Genome Sequencing Center, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA.
Genome biology (Impact Factor: 10.81). 07/2011; 12(7):R68. DOI: 10.1186/gb-2011-12-7-r68
Source: PubMed

ABSTRACT

Enrichment of loci by DNA hybridization-capture, followed by high-throughput sequencing, is an important tool in modern genetics. Currently, the most common targets for enrichment are the protein coding exons represented by the consensus coding DNA sequence (CCDS). The CCDS, however, excludes many actual or computationally predicted coding exons present in other databases, such as RefSeq and Vega, and non-coding functional elements such as untranslated and regulatory regions. The number of variants per base pair (variant density) in regions outside of the CCDS, and our ability to interrogate those regions, are consequently less well understood.
We examine capture sequence data from outside of the CCDS regions and find that extremes of GC content that are present in different subregions of the genome can reduce the local capture sequence coverage to less than 50% relative to the CCDS. This effect is due to biases inherent in both the Illumina and SOLiD sequencing platforms that are exacerbated by the capture process. Interestingly, for two subregion types, microRNA and predicted exons, the capture process yields higher than expected coverage when compared to whole genome sequencing. Lastly, we examine the variation present in non-CCDS regions and find that predicted exons, as well as exonic regions specific to RefSeq and Vega, show much higher variant densities than the CCDS.
We show that regions outside of the CCDS perform less efficiently in capture sequence experiments. Further, we show that the variant density in computationally predicted exons is more than 2.5 times higher than that observed in the CCDS.
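The abstract defines variant density as the number of variants per base pair and reports it as a fold change relative to the CCDS. A minimal sketch of that arithmetic follows; the function names and all counts are hypothetical placeholders chosen for illustration and are not values from the paper.

```python
def variant_density(variant_count: int, target_bases: int) -> float:
    """Return variants per base pair for a target region."""
    if target_bases <= 0:
        raise ValueError("target_bases must be positive")
    if variant_count < 0:
        raise ValueError("variant_count must be non-negative")
    return variant_count / target_bases


def fold_over_reference(region_density: float, reference_density: float) -> float:
    """Return how many times higher a region's density is than a reference's."""
    if reference_density <= 0:
        raise ValueError("reference_density must be positive")
    return region_density / reference_density


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not data from the study).
    ccds = variant_density(variant_count=1_000, target_bases=2_000_000)
    predicted = variant_density(variant_count=300, target_bases=200_000)
    print(f"CCDS density:      {ccds:.6f} variants/bp")
    print(f"Predicted density: {predicted:.6f} variants/bp")
    print(f"Fold over CCDS:    {fold_over_reference(predicted, ccds):.2f}")
```

Normalizing by the number of targeted bases is what makes subregions of very different sizes comparable; raw variant counts alone would favor the largest target set.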


Available from: Irene Newsham
  • Source
    • "Results were analyzed on Nexus (BioDiscovery). Sequencing Library preparation, whole (Bainbridge et al., 2011) and targeted exome capture, and regular and ultra-deep sequencing on HiSeq 2000 platform are detailed in Supplemental Experimental Procedures. In brief, 152 samples were whole-exome sequenced and their mutations validated with a custom design targeted exome capture."
    ABSTRACT: The ampulla of Vater is a complex cellular environment from which adenocarcinomas arise to form a group of histopathologically heterogenous tumors. To evaluate the molecular features of these tumors, 98 ampullary adenocarcinomas were evaluated and compared to 44 distal bile duct and 18 duodenal adenocarcinomas. Genomic analyses revealed mutations in the WNT signaling pathway among half of the patients and in all three adenocarcinomas irrespective of their origin and histological morphology. These tumors were characterized by a high frequency of inactivating mutations of ELF3, a high rate of microsatellite instability, and common focal deletions and amplifications, suggesting common attributes in the molecular pathogenesis are at play in these tumors. The high frequency of WNT pathway activating mutation, coupled with small-molecule inhibitors of β-catenin in clinical trials, suggests future treatment decisions for these patients may be guided by genomic analysis.
    Full-text · Article · Jan 2016 · Cell Reports
  • Source
    • "A peripheral blood sample was submitted to the Baylor College of Medicine clinical exome sequencing service. This analysis was performed based on standard procedures within that laboratory, based on published methods [Bainbridge et al., 2011] and information at the Baylor Human Genome Sequencing Center: https://hgsc.bcm.edu/sites/default/files/documents/Illumina_Barcoded_Paired-End_Capture_Library_Preparation.pdf and analyzed by their clinical testing pipeline: https://github.com/dsexton2/Mercu- "
    ABSTRACT: The TARP syndrome (Talipes equinovarus, Atrial septal defect, Robin sequence, and Persistent left superior vena cava) is an X-linked disorder that was determined to be caused by mutations in RBM10 in two families, and confirmed in a subsequent case report. The two original families were quite similar in phenotype, with uniform early lethality, although the confirmatory case report showed survival into childhood. Here we report on five affected individuals from three newly recognized families, including patients with atypical manifestations. None of the five patients had talipes, and some also lacked the cardinal TARP features of Robin sequence and atrial septal defect. All three families demonstrated de novo mutations, and one of the families had two recurrences, with demonstrable maternal mosaicism. © 2013 Wiley Periodicals, Inc.
    Full-text · Article · Jan 2014 · American Journal of Medical Genetics Part A
  • Source
    • "Finally, we assessed the quality and sensitivity of SNV detection in our FFPE libraries compared to matched fresh-frozen pairs, and accounting for difference between TruSeq and ScriptSeq protocols. Transition:transversion (Ti/Tv) ratios of the RiboZeroGold ScriptSeq FFPE libraries were within the range reported from DNA sequencing studies [14,15], and highly similar to their matched RiboZeroGold ScriptSeq counterparts for known SNVs, 2.21 and 2.15 respectively. However, for novel SNVs, the Ti/Tv ratio was slightly higher in FFPE than fresh-frozen material, 2.23 and 1.81 respectively, likely a result of formalin-fixation. "
    ABSTRACT: Advantages of RNA-Seq over array based platforms are quantitative gene expression and discovery of expressed single nucleotide variants (eSNVs) and fusion transcripts from a single platform, but the sensitivity for each of these characteristics is unknown. We measured gene expression in a set of manually degraded RNAs, nine pairs of matched fresh-frozen, and FFPE RNA isolated from breast tumor with the hybridization-based NanoString nCounter (226 gene panel) and with whole transcriptome RNA-Seq using RiboZeroGold ScriptSeq V2 library preparation kits. We performed correlation analyses of gene expression between samples and across platforms. We then specifically assessed whole transcriptome expression of lincRNA and discovery of eSNVs and fusion transcripts in the FFPE RNA-Seq data. For gene expression in the manually degraded samples, we observed Pearson correlations of >0.94 and >0.80 with NanoString and ScriptSeq protocols, respectively. Gene expression data for matched fresh-frozen and FFPE samples yielded mean Pearson correlations of 0.874 and 0.783 for NanoString (226 genes) and ScriptSeq whole transcriptome protocols respectively, p < 2 × 10^-16. Specifically for lincRNAs, we observed superb Pearson correlation (0.988) between matched fresh-frozen and FFPE pairs. FFPE samples across NanoString and RNA-Seq platforms gave a mean Pearson correlation of 0.838. In FFPE libraries, we detected 53.4% of high confidence SNVs and 24% of high confidence fusion transcripts. Sensitivity of fusion transcript detection was not overcome by an increase in depth of sequencing up to 3-fold (increase from ~56 to ~159 million reads). Both NanoString and ScriptSeq RNA-Seq technologies yield reliable gene expression data for degraded and FFPE material. The high degree of correlation between NanoString and RNA-Seq platforms suggests discovery based whole transcriptome studies from FFPE material will produce reliable expression data. The RiboZeroGold ScriptSeq protocol performed particularly well for lincRNA expression from FFPE libraries, but detection of eSNV and fusion transcripts was less sensitive.
    Full-text · Article · Nov 2013 · PLoS ONE