ABSTRACT: Large-scale cancer sequencing data enable discovery of rare germline cancer susceptibility variants. Here we systematically analyse 4,034 cancer cases from The Cancer Genome Atlas representing 12 cancer types. We find that the frequency of rare germline truncations in 114 cancer-susceptibility-associated genes varies widely, from 4% (acute myeloid leukaemia (AML)) to 19% (ovarian cancer), with a notably high frequency of 11% in stomach cancer. Burden testing identifies 13 cancer genes with significant enrichment of rare truncations, some associated with specific cancers (for example, RAD51C, PALB2 and MSH6 in AML, stomach and endometrial cancers, respectively). Significant, tumour-specific loss of heterozygosity occurs in nine genes (ATM, BAP1, BRCA1/2, BRIP1, FANCM, PALB2 and RAD51C/D). Moreover, our homology-directed repair assay of 68 BRCA1 rare missense variants supports the utility of allelic enrichment analysis for characterizing variants of unknown significance. The scale of this analysis and the somatic-germline integration enable the detection of rare variants that may affect individual susceptibility to tumour development, a critical step toward precision medicine.
Article · Dec 2015 · Nature Communications
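The abstract does not specify which burden test was used. As an illustrative sketch only, a gene-level burden test can be framed as a one-sided Fisher's exact test asking whether carriers of rare truncating variants are over-represented among cases relative to a control cohort. The function name `burden_test_p` and the assumption of a separate control cohort are ours, not the study's method.

```python
from math import comb


def burden_test_p(case_carriers: int, n_cases: int,
                  control_carriers: int, n_controls: int) -> float:
    """One-sided Fisher's exact test P-value for carrier enrichment in cases.

    Hypothetical helper. Under the null hypothesis, the number of carriers
    that fall among cases follows a hypergeometric distribution; the P-value
    sums the probability of every table with at least `case_carriers`
    carriers among the cases.
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    denominator = comb(total, n_cases)
    p_value = 0.0
    for k in range(case_carriers, min(carriers, n_cases) + 1):
        # Hypergeometric probability of exactly k carriers among the cases.
        p_value += comb(carriers, k) * comb(total - carriers, n_cases - k) / denominator
    # Guard against floating-point sums drifting slightly above 1.
    return min(1.0, p_value)
```

In a scan over many genes, the resulting P-values would also need correction for the number of genes tested (for example, Bonferroni or false discovery rate control).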
ABSTRACT: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.
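As a quick consistency check, the component counts given in this abstract (SNPs, indels and structural variants, in millions) sum to the stated total of over 88 million:

```latex
\underbrace{(84.7}_{\text{SNPs}} + \underbrace{3.6}_{\text{indels}} + \underbrace{0.06)}_{\text{SVs}} \times 10^{6}
  = 88.36 \times 10^{6} > 88 \times 10^{6}
```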
ABSTRACT: Tumors are typically sequenced to depths of 75x-100x (exome) or 30x-50x (whole genome). We demonstrate that current sequencing paradigms are inadequate for tumors that are impure, aneuploid, or clonally heterogeneous. To reassess optimal sequencing strategies, we performed ultra-deep (up to ∼312x) whole genome sequencing and exome capture (up to ∼433x) of a primary acute myeloid leukemia, its subsequent relapse, and a matched normal skin sample. We tested multiple alignment and variant calling algorithms and validated ∼200,000 putative SNVs by sequencing them to depths of ∼1,000x. Additional targeted sequencing provided over 10,000x coverage and ddPCR assays provided up to ∼250,000x sampling of selected sites. We evaluated the effects of different library generation approaches, depth of sequencing, and analysis strategies on the ability to effectively characterize a complex tumor. This dataset, representing the most comprehensively sequenced tumor described to date, will serve as an invaluable community resource (dbGaP: phs000159).
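A simplified binomial model of read sampling, which is our assumption rather than the analysis in this paper, illustrates why impure or subclonal tumors need deeper coverage: the variant allele fraction (VAF) shrinks with tumor purity and clonal fraction, so the chance of seeing enough variant reads at a fixed depth drops. The helper names, the diploid heterozygous assumption in `expected_vaf`, and the `min_alt_reads` calling threshold are all hypothetical.

```python
from math import comb


def expected_vaf(purity: float, cancer_cell_fraction: float) -> float:
    """Expected VAF of a heterozygous mutation in a diploid region.

    Assumption: tumor and normal cells are both diploid, so only half of the
    alleles in mutated tumor cells carry the variant.
    """
    return purity * cancer_cell_fraction / 2.0


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """P(at least `min_alt_reads` variant reads) when alt reads ~ Binomial(depth, vaf)."""
    # Probability of observing fewer alt reads than the threshold...
    p_below = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    # ...and its complement is the chance the variant clears the threshold.
    return 1.0 - p_below
```

For example, `detection_probability(75, expected_vaf(0.4, 0.5), 5)` can be compared with the same call at a depth of 300 to see how added coverage raises sensitivity for a subclonal variant in a sample of 40% purity. The model ignores sequencing error, mapping bias and copy-number changes.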
ABSTRACT: IMPORTANCE: Tests that predict outcomes for patients with acute myeloid leukemia (AML) are imprecise, especially for those with intermediate-risk AML.
OBJECTIVE: To determine whether genomic approaches can provide novel prognostic information for adult patients with de novo AML.
DESIGN, SETTING, AND PARTICIPANTS: Whole-genome or exome sequencing was performed on samples obtained at disease presentation from 71 patients with AML (mean age, 50.8 years) treated with standard induction chemotherapy at a single site starting in March 2002, with follow-up through January 2015. In addition, deep digital sequencing was performed on paired samples from 50 patients (including 32 with intermediate-risk AML) obtained at diagnosis and at remission, approximately 30 days after successful induction therapy. Twenty-five of the 50 were from the cohort of 71 patients, and 25 were new, additional cases.
EXPOSURES: Whole-genome or exome sequencing and targeted deep sequencing. Risk of identification based on genetic data.
MAIN OUTCOMES AND MEASURES: Mutation patterns (including clearance of leukemia-associated variants after chemotherapy) and their association with event-free survival and overall survival.
RESULTS: Analysis of comprehensive genomic data from the 71 patients did not improve outcome assessment over current standard-of-care metrics. Among the 50 patients with both presentation and documented remission samples, 24 (48%) had persistent leukemia-associated mutations in at least 5% of bone marrow cells at remission. Compared with the 26 patients who cleared all mutations, these 24 patients had significantly reduced event-free survival (median, 6.0 months [95% CI, 3.7-9.6] for persistent vs 17.9 months [95% CI, 11.3-40.4] for cleared mutations; log-rank P < .001; hazard ratio [HR], 3.67 [95% CI, 1.93-7.11]; P < .001) and reduced overall survival (median, 10.5 months [95% CI, 7.5-22.2] for persistent vs 42.2 months [95% CI, 20.6-not estimable] for cleared mutations; log-rank P = .003; HR, 2.86 [95% CI, 1.39-5.88]; P = .004). Among the 32 patients with intermediate cytogenetic risk, the 14 with persistent mutations had reduced event-free survival compared with the 18 who cleared all mutations (median, 8.8 months [95% CI, 3.7-14.6] for persistent vs 25.6 months [95% CI, 11.4-not estimable] for cleared mutations; log-rank P = .003; HR, 3.32 [95% CI, 1.44-7.67]; P = .005) and reduced overall survival (median, 19.3 months [95% CI, 7.5-42.3] for persistent vs 46.8 months [95% CI, 22.6-not estimable] for cleared mutations; log-rank P = .02; HR, 2.88 [95% CI, 1.11-7.45]; P = .03).
CONCLUSIONS AND RELEVANCE: The detection of persistent leukemia-associated mutations in at least 5% of bone marrow cells in day-30 remission samples was associated with a significantly increased risk of relapse and reduced overall survival. These data suggest that this genomic approach may improve risk stratification for patients with AML.
Article · Aug 2015 · JAMA: The Journal of the American Medical Association
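The day-30 grouping described in this abstract can be sketched as a simple threshold rule. The abstract states the cutoff as 5% of bone marrow cells; converting a sequencing VAF to a cell fraction by doubling it assumes heterozygous mutations in diploid regions, which is our simplification. The function names and the per-patient input format are hypothetical, and the survival comparison itself (log-rank test, hazard ratios) is not reproduced here.

```python
from typing import Dict, List, Tuple


def cell_fraction_from_vaf(vaf: float) -> float:
    """Fraction of cells carrying a heterozygous mutation in a diploid region.

    Assumption: each mutated cell contributes one variant allele out of two,
    so the carrier-cell fraction is twice the VAF (capped at 1).
    """
    return min(1.0, 2.0 * vaf)


def has_persistent_mutations(remission_vafs: List[float],
                             min_cell_fraction: float = 0.05) -> bool:
    """True if any leukemia-associated variant remains in >= min_cell_fraction of cells."""
    return any(cell_fraction_from_vaf(v) >= min_cell_fraction for v in remission_vafs)


def split_by_clearance(patients: Dict[str, List[float]]) -> Tuple[List[str], List[str]]:
    """Split patient IDs into (persistent, cleared) groups from remission-sample VAFs."""
    persistent: List[str] = []
    cleared: List[str] = []
    for patient_id, vafs in patients.items():
        (persistent if has_persistent_mutations(vafs) else cleared).append(patient_id)
    return persistent, cleared
```

Each group would then be compared with standard survival methods; the threshold rule only decides group membership.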
ABSTRACT: In this work, we present the Genome Modeling System (GMS), an analysis information management system capable of executing automated genome analysis pipelines at a massive scale. The GMS framework provides detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines. The GMS also serves as a platform for bioinformatics development, allowing a large team to collaborate on data analysis, or an individual researcher to leverage the work of others effectively within its data management system. Rather than separating ad hoc analysis from rigorous, reproducible pipelines, the GMS promotes systematic integration between the two. As a demonstration of the GMS, we performed an integrated analysis of whole genome, exome and transcriptome sequencing data from a breast cancer cell line (HCC1395) and matched lymphoblastoid line (HCC1395BL). These data are available for users to test the software, complete tutorials and develop novel GMS pipeline configurations. The GMS is available at https://github.com/genome/gms.
ABSTRACT: Despite the success of genome-wide association studies (GWAS) in detecting a large number of loci for complex phenotypes such as rheumatoid arthritis (RA) susceptibility, the lack of information on the causal genes makes it challenging to interpret GWAS results in the context of disease biology. Here, we genetically fine-map the RA risk locus at 19p13 to define causal variants, and explore the pleiotropic effects of these same variants in other complex traits. First, we combined Immunochip dense genotyping (n = 23,092 case/control samples), Exomechip genotyping (n = 18,409 case/control samples) and targeted exon sequencing (n = 2,236 case/control samples) to demonstrate that three protein-coding variants in TYK2 (tyrosine kinase 2) independently protect against RA: P1104A (rs34536443, OR = 0.66, P = 2.3 × 10⁻²¹), A928V (rs35018800, OR = 0.53, P = 1.2 × 10⁻⁹), and I684S (rs12720356, OR = 0.86, P = 4.6 × 10⁻⁷). Second, we show that the same three TYK2 variants protect against systemic lupus erythematosus (SLE; omnibus P = 6 × 10⁻¹⁸), and provide suggestive evidence that two of them (P1104A and A928V) may also protect against inflammatory bowel disease (IBD; omnibus P = 0.005). Finally, in a phenome-wide association study (PheWAS) assessing >500 phenotypes using electronic medical records (EMR) in >29,000 subjects, we found no convincing evidence for association of P1104A and A928V with complex phenotypes other than autoimmune diseases such as RA, SLE and IBD. Together, our results demonstrate the role of TYK2 in the pathogenesis of RA, SLE and IBD, and provide supporting evidence for TYK2 as a promising drug target for the treatment of autoimmune diseases.
ABSTRACT: The advent of next-generation sequencing has made it possible to detect and characterize genomic variation in human genomes cost-effectively. Structural variation, including deletions, duplications, insertions, inversions and translocations, is of great importance to human genetics because of its association with many genetic diseases. BreakDancer is a bioinformatics tool that relates paired-end read alignments from a test genome to the reference genome in order to detect various types of structural variation comprehensively and accurately.
Article · Aug 2014 · Current Protocols in Bioinformatics / editorial board, Andreas D. Baxevanis ... [et al.]
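The general idea of inferring structural variants from discordant read pairs can be sketched as below. This is a deliberately simplified heuristic, not BreakDancer's actual algorithm or scoring: it assumes a forward-reverse (FR) paired-end library with an approximately normal insert-size distribution, uses the distance between mate start positions as a stand-in for insert size, and classifies single pairs rather than clustering them. The `ReadPair` type, the `classify_pair` name and the three-standard-deviation cutoff are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Alignment summary for one read pair (hypothetical, simplified)."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str  # "+" or "-"


def classify_pair(pair: ReadPair, mean_insert: float, sd_insert: float,
                  n_sd: float = 3.0) -> str:
    """Suggest an SV type from one pair's alignment pattern, assuming an FR library."""
    # Mates on different chromosomes suggest an inter-chromosomal translocation.
    if pair.chrom1 != pair.chrom2:
        return "translocation"
    # Both mates on the same strand (FF or RR) suggest an inversion.
    if pair.strand1 == pair.strand2:
        return "inversion"
    # In an FR library the leftmost mate should be on "+"; a leftmost "-" mate
    # (RF orientation) is the classic tandem-duplication signature.
    left_strand = pair.strand1 if pair.pos1 <= pair.pos2 else pair.strand2
    if left_strand == "-":
        return "duplication"
    # Expected orientation: compare the apparent insert size with the library.
    span = abs(pair.pos2 - pair.pos1)
    if span > mean_insert + n_sd * sd_insert:
        return "deletion"
    if span < mean_insert - n_sd * sd_insert:
        return "insertion"
    return "concordant"
```

A practical caller would require several pairs supporting the same breakpoint region before reporting an event, since a single anomalous pair is often an alignment artifact.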
ABSTRACT: Recent genomic analyses of pathologically defined tumor types identify "within-a-tissue" disease subtypes. However, the extent to which genomic signatures are shared across tissues is still unclear. We performed an integrative analysis using five genome-wide platforms and one proteomic platform on 3,527 specimens from 12 cancer types, revealing a unified classification into 11 major subtypes. Five subtypes were nearly identical to their tissue-of-origin counterparts, but several distinct cancer types were found to converge into common subtypes. Lung squamous, head and neck, and a subset of bladder cancers coalesced into one subtype typified by TP53 alterations, TP63 amplifications, and high expression of immune and proliferation pathway genes. Of note, bladder cancers split into three pan-cancer subtypes. The multiplatform classification, while correlated with tissue-of-origin, provides independent information for predicting clinical outcomes. All data sets are available for data-mining from a unified resource to support further biological discoveries and insights into novel therapeutic strategies.