ABSTRACT: Exome sequencing in families affected by rare genetic disorders has the potential to rapidly identify new disease genes (genes in which mutations cause disease), but the identification of a single causal mutation among thousands of variants remains a significant challenge. We developed a scoring algorithm to prioritize potential causal variants within a family according to segregation with the phenotype, population frequency, predicted effect, and gene expression in the tissue(s) of interest. To narrow the search space in families with multiple affected individuals, we also developed two complementary approaches to exome-based mapping of autosomal-dominant disorders. One approach identifies segments of maximum identity by descent among affected individuals; the other nominates regions on the basis of shared rare variants and the absence of homozygous differences between affected individuals. We showcase our methods by using exome sequence data from families affected by autosomal-dominant retinitis pigmentosa (adRP), a rare disorder characterized by night blindness and progressive vision loss. We performed exome capture and sequencing on 91 samples representing 24 families affected by probable adRP but lacking common disease-causing mutations. Eight of 24 families (33%) were revealed to harbor high-scoring, most likely pathogenic (by clinical assessment) mutations affecting known RP genes. Analysis of the remaining 16 families identified candidate variants in a number of interesting genes, some of which have withstood further segregation testing in extended pedigrees. To empower the search for Mendelian-disease genes in family-based sequencing studies, we implemented these methods in a cross-platform-compatible software package, MendelScan, which is freely available to the research community.
The American Journal of Human Genetics 02/2014; · 11.20 Impact Factor
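The scoring idea described in the MendelScan abstract above can be illustrated with a minimal Python sketch. Only the four factors (segregation, population frequency, predicted effect, tissue expression) come from the abstract. The multiplicative combination, the numeric weights, and every name below (`Variant`, `score_variant`, `prioritize`, `EFFECT_WEIGHTS`) are assumptions made for illustration and are not MendelScan's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical weights for predicted effect; not taken from MendelScan.
EFFECT_WEIGHTS = {"nonsense": 1.0, "missense": 0.8, "synonymous": 0.05}


@dataclass
class Variant:
    name: str
    affected_carriers: int    # affected relatives carrying the variant
    affected_total: int       # affected relatives sequenced
    unaffected_carriers: int  # unaffected relatives carrying the variant
    population_freq: float    # allele frequency in a reference population
    effect: str               # predicted effect class
    expression_rank: float    # 0..1, expression in the tissue of interest


def score_variant(v: Variant, max_freq: float = 0.01) -> float:
    """Combine the four factors into one score between 0 and 1 (assumed scheme)."""
    # Segregation: fraction of affected relatives carrying the variant,
    # halved for each unaffected carrier.
    segregation = (v.affected_carriers / v.affected_total) * 0.5 ** v.unaffected_carriers
    # Rarity: 1 for unseen variants, falling linearly to 0 at max_freq.
    rarity = max(0.0, 1.0 - v.population_freq / max_freq)
    # Unknown effect classes get a low default weight.
    effect = EFFECT_WEIGHTS.get(v.effect, 0.1)
    return segregation * rarity * effect * v.expression_rank


def prioritize(variants):
    """Return variants sorted from highest to lowest score."""
    return sorted(variants, key=score_variant, reverse=True)
```

Multiplying the factors means a variant that fails any one criterion (for example, a population frequency above `max_freq`) scores zero regardless of the others; an additive scheme would be an equally plausible design choice.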
ABSTRACT: We report the first large-scale exome-wide analysis of the combined germline-somatic landscape in ovarian cancer. Here we analyse germline and somatic alterations in 429 ovarian carcinoma cases and 557 controls. We identify 3,635 high confidence, rare truncation and 22,953 missense variants with predicted functional impact. We find germline truncation variants and large deletions across Fanconi pathway genes in 20% of cases. Enrichment of rare truncations is shown in BRCA1, BRCA2 and PALB2. In addition, we observe germline truncation variants in genes not previously associated with ovarian cancer susceptibility (NF1, MAP3K4, CDKN2B and MLL3). Evidence for loss of heterozygosity was found in 100 and 76% of cases with germline BRCA1 and BRCA2 truncations, respectively. Germline-somatic interaction analysis combined with extensive bioinformatics annotation identifies 222 candidate functional germline truncation and missense variants, including two pathogenic BRCA1 and 1 TP53 deleterious variants. Finally, integrated analyses of germline and somatic variants identify significantly altered pathways, including the Fanconi, MAPK and MLL pathways.
ABSTRACT: Genome-wide association studies (GWAS) have identified >500 common variants associated with quantitative metabolic traits, but in aggregate such variants explain at most 20-30% of the heritable component of population variation in these traits. To further investigate the impact of genotypic variation on metabolic traits, we conducted re-sequencing studies in >6,000 members of a Finnish population cohort (The Northern Finland Birth Cohort of 1966 [NFBC]) and a type 2 diabetes case-control sample (The Finland-United States Investigation of NIDDM Genetics [FUSION] study). By sequencing the coding sequence and 5' and 3' untranslated regions of 78 genes at 17 GWAS loci associated with one or more of six metabolic traits (serum levels of fasting HDL-C, LDL-C, total cholesterol, triglycerides, plasma glucose, and insulin), and conducting both single-variant and gene-level association tests, we obtained a more complete understanding of phenotype-genotype associations at eight of these loci. At all eight of these loci, the identification of new associations provides significant evidence for multiple genetic signals to one or more phenotypes, and at two loci, in the genes ABCA1 and CETP, we found significant gene-level evidence of association to non-synonymous variants with MAF<1%. Additionally, two potentially deleterious variants that demonstrated significant associations (rs138726309, a missense variant in G6PC2, and rs28933094, a missense variant in LIPC) were considerably more common in these Finnish samples than in European reference populations, supporting our prior hypothesis that deleterious variants could attain high frequencies in this isolated population, likely due to the effects of population bottlenecks. Our results highlight the value of large, well-phenotyped samples for rare-variant association analysis, and the challenge of evaluating the phenotypic impact of such variants.
ABSTRACT: The advent of next-generation sequencing has made it possible to cost-effectively detect and characterize genomic variation in human genomes. Structural variation, including deletion, duplication, insertion, inversion and translocation, is of great importance to human genetics due to its association with many genetic diseases. BreakDancer is a bioinformatics tool that relates paired-end read alignments from a test genome to the reference genome for the purpose of comprehensively and accurately detecting various types of structural variation.
Current protocols in bioinformatics / editorial board, Andreas D. Baxevanis ... [et al.] 01/2014; 2014.
ABSTRACT: The Drug-Gene Interaction database (DGIdb) mines existing resources that generate hypotheses about how mutated genes might be targeted therapeutically or prioritized for drug development. It provides an interface for searching lists of genes against a compendium of drug-gene interactions and potentially 'druggable' genes. DGIdb can be accessed at http://dgidb.org/.
ABSTRACT: Genomics is a relatively new scientific discipline, having DNA sequencing as its core technology. As technology has improved the cost and scale of genome characterization over sequencing's 40-year history, the scope of inquiry has commensurately broadened. Massively parallel sequencing has proven revolutionary, shifting the paradigm of genomics to address biological questions at a genome-wide scale. Sequencing now empowers clinical diagnostics and other aspects of medical care, including disease risk, therapeutic identification, and prenatal testing. This Review explores the current state of genomics in the massively parallel sequencing era.
ABSTRACT: The Cancer Genome Atlas (TCGA) Research Network has profiled and analyzed large numbers of human tumors to discover molecular aberrations at the DNA, RNA, protein and epigenetic levels. The resulting rich data provide a major opportunity to develop an integrated picture of commonalities, differences and emergent themes across tumor lineages. The Pan-Cancer initiative compares the first 12 tumor types profiled by TCGA. Analysis of the molecular aberrations and their functional roles across tumor types will teach us how to extend therapies effective in one cancer type to others with a similar genomic profile.
ABSTRACT: Macular degeneration is a common cause of blindness in the elderly. To identify rare coding variants associated with a large increase in risk of age-related macular degeneration (AMD), we sequenced 2,335 cases and 789 controls in 10 candidate loci (57 genes). To increase power, we augmented our control set with ancestry-matched exome-sequenced controls. An analysis of coding variation in 2,268 AMD cases and 2,268 ancestry-matched controls identified 2 large-effect rare variants: previously described p.Arg1210Cys encoded in the CFH gene (case frequency (fcase) = 0.51%; control frequency (fcontrol) = 0.02%; odds ratio (OR) = 23.11) and newly identified p.Lys155Gln encoded in the C3 gene (fcase = 1.06%; fcontrol = 0.39%; OR = 2.68). The variants suggest decreased inhibition of C3 by complement factor H, resulting in increased activation of the alternative complement pathway, as a key component of disease biology.
ABSTRACT: Most mutations in cancer genomes are thought to be acquired after the initiating event, which may cause genomic instability and drive clonal evolution. However, for acute myeloid leukemia (AML), normal karyotypes are common, and genomic instability is unusual. To better understand clonal evolution in AML, we sequenced the genomes of M3-AML samples with a known initiating event (PML-RARA) versus the genomes of normal karyotype M1-AML samples and the exomes of hematopoietic stem/progenitor cells (HSPCs) from healthy people. Collectively, the data suggest that most of the mutations found in AML genomes are actually random events that occurred in HSPCs before they acquired the initiating mutation; the mutational history of that cell is "captured" as the clone expands. In many cases, only one or two additional, cooperating mutations are needed to generate the malignant founding clone. Cells from the founding clone can acquire additional cooperating mutations, yielding subclones that can contribute to disease progression and/or relapse.
ABSTRACT: To assess the genetic consequences of induced pluripotent stem cell (iPSC) reprogramming, we sequenced the genomes of ten murine iPSC clones derived from three independent reprogramming experiments, and compared them to their parental cell genomes. We detected hundreds of single nucleotide variants (SNVs) in every clone, with an average of 11 in coding regions. In two experiments, all SNVs were unique for each clone and did not cluster in pathways, but in the third, all four iPSC clones contained 157 shared genetic variants, which could also be detected in rare cells (<1 in 500) within the parental MEF pool. These data suggest that most of the genetic variation in iPSC clones is not caused by reprogramming per se, but is rather a consequence of cloning individual cells, which "captures" their mutational history. These findings have implications for the development and therapeutic use of cells that are reprogrammed by any method.
ABSTRACT: The myelodysplastic syndromes are a group of hematologic disorders that often evolve into secondary acute myeloid leukemia (AML). The genetic changes that underlie progression from the myelodysplastic syndromes to secondary AML are not well understood.
We performed whole-genome sequencing of seven paired samples of skin and bone marrow in seven subjects with secondary AML to identify somatic mutations specific to secondary AML. We then genotyped a bone marrow sample obtained during the antecedent myelodysplastic-syndrome stage from each subject to determine the presence or absence of the specific somatic mutations. We identified recurrent mutations in coding genes and defined the clonal architecture of each pair of samples from the myelodysplastic-syndrome stage and the secondary-AML stage, using the allele burden of hundreds of mutations.
Approximately 85% of bone marrow cells were clonal in the myelodysplastic-syndrome and secondary-AML samples, regardless of the myeloblast count. The secondary-AML samples contained mutations in 11 recurrently mutated genes, including 4 genes that have not been previously implicated in the myelodysplastic syndromes or AML. In every case, progression to acute leukemia was defined by the persistence of an antecedent founding clone containing 182 to 660 somatic mutations and the outgrowth or emergence of at least one subclone, harboring dozens to hundreds of new mutations. All founding clones and subclones contained at least one mutation in a coding gene.
Nearly all the bone marrow cells in patients with myelodysplastic syndromes and secondary AML are clonally derived. Genetic evolution of secondary AML is a dynamic process shaped by multiple cycles of mutation acquisition and clonal selection. Recurrent gene mutations are found in both founding clones and daughter subclones. (Funded by the National Institutes of Health and others.).
New England Journal of Medicine 03/2012; 366(12):1090-8. · 54.42 Impact Factor
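The abstract above infers clonal architecture from the allele burden of somatic mutations. A minimal Python sketch of the underlying arithmetic follows, assuming a heterozygous mutation in a copy-neutral diploid region, where each mutant cell contributes one variant allele out of two. The function names and the read counts in the usage note are hypothetical and are not taken from the paper.

```python
def variant_allele_fraction(variant_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return variant_reads / total_reads


def clonal_cell_fraction(vaf: float) -> float:
    """Estimate the fraction of cells carrying a heterozygous mutation.

    Assumes a diploid, copy-neutral locus, so the cell fraction is twice
    the variant allele fraction, capped at 1.0.
    """
    return min(1.0, 2.0 * vaf)
```

For example, a mutation supported by 425 of 1,000 reads (illustrative numbers) has a variant allele fraction of 0.425, which under these assumptions corresponds to about 85% of cells carrying the mutation. Copy-number changes or homozygous mutations would break this simple relationship.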
ABSTRACT: Cancer is a disease driven by genetic variation and mutation. Exome sequencing can be utilized for discovering these variants and mutations across hundreds of tumors. Here we present an analysis tool, VarScan 2, for the detection of somatic mutations and copy number alterations (CNAs) in exome data from tumor-normal pairs. Unlike most current approaches, our algorithm reads data from both samples simultaneously; a heuristic and statistical algorithm detects sequence variants and classifies them by somatic status (germline, somatic, or LOH); while a comparison of normalized read depth delineates relative copy number changes. We apply these methods to the analysis of exome sequence data from 151 high-grade ovarian tumors characterized as part of the Cancer Genome Atlas (TCGA). We validated some 7790 somatic coding mutations, achieving 93% sensitivity and 85% precision for single nucleotide variant (SNV) detection. Exome-based CNA analysis identified 29 large-scale alterations and 619 focal events per tumor on average. As in our previous analysis of these data, we observed frequent amplification of oncogenes (e.g., CCNE1, MYC) and deletion of tumor suppressors (NF1, PTEN, and CDKN2A). We searched for additional recurrent focal CNAs using the correlation matrix diagonal segmentation (CMDS) algorithm, which identified 424 significant events affecting 582 genes. Taken together, our results demonstrate the robust performance of VarScan 2 for somatic mutation and CNA detection and shed new light on the landscape of genetic alterations in ovarian cancer.
Genome Research 03/2012; 22(3):568-76. · 14.40 Impact Factor
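The VarScan 2 abstract above describes classifying variants by somatic status (germline, somatic, or LOH) from tumor and normal read data. The Python sketch below shows one way such a classification could work: a two-sided Fisher's exact test on reference and variant read counts, combined with simple allele-fraction thresholds. The function names, the thresholds (`min_vaf`, `max_p`), and the extra "Reference" and "Unknown" labels are assumptions for illustration; this is a simplified sketch, not VarScan 2's actual code or decision rules.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))
    return min(1.0, total)


def classify(normal_ref: int, normal_var: int, tumor_ref: int, tumor_var: int,
             min_vaf: float = 0.05, max_p: float = 0.05) -> str:
    """Assign a somatic status from read counts (hypothetical thresholds)."""
    if normal_ref + normal_var == 0 or tumor_ref + tumor_var == 0:
        return "Unknown"  # no coverage in one sample
    normal_vaf = normal_var / (normal_ref + normal_var)
    tumor_vaf = tumor_var / (tumor_ref + tumor_var)
    if normal_vaf < min_vaf and tumor_vaf < min_vaf:
        return "Reference"  # no variant evidence in either sample
    p = fisher_exact_two_sided(normal_ref, normal_var, tumor_ref, tumor_var)
    if p >= max_p:
        # Allele counts do not differ significantly between the samples.
        return "Germline" if normal_vaf >= min_vaf else "Unknown"
    if normal_vaf < min_vaf:
        return "Somatic"  # variant present only in the tumor
    return "LOH"  # normal carries the variant; tumor allele fraction has shifted
```

Reading both samples at each position, as the abstract describes, is what makes the 2x2 comparison possible; a caller that processed each sample independently would have only per-sample genotypes to compare.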
ABSTRACT: Motivation: The sequencing of tumors and their matched normals is frequently used to study the genetic composition of cancer. Despite this fact, there remains a dearth of available software tools designed to compare sequences in pairs of samples and identify sites that are likely to be unique to one sample. Results: In this article, we describe the mathematical basis of our SomaticSniper software for comparing tumor and normal pairs. We estimate its sensitivity and precision, and present several common sources of error resulting in miscalls. Availability and implementation: Binaries are freely available for download at http://gmt.genome.wustl.edu/somatic-sniper/current/, implemented in C and supported on Linux and Mac OS X. Supplementary information: Supplementary data are available at Bioinformatics online.
ABSTRACT: Most patients with acute myeloid leukaemia (AML) die from progressive disease after relapse, which is associated with clonal evolution at the cytogenetic level. To determine the mutational spectrum associated with relapse, we sequenced the primary tumour and relapse genomes from eight AML patients, and validated hundreds of somatic mutations using deep sequencing; this allowed us to define clonality and clonal evolution patterns precisely at relapse. In addition to discovering novel, recurrently mutated genes (for example, WAC, SMC3, DIS3, DDX41 and DAXX) in AML, we also found two major clonal evolution patterns during AML relapse: (1) the founding clone in the primary tumour gained mutations and evolved into the relapse clone, or (2) a subclone of the founding clone survived initial therapy, gained additional mutations and expanded at relapse. In all cases, chemotherapy failed to eradicate the founding clone. The comparison of relapse-specific versus primary tumour mutations in all eight cases revealed an increase in transversions, probably due to DNA damage caused by cytotoxic chemotherapy. These data demonstrate that AML relapse is associated with the addition of new mutations and clonal evolution, which is shaped, in part, by the chemotherapy that the patients receive to establish and maintain remissions.
ABSTRACT: The emergence of next-generation sequencing (NGS) technologies offers an incredible opportunity to comprehensively study DNA sequence variation in human genomes. Commercially available platforms from Roche (454), Illumina (Genome Analyzer and HiSeq 2000), and Applied Biosystems (SOLiD) have the capability to completely sequence individual genomes to high levels of coverage. NGS data are particularly advantageous for the study of structural variation (SV) because they offer the sensitivity to detect variants of various sizes and types, as well as the precision to characterize their breakpoints at base pair resolution. In this chapter, we present methods and software algorithms that have been developed to detect SVs and copy number changes using massively parallel sequencing data. We describe visualization and de novo assembly strategies for characterizing SV breakpoints and removing false positives.
ABSTRACT: Myelodysplastic syndromes (MDS) are hematopoietic stem cell disorders that often progress to chemotherapy-resistant secondary acute myeloid leukemia (sAML). We used whole-genome sequencing to perform an unbiased comprehensive screen to discover the somatic mutations in a sample from an individual with sAML and genotyped the loci containing these mutations in the matched MDS sample. Here we show that a missense mutation affecting the serine at codon 34 (Ser34) in U2AF1 was recurrently present in 13 out of 150 (8.7%) subjects with de novo MDS, and we found suggestive evidence of an increased risk of progression to sAML associated with this mutation. U2AF1 is a U2 auxiliary factor protein that recognizes the AG splice acceptor dinucleotide at the 3' end of introns, and the alterations in U2AF1 are located in highly conserved zinc fingers of this protein. Mutant U2AF1 promotes enhanced splicing and exon skipping in reporter assays in vitro. This previously unidentified, recurrent mutation in U2AF1 implicates altered pre-mRNA splicing as a potential mechanism for MDS pathogenesis.
ABSTRACT: CONTEXT: Whole-genome sequencing is becoming increasingly available for research purposes, but it has not yet been routinely used for clinical diagnosis.
OBJECTIVE: To determine whether whole-genome sequencing can identify cryptic, actionable mutations in a clinically relevant time frame.
DESIGN, SETTING, AND PATIENT: We were referred a difficult diagnostic case of acute promyelocytic leukemia with no pathogenic X-RARA fusion identified by routine metaphase cytogenetics or interphase fluorescence in situ hybridization (FISH). The case patient was enrolled in an institutional review board-approved protocol, with consent specifically tailored to the implications of whole-genome sequencing. The protocol uses a "movable firewall" that maintains patient anonymity within the entire research team but allows the research team to communicate medically relevant information to the treating physician.
MAIN OUTCOME MEASURES: Clinical relevance of whole-genome sequencing and time to communicate validated results to the treating physician.
RESULTS: Massively parallel paired-end sequencing allowed identification of a cytogenetically cryptic event: a 77-kilobase segment from chromosome 15 was inserted en bloc into the second intron of the RARA gene on chromosome 17, resulting in a classic bcr3 PML-RARA fusion gene. Reverse transcription polymerase chain reaction sequencing subsequently validated the expression of the fusion transcript. Novel FISH probes identified 2 additional cases of t(15;17)-negative acute promyelocytic leukemia that had cytogenetically invisible insertions. Whole-genome sequencing and validation were completed in 7 weeks and changed the treatment plan for the patient.
CONCLUSION: Whole-genome sequencing can identify cytogenetically invisible oncogenes in a clinically relevant time frame.
JAMA: The Journal of the American Medical Association 04/2011; 305(15):1577-84. · 29.98 Impact Factor
ABSTRACT: Acute promyelocytic leukemia (APL) is a subtype of acute myeloid leukemia (AML). It is characterized by the t(15;17)(q22;q11.2) chromosomal translocation that creates the promyelocytic leukemia-retinoic acid receptor α (PML-RARA) fusion oncogene. Although this fusion oncogene is known to initiate APL in mice, other cooperating mutations, as yet ill defined, are important for disease pathogenesis. To identify these, we used a mouse model of APL, whereby PML-RARA expressed in myeloid cells leads to a myeloproliferative disease that ultimately evolves into APL. Sequencing of a mouse APL genome revealed 3 somatic, nonsynonymous mutations relevant to APL pathogenesis, of which 1 (Jak1 V657F) was found to be recurrent in other affected mice. This mutation was identical to the JAK1 V658F mutation previously found in human APL and acute lymphoblastic leukemia samples. Further analysis showed that JAK1 V658F cooperated in vivo with PML-RARA, causing a rapidly fatal leukemia in mice. We also discovered a somatic 150-kb deletion involving the lysine (K)-specific demethylase 6A (Kdm6a, also known as Utx) gene in the mouse APL genome. Similar deletions were observed in 3 out of 14 additional mouse APL samples and 1 out of 150 human AML samples. In conclusion, whole genome sequencing of mouse cancer genomes can provide an unbiased and comprehensive approach for discovering functionally relevant mutations that are also present in human leukemias.
The Journal of Clinical Investigation 03/2011; 121(4):1445-55. · 15.39 Impact Factor