ABSTRACT: The genetic basis underlying the majority of hereditary pancreatic adenocarcinoma (PC) is unknown. Since DNA repair genes are widely implicated in gastrointestinal malignancies, including PC, we hypothesized that there are novel DNA repair PC susceptibility genes. As germline DNA repair gene mutations may lead to PC subtypes with selective therapeutic responses, we also hypothesized that there is an overall survival (OS) difference in mutation carriers versus non-carriers. We therefore interrogated the germline exomes of 109 high-risk PC cases for rare protein-truncating variants (PTVs) in 513 putative DNA repair genes. We identified PTVs in 41 novel genes among 36 kindred. Additional genetic evidence for causality was obtained for 17 genes, with FAN1, NEK1 and RHNO1 emerging as the strongest candidates. An OS difference was observed for carriers versus non-carriers of PTVs with early stage (≤IIB) disease. This adverse survival trend in carriers with early stage disease was also observed in an independent series of 130 PC cases. We identified candidate DNA repair PC susceptibility genes and suggest that carriers of a germline PTV in a DNA repair gene with early stage disease have worse survival.
ABSTRACT: Accurate detection of somatic single nucleotide variants and small insertions and deletions from DNA sequencing experiments of tumour-normal pairs is a challenging task. Tumour samples are often contaminated with normal cells confounding the available evidence for the somatic variants. Furthermore, tumours are heterogeneous so sub-clonal variants are observed at reduced allele frequencies. We present here a cell-line titration series dataset that can be used to evaluate somatic variant calling pipelines with the goal of reliably calling true somatic mutations at low allele frequencies.
Cell-line DNA was mixed with matched normal DNA at 8 different ratios to generate samples with known tumour cellularities, and exome sequenced on Illumina HiSeq to depths of >300×. The data was processed with several different variant calling pipelines and verification experiments were performed to assay >1500 somatic variant candidates using Ion Torrent PGM as an orthogonal technology. By examining the variants called at varying cellularities and depths of coverage, we show that the best performing pipelines are able to maintain a high level of precision at any cellularity. In addition, we estimate the number of true somatic variants undetected as cellularity and coverage decrease.
Our cell-line titration series dataset, along with the associated verification results, was effective for this evaluation and will serve as a valuable dataset for future somatic calling algorithm development. The data is available for further analysis at the European Genome-phenome Archive under accession number EGAS00001001016. Data access requires registration through the International Cancer Genome Consortium’s Data Access Compliance Office (ICGC DACO).
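To make concrete why reduced cellularity lowers the evidence for a somatic variant, the following is a minimal sketch of the expected variant allele frequency in a tumour-normal mixture. It is not taken from the study: the function names, the assumption of a heterozygous variant in a copy-neutral diploid region, the assumption that cellularity equals the fraction of tumour genomes in the mix, and the example cellularities are all illustrative.

```python
def expected_vaf(cellularity, mutant_copies=1, tumour_copies=2, normal_copies=2):
    """Expected variant allele frequency for a clonal somatic variant.

    Assumptions (not from the abstract): cellularity is the fraction of
    tumour genomes in the mixture, and every tumour cell carries
    `mutant_copies` mutated alleles out of `tumour_copies` total.
    """
    if not 0.0 <= cellularity <= 1.0:
        raise ValueError("cellularity must be between 0 and 1")
    mutant_alleles = cellularity * mutant_copies
    total_alleles = cellularity * tumour_copies + (1.0 - cellularity) * normal_copies
    return mutant_alleles / total_alleles


def expected_alt_reads(cellularity, depth, **kwargs):
    """Expected number of reads supporting the variant at a given depth."""
    return depth * expected_vaf(cellularity, **kwargs)


if __name__ == "__main__":
    # Hypothetical cellularities; the abstract does not list the 8 ratios used.
    for cellularity in (1.0, 0.5, 0.2, 0.1):
        vaf = expected_vaf(cellularity)
        reads = expected_alt_reads(cellularity, depth=300)
        print(f"cellularity={cellularity:.2f}  VAF={vaf:.3f}  alt reads at 300x={reads:.1f}")
```

Under these assumptions, a heterozygous variant at 10% cellularity is expected at a frequency of about 0.05, which is why deep coverage matters for low-cellularity samples.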
ABSTRACT: WormBase (www.wormbase.org) is a central repository for research data on the biology, genetics and genomics of Caenorhabditis elegans and other nematodes. The project has evolved from its original remit to collect and integrate all data for a single species, and now extends to numerous nematodes, ranging from evolutionary comparators of C. elegans to parasitic species that threaten plant, animal and human health. Research activity using C. elegans as a model system is as vibrant as ever, and we have created new tools for community curation in response to the ever-increasing volume and complexity of data. To better allow users to navigate their way through these data, we have made a number of improvements to our main website, including new tools for browsing genomic features and ontology annotations. Finally, we have developed a new portal for parasitic worm genomes. WormBase ParaSite (parasite.wormbase.org) contains all publicly available nematode and platyhelminth annotated genome sequences, and is designed specifically to support helminth genomic research.
Full-text · Article · Nov 2015 · Nucleic Acids Research
ABSTRACT: Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic interfaces. Additional community data, such as genetic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reactome pathway portal (http://plantreactome.gramene.org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addition to ∼200 curated rice reference pathways, the portal hosts gene homology-based pathway projections for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI's Expression Atlas to enable the projection of baseline and differential expression data from curated expression studies in plants. Gramene's archive website (http://archive.gramene.org) continues to provide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel carrying video tutorials.
Full-text · Article · Nov 2015 · Nucleic Acids Research
ABSTRACT: In a classical view of hematopoiesis, the various blood cell lineages arise via a hierarchical scheme starting with multipotent stem cells that become increasingly restricted in their differentiation potential through oligopotent and then unipotent progenitors. We developed a cell-sorting scheme to resolve myeloid (My), erythroid (Er), and megakaryocytic (Mk) fates from single CD34+ cells and then mapped the progenitor hierarchy across human development. Fetal liver contained large numbers of distinct oligopotent progenitors with intermingled My, Er, and Mk fates. However, few oligopotent progenitor intermediates were present in the adult bone marrow. Instead only two progenitor classes predominate, multipotent and unipotent, with Er-Mk lineages emerging from multipotent cells. The developmental shift to an adult “two-tier” hierarchy challenges current dogma and provides a revised framework to understand normal and disease states of human hematopoiesis.
ABSTRACT: Genomic information on tumors from 50 cancer types cataloged by the International Cancer Genome Consortium (ICGC) shows that only a few well-studied driver genes are frequently mutated, in contrast to many infrequently mutated genes that may also contribute to tumor biology. Hence there has been large interest in developing pathway and network analysis methods that group genes and illuminate the processes involved. We provide an overview of these analysis techniques and show where they guide mechanistic and translational investigations.
ABSTRACT: Herein we provide a detailed molecular analysis of the spatial heterogeneity of clinically localized, multifocal prostate cancer to delineate new oncogenes or tumor suppressors. We initially determined the copy number aberration (CNA) profiles of 74 patients with index tumors of Gleason score 7. Of these, 5 patients were subjected to whole-genome sequencing using DNA quantities achievable in diagnostic biopsies, with detailed spatial sampling of 23 distinct tumor regions to assess intraprostatic heterogeneity in focal genomics. Multifocal tumors are highly heterogeneous for single-nucleotide variants (SNVs), CNAs and genomic rearrangements. We identified and validated a new recurrent amplification of MYCL, which is associated with TP53 deletion and unique profiles of DNA damage and transcriptional dysregulation. Moreover, we demonstrate divergent tumor evolution in multifocal cancer and, in some cases, tumors of independent clonal origin. These data represent the first systematic relation of intraprostatic genomic heterogeneity to predicted clinical outcome and inform the development of novel biomarkers that reflect individual prognosis.
ABSTRACT: The nematode Caenorhabditis briggsae is a model for comparative developmental evolution with C. elegans. Worldwide collections of C. briggsae have implicated an intriguing history of divergence among genetic groups separated by latitude, or by restricted geography, that is being exploited to dissect the genetic basis of adaptive evolution and reproductive incompatibility. And yet, the genomic scope and timing of population divergence are unclear. We performed high-coverage whole-genome sequencing of 37 wild isolates of the nematode C. briggsae and applied a pairwise sequentially Markovian coalescent (PSMC) model to 703 combinations of genomic haplotypes to draw inferences about population history and the genomic scope of natural selection, and to compare with 40 wild isolates of C. elegans. We estimate that a diaspora of at least 6 distinct C. briggsae lineages separated from one another approximately 200 thousand generations ago, including the 'Temperate' and 'Tropical' phylogeographic groups that dominate most samples from around the world. Moreover, an ancient population split in its history 2 million generations ago, coupled with only rare gene flow among lineage groups, validates this system as a model for incipient speciation. Low versus high recombination regions of the genome give distinct signatures of population size change through time, indicative of widespread effects of selection on highly linked portions of the genome owing to extreme inbreeding by self-fertilization. Analysis of functional mutations indicates that genomic context, owing to selection that acts on long linkage blocks, is a more important driver of population variation than are the functional attributes of the individually encoded genes.
Published by Cold Spring Harbor Laboratory Press.
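As a small worked check on the arithmetic in the C. briggsae abstract, the sketch below enumerates unordered pairs of haplotypes, as one would when building pseudo-diploid inputs for a pairwise coalescent analysis. The abstract does not say how the 703 combinations were formed; the observation that 703 equals the number of pairs among 38 haplotypes (for example, 37 isolates plus one further genome) is an assumption made here, and the haplotype names are placeholders.

```python
from itertools import combinations
from math import comb


def pairwise_haplotype_pairs(names):
    """Return every unordered pair of haplotype names (no self-pairs)."""
    return list(combinations(names, 2))


if __name__ == "__main__":
    # Placeholder labels; 38 haplotypes is an assumption, not stated in the abstract.
    haplotypes = [f"hap{i:02d}" for i in range(1, 39)]
    pairs = pairwise_haplotype_pairs(haplotypes)
    print(len(pairs))    # number of pairs among 38 haplotypes
    print(comb(37, 2))   # number of pairs among 37 haplotypes, for comparison
```

Pairing 37 haplotypes alone would give 666 combinations, so the reported 703 suggests one additional haplotype was included under this reading.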
ABSTRACT: Caenorhabditis elegans mutants deleted for TDP-1, an ortholog of the neurodegeneration-associated RNA-binding protein TDP-43, display only mild phenotypes. Nevertheless, transcriptome sequencing revealed that many RNAs were altered in accumulation and/or processing in the mutant. Analysis of these transcriptional abnormalities demonstrates that a primary function of TDP-1 is to limit formation or stability of double-stranded RNA. Specifically, we found that deletion of tdp-1: (1) preferentially alters the accumulation of RNAs with inherent double-stranded structure (dsRNA); (2) increases the accumulation of nuclear dsRNA foci; (3) enhances the frequency of adenosine-to-inosine RNA editing; and (4) dramatically increases the amount of transcripts immunoprecipitable with a dsRNA-specific antibody, including intronic sequences, RNAs with antisense overlap to another transcript, and transposons. We also show that TDP-43 knockdown in human cells results in accumulation of dsRNA, indicating that suppression of dsRNA is a conserved function of TDP-43 in mammals. Altered accumulation of structured RNA may account for some of the previously described molecular phenotypes (e.g., altered splicing) resulting from reduction of TDP-43 function.
ABSTRACT: Fulfilling the promise of the genetic revolution requires the analysis of large datasets containing information from thousands to millions of participants. However, sharing human genomic data requires protecting subjects from potential harm. Current models rely on de-identification techniques in which privacy versus data utility becomes a zero-sum game. Instead, we propose the use of trust-enabling techniques to create a solution in which researchers and participants both win. To do so we introduce three principles that facilitate trust in genetic research and outline one possible framework built upon those principles. Our hope is that such trust-centric frameworks provide a sustainable solution that reconciles genetic privacy with data sharing and facilitates genetic research.
ABSTRACT: High-throughput experiments are routinely performed in modern biological studies. However, extracting meaningful results from massive experimental data sets is a challenging task for biologists. Projecting data onto pathway and network contexts is a powerful way to unravel patterns embedded in seemingly scattered large data sets and assist knowledge discovery related to cancer and other complex diseases. We have developed a Cytoscape app called "ReactomeFIViz", which utilizes a highly reliable gene functional interaction network combined with human curated pathways derived from Reactome and other pathway databases. This app provides a suite of features to assist biologists in performing pathway- and network-based data analysis in a biologically intuitive and user-friendly way. Biologists can use this app to uncover network and pathway patterns related to their studies, search for gene signatures from gene expression data sets, reveal pathways significantly enriched by genes in a list, and integrate multiple genomic data types into a pathway context using probabilistic graphical models. We believe our app will give researchers substantial power to analyze intrinsically noisy high-throughput experimental data to find biologically relevant information.
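To illustrate what "pathways significantly enriched by genes in a list" can mean in practice, here is a minimal sketch of a generic over-representation test based on the hypergeometric distribution. The abstract does not state which statistic ReactomeFIViz uses, so this should be read as a common textbook approach rather than the app's method; the function name, parameter names and example numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(universe_size, pathway_size, list_size, overlap):
    """One-sided hypergeometric p-value for pathway over-representation.

    Probability of seeing at least `overlap` pathway genes when drawing
    `list_size` genes without replacement from a universe of
    `universe_size` genes, of which `pathway_size` belong to the pathway.
    """
    if not 0 <= pathway_size <= universe_size:
        raise ValueError("pathway_size must lie within the universe")
    if not 0 <= list_size <= universe_size:
        raise ValueError("list_size must lie within the universe")
    max_overlap = min(pathway_size, list_size)
    if not 0 <= overlap <= max_overlap:
        raise ValueError("overlap is impossible for these sizes")

    total = comb(universe_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, list_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 genes, a 50-gene pathway,
    # a 200-gene input list, 5 genes in common.
    print(enrichment_p_value(20000, 50, 200, 5))
```

When many pathways are tested at once, the resulting p-values would normally be adjusted for multiple testing before any pathway is called enriched.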
ABSTRACT: Noonan syndrome (NS) is a relatively common genetic disorder, characterized by typical facies, short stature, developmental delay, and cardiac abnormalities. Known causative genes account for 70-80% of clinically diagnosed NS patients, but the genetic basis for the remaining 20-30% of cases is unknown. We performed next-generation sequencing on germ-line DNA from 27 NS patients lacking a mutation in the known NS genes. We identified gain-of-function alleles in Ras-like without CAAX 1 (RIT1) and mitogen-activated protein kinase kinase 1 (MAP2K1) and previously unseen loss-of-function variants in RAS p21 protein activator 2 (RASA2) that are likely to cause NS in these patients. Expression of the mutant RASA2, MAP2K1, or RIT1 alleles in heterologous cells increased RAS-ERK pathway activation, supporting a causative role in NS pathogenesis. Two patients had more than one disease-associated variant. Moreover, the diagnosis of an individual initially thought to have NS was revised to neurofibromatosis type 1 based on an NF1 nonsense mutation detected in this patient. Another patient harbored a missense mutation in NF1 that resulted in decreased protein stability and impaired ability to suppress RAS-ERK activation; however, this patient continues to exhibit an NS-like phenotype. In addition, a nonsense mutation in RPS6KA3 was found in one patient initially diagnosed with NS whose diagnosis was later revised to Coffin-Lowry syndrome. Finally, we identified other potential candidates for new NS genes, as well as potential carrier alleles for unrelated syndromes. Taken together, our data suggest that next-generation sequencing can provide a useful adjunct to RASopathy diagnosis and emphasize that the standard clinical categories for RASopathies might not be adequate to describe all patients.
Full-text · Article · Jul 2014 · Proceedings of the National Academy of Sciences
ABSTRACT: Tumors often contain multiple subpopulations of cancerous cells defined by distinct somatic mutations. We describe a new method, PhyloWGS, which can be applied to whole-genome sequencing data from one or more tumor samples to reconstruct complete genotypes of these subpopulations based on variant allele frequencies (VAFs) of point mutations and population frequencies of structural variations. We introduce a principled phylogenic correction for VAFs in loci affected by copy number alterations and we show that this correction greatly improves subclonal reconstruction compared to existing methods. PhyloWGS is free, open-source software, available at https://github.com/morrislab/phylowgs.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-015-0602-8) contains supplementary material, which is available to authorized users.