ABSTRACT: The development of targeted anti-cancer therapies through the study of cancer genomes is intended to increase survival rates and decrease treatment-related toxicity. We treated a transposon-driven, functional genomic mouse model of medulloblastoma with 'humanized' in vivo therapy (microneurosurgical tumour resection followed by multi-fractionated, image-guided radiotherapy). Genetic events in recurrent murine medulloblastoma exhibit a very poor overlap with those in matched murine diagnostic samples (<5%). Whole-genome sequencing of 33 pairs of human diagnostic and post-therapy medulloblastomas demonstrated substantial genetic divergence of the dominant clone after therapy (<12% diagnostic events were retained at recurrence). In both mice and humans, the dominant clone at recurrence arose through clonal selection of a pre-existing minor clone present at diagnosis. Targeted therapy is unlikely to be effective in the absence of the target; therefore, our results offer a simple, proximal, and remediable explanation for the failure of prior clinical trials of targeted therapy.
ABSTRACT: The genome sequences of the plastid and mitochondrion of white spruce (Picea glauca) were assembled from whole-genome shotgun sequencing data using ABySS. The sequencing data contained reads from both the
nuclear and organellar genomes, and reads of the organellar genomes were abundant in the data as each cell harbors hundreds
of mitochondria and plastids. Hence, assembly of the 123-kb plastid and 5.9-Mb mitochondrial genomes was accomplished by
analyzing data sets primarily representing low coverage of the nuclear genome. The assembled organellar genomes were annotated
for their coding genes, ribosomal RNA, and transfer RNA. Transcript abundances of the mitochondrial genes were quantified
in three developmental tissues and five mature tissues using data from RNA-seq experiments. C-to-U RNA editing was observed
in the majority of mitochondrial genes, and in four genes, editing events were noted to modify ACG codons to create cryptic
AUG start codons. The informatics methodology presented in this study should prove useful to assemble organellar genomes of
other plant species using whole-genome shotgun sequencing data.
Full-text · Article · Dec 2015 · Genome Biology and Evolution
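The editing events mentioned in the abstract above (C-to-U changes that turn an ACG codon into an AUG start codon) can be illustrated with a minimal Python sketch. The transcript and the function names are made up for illustration and are not taken from the study; the sketch only locates ACG triplets and applies the edit, without any of the read-based evidence a real analysis would need.

```python
def cryptic_start_sites(rna: str) -> list[int]:
    """Return 0-based offsets of ACG triplets; editing the middle C gives AUG."""
    return [i for i in range(len(rna) - 2) if rna[i:i + 3] == "ACG"]


def edit_c_to_u(rna: str, positions: set[int]) -> str:
    """Return the transcript with C replaced by U at the given 0-based positions."""
    bases = list(rna)
    for pos in positions:
        if bases[pos] != "C":
            raise ValueError(f"position {pos} is {bases[pos]!r}, not 'C'")
        bases[pos] = "U"
    return "".join(bases)


# Hypothetical transcript fragment, not real Picea glauca sequence.
transcript = "GGACGAAUCCACGUAA"
sites = cryptic_start_sites(transcript)
# The edited base is the middle position of each ACG triplet.
edited = edit_c_to_u(transcript, {i + 1 for i in sites})
print(sites, edited)
```

Scanning every offset rather than only in-frame codons is a simplification; a real annotation would restrict the search to the reading frame of each gene.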
ABSTRACT: The Open Regulatory Annotation database (ORegAnno) is a resource for curated regulatory annotation. It contains information
about regulatory regions, transcription factor binding sites, RNA binding sites, regulatory variants, haplotypes, and other
regulatory elements. ORegAnno differentiates itself from other regulatory resources by facilitating crowd-sourced interpretation
and annotation of regulatory observations from the literature and highly curated resources. It contains a comprehensive annotation
scheme that aims to describe both the elements and outcomes of regulatory events. Moreover, ORegAnno assembles these disparate
data sources and annotations into a single, high quality catalogue of curated regulatory information. The current release
is an update of the database previously featured in the NAR Database Issue, and now contains 1 948 307 records, across 18
species, with a combined coverage of 334 215 080 bp. Complete records, annotation, and other associated data are available
for browsing and download at http://www.oreganno.org/.
No preview · Article · Nov 2015 · Nucleic Acids Research
ABSTRACT: Summary: There is substantial heterogeneity among primary prostate cancers, evident in the spectrum of molecular abnormalities and its variable clinical course. As part of The Cancer Genome Atlas (TCGA), we present a comprehensive molecular analysis of 333 primary prostate carcinomas. Our results revealed a molecular taxonomy in which 74% of these tumors fell into one of seven subtypes defined by specific gene fusions (ERG, ETV1/4, and FLI1) or mutations (SPOP, FOXA1, and IDH1). Epigenetic profiles showed substantial heterogeneity, including an IDH1 mutant subset with a methylator phenotype. Androgen receptor (AR) activity varied widely and in a subtype-specific manner, with SPOP and FOXA1 mutant tumors having the highest levels of AR-induced transcripts. 25% of the prostate cancers had a presumed actionable lesion in the PI3K or MAPK signaling pathways, and DNA repair genes were inactivated in 19%. Our analysis reveals molecular heterogeneity among primary prostate cancers, as well as potentially actionable molecular defects.
ABSTRACT: Background: Papillary renal-cell carcinoma, which accounts for 15 to 20% of renal-cell carcinomas, is a heterogeneous disease that consists of various types of renal cancer, including tumors with indolent, multifocal presentation and solitary tumors with an aggressive, highly lethal phenotype. Little is known about the genetic basis of sporadic papillary renal-cell carcinoma, and no effective forms of therapy for advanced disease exist. Methods: We performed comprehensive molecular characterization of 161 primary papillary renal-cell carcinomas, using whole-exome sequencing, copy-number analysis, messenger RNA and microRNA sequencing, DNA-methylation analysis, and proteomic analysis. Results: Type 1 and type 2 papillary renal-cell carcinomas were shown to be different types of renal cancer characterized by specific genetic alterations, with type 2 further classified into three individual subgroups on the basis of molecular differences associated with patient survival. Type 1 tumors were associated with MET alterations, whereas type 2 tumors were characterized by CDKN2A silencing, SETD2 mutations, TFE3 fusions, and increased expression of the NRF2-antioxidant response element (ARE) pathway. A CpG island methylator phenotype (CIMP) was observed in a distinct subgroup of type 2 papillary renal-cell carcinomas that was characterized by poor survival and mutation of the gene encoding fumarate hydratase (FH). Conclusions: Type 1 and type 2 papillary renal-cell carcinomas were shown to be clinically and biologically distinct. Alterations in the MET pathway were associated with type 1, and activation of the NRF2-ARE pathway was associated with type 2; CDKN2A loss and CIMP in type 2 conveyed a poor prognosis. Furthermore, type 2 papillary renal-cell carcinoma consisted of at least three subtypes based on molecular and phenotypic features. (Funded by the National Institutes of Health.)
Full-text · Article · Nov 2015 · New England Journal of Medicine
ABSTRACT: Heart valve formation initiates when endothelial cells of the heart transform into mesenchyme and populate the cardiac cushions. The transcription factor SOX9 is highly expressed in the cardiac cushion mesenchyme and is essential for heart valve development. Loss of Sox9 in mouse cardiac cushion mesenchyme alters cell proliferation and embryonic survival and disrupts valve formation. Despite this important role, little is known regarding how SOX9 regulates heart valve formation or its transcriptional targets. Therefore, we mapped putative SOX9 binding sites by ChIP-Seq in embryonic day (E) 12.5 heart valves, a stage at which the valve mesenchyme is actively proliferating and initiating differentiation. Embryonic heart valves have been shown to express a high number of genes that are associated with chondrogenesis, including several extracellular matrix proteins and transcription factors that regulate chondrogenesis. Consequently, we compared regions of putative SOX9 DNA-binding between E12.5 heart valves and E12.5 limb buds. We identified context-dependent and context-independent SOX9 interacting regions throughout the genome. Analysis of context-independent SOX9 binding suggests an extensive role for SOX9 across tissues in regulating proliferation-associated genes including key components of the AP-1 complex. Integrative analysis of tissue-specific SOX9 interacting regions and gene expression profiles on Sox9-deficient heart valves demonstrated that SOX9 controls the expression of several transcription factors with previously identified roles in heart valve development, including Twist1, Sox4, Mecom/Evi1 and Pitx2. Together, our data identify SOX9-coordinated transcriptional hierarchies that control cell proliferation and differentiation during valve formation.
ABSTRACT: Owing to the complexity of the assembly problem, we do not yet have complete genome sequences. The difficulty in assembling reads into finished genomes is exacerbated by sequence repeats and the inability of short reads to capture sufficient genomic information to resolve those problematic regions. In this regard, established and emerging long read technologies show great promise, but their current associated higher error rates typically require computational base correction and/or additional bioinformatics pre-processing before they can be of value.
We present LINKS, the Long Interval Nucleotide K-mer Scaffolder algorithm, a method that makes use of the sequence properties of nanopore sequence data and other error-containing sequence data, to scaffold high-quality genome assemblies, without the need for read alignment or base correction. Here, we show how the contiguity of an ABySS Escherichia coli K-12 genome assembly can be increased greater than five-fold by the use of beta-released Oxford Nanopore Technologies Ltd. long reads and how LINKS leverages long-range information in Saccharomyces cerevisiae W303 nanopore reads to yield assemblies whose resulting contiguity and correctness are on par with or better than that of competing applications. We also present the re-scaffolding of the colossal white spruce (Picea glauca) draft assembly (PG29, 20 Gbp) and demonstrate how LINKS scales to larger genomes.
This study highlights the present utility of nanopore reads for genome scaffolding in spite of their current limitations, which are expected to diminish as the nanopore sequencing technology advances. We expect LINKS to have broad utility in harnessing the potential of long reads in connecting high-quality sequences of small and large genome assembly drafts.
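The idea of scaffolding from k-mer pairs taken at a fixed interval along a long read, without aligning the read, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the LINKS implementation: the function names, the toy contigs, and the parameters `k`, `distance` and `step` are hypothetical, reverse complements are ignored, and k-mers are matched to contigs by exact lookup only.

```python
from collections import Counter


def kmer_pairs(read, k, distance, step):
    """Yield (first, second) k-mers whose start positions are `distance` apart."""
    for i in range(0, len(read) - distance - k + 1, step):
        yield read[i:i + k], read[i + distance:i + distance + k]


def index_contigs(contigs, k):
    """Map each k-mer that occurs in exactly one contig to that contig's name."""
    owner = {}
    ambiguous = set()
    for name, seq in contigs.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in ambiguous:
                continue
            if kmer in owner and owner[kmer] != name:
                # Seen in two different contigs: useless for linking.
                del owner[kmer]
                ambiguous.add(kmer)
            else:
                owner[kmer] = name
    return owner


def count_links(contigs, reads, k, distance, step=1):
    """Count k-mer pairs whose two members fall in different contigs."""
    owner = index_contigs(contigs, k)
    links = Counter()
    for read in reads:
        for first, second in kmer_pairs(read, k, distance, step):
            a, b = owner.get(first), owner.get(second)
            if a and b and a != b:
                links[(a, b)] += 1
    return links


# Toy data: one long read that spans two short contigs.
contigs = {"c1": "ACGTTGCA", "c2": "GGATCCTA"}
reads = ["ACGTTGCAGGATCCTA"]
links = count_links(contigs, reads, k=4, distance=8)
print(links)
```

A pair only contributes when both k-mers match exactly, so read errors reduce the number of supporting pairs rather than requiring base correction; contig pairs with many supporting links are candidates for joining.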
ABSTRACT: Nr2e1 (nuclear receptor subfamily 2, group e, member 1) encodes a transcription factor important in neocortex development. Previous work has shown that nuclear receptors can have hundreds of target genes, and bind more than 300 co-interacting proteins. However, recognition of the critical role of Nr2e1 in neural stem cells and neocortex development is relatively recent; thus, the molecular mechanisms involved for this nuclear receptor are only beginning to be understood. Serial analysis of gene expression (SAGE) has given researchers both qualitative and quantitative information pertaining to biological processes. Thus, in this work, six LongSAGE mouse libraries were generated from laser microdissected tissue samples of dorsal VZ/SVZ (ventricular zone and subventricular zone) from the telencephalon of wild-type (Wt) and Nr2e1-null embryos at the critical developmental ages E13.5, E15.5, and E17.5. We then used a novel approach, implementing multiple computational methods followed by biological validation, to further our understanding of Nr2e1 in neocortex development.
In this work, we have generated a list of 1279 genes that are differentially expressed in response to altered Nr2e1 expression during in vivo neocortex development. We have refined this list to 64 candidate direct-targets of NR2E1. Our data suggested distinct roles for Nr2e1 during different neocortex developmental stages. Most importantly, our results suggest a possible novel pathway by which Nr2e1 regulates neurogenesis, which includes Lhx2 as one of the candidate direct-target genes, and SOX9 as a co-interactor.
In conclusion, we have provided new candidate interacting partners and numerous well-developed testable hypotheses for understanding the pathways by which Nr2e1 functions to regulate neocortex development.
ABSTRACT: Long noncoding RNAs (lncRNAs) regulate gene expression by association with chromatin, but how they target chromatin remains poorly understood. We have used chromatin RNA immunoprecipitation-coupled high-throughput sequencing to identify 276 lncRNAs enriched in repressive chromatin from breast cancer cells. Using one of the chromatin-interacting lncRNAs, MEG3, we explore the mechanisms by which lncRNAs target chromatin. Here we show that MEG3 and EZH2 share common target genes, including the TGF-β pathway genes. Genome-wide mapping of MEG3 binding sites reveals that MEG3 modulates the activity of TGF-β genes by binding to distal regulatory elements. MEG3 binding sites have GA-rich sequences, which guide MEG3 to the chromatin through RNA-DNA triplex formation. We have found that RNA-DNA triplex structures are widespread and are present over the MEG3 binding sites associated with the TGF-β pathway genes. Our findings suggest that RNA-DNA triplex formation could be a general characteristic of target gene recognition by the chromatin-interacting lncRNAs.
ABSTRACT: The previously reported [http://arxiv.org/abs/1506.06433] reprogramming of
the substrate specificity of the PHD domain of the BPTF protein, which reads
H3K4Me3 epigenetic marks, illustrates the therapeutic potential of a new class
of non-inhibitor small organic compounds called variators. Here we address the
reproducibility of the rational design of variators by reprogramming the second
epigenetic-mark-reading domain of BPTF, the bromodomain. The bromodomain of
BPTF binds to epigenetic marks in the form of acetylated lysines of histone H4
(H4K12Ac, H4K16Ac and H4K20Ac), whose physicochemical properties and binding
mode differ considerably from those of methylated H3K4 marks. Thus, a detailed
description of the computational approach for reprogramming bromodomain
substrate specificity illustrates both general and target-specific attributes
of computer-aided variator design.
ABSTRACT: In lymphoma, mutations in genes of histone modifying proteins are frequently
observed. Notably, somatic mutations in the activatory histone modification
writing protein MLL2 and the repressive modification writer EZH2 are the most
frequent. Gain of function mutations are typically detected in EZH2 whilst MLL2
mutations are usually observed as conferring a homozygous loss of function. The
gain-of-function mutations in EZH2 provide an obvious target for the
development of inhibitors with therapeutic potential. To counter the loss of
functional MLL2 protein, we computationally predicted compounds that are able
to modulate the reader of the corresponding modifications, BPTF, to recognize
other forms of the histone H3 lysine 4, instead of the tri-methylated form
normally produced by MLL2. By forming a synthetic triple-complex of a compound,
the histone H3 tail and BPTF we potentially circumvent the requirement for
functional MLL2 methyl-transferase through the modulation of BPTF activity.
Here we show a proof-of-principle that special compounds, named variators, can
reprogram selectivity of protein binding and thus create artificial regulatory
pathways which can have a potential therapeutic role. A therapeutic role of
BPTF variators may extend to other diseases that involve loss of MLL2 function,
such as Kabuki syndrome or the aberrant functioning of H3K4 modification as
observed in Huntington disease and in memory formation.
ABSTRACT: We describe the landscape of genomic alterations in cutaneous melanomas through DNA, RNA, and protein-based analysis of 333 primary and/or metastatic melanomas from 331 patients. We establish a framework for genomic classification into one of four subtypes based on the pattern of the most prevalent significantly mutated genes: mutant BRAF, mutant RAS, mutant NF1, and Triple-WT (wild-type). Integrative analysis reveals enrichment of KIT mutations and focal amplifications and complex structural rearrangements as a feature of the Triple-WT subtype. We found no significant outcome correlation with genomic classification, but samples assigned a transcriptomic subclass enriched for immune gene expression associated with lymphocyte infiltrate on pathology review and high LCK protein expression, a T cell marker, were associated with improved patient survival. This clinicopathological and multi-dimensional analysis suggests that the prognosis of melanoma patients with regional metastases is influenced by tumor stroma immunobiology, offering insights to further personalize therapeutic decision-making.
[Show abstract][Hide abstract] ABSTRACT: White spruce (Picea glauca), a gymnosperm tree, has been established as one of the models for conifer genomics. We describe the draft genome assemblies of two white spruce genotypes, PG29 and WS77111, innovative tools for the assembly of very large genomes, and the conifer genomics resources developed in this process. The two white spruce genotypes originate from distant geographic regions of western (PG29) and eastern (WS77111) North America, and represent elite trees in two Canadian tree breeding programs. We present an update (V3 and V4) for a previously reported PG29 V2 draft genome assembly and introduce a second white spruce genome assembly for genotype WS77111. Assemblies of the PG29 and WS77111 genomes confirm the reconstructed white spruce genome size in the 20 Gbp range, and show broad synteny. Using the PG29 V3 assembly and additional white spruce genomics and transcriptomics resources, we performed MAKER-P annotation and meticulous expert annotation of very large gene families of conifer defense metabolism, the terpene synthases and cytochrome P450s. We also comprehensively annotated the white spruce mevalonate, methylerythritol phosphate and phenylpropanoid pathways. These analyses highlighted the large extent of gene and pseudogene duplications in a conifer genome, in particular for genes of secondary (i.e. specialized) metabolism, and the potential for gain and loss of function for defense and adaptation. This article is protected by copyright. All rights reserved.
Full-text · Article · May 2015 · The Plant Journal
ABSTRACT: Peritoneal mesothelioma is a rare and sometimes lethal malignancy that presents a clinical challenge for both diagnosis and management. Recent studies have led to a better understanding of the molecular biology of peritoneal mesothelioma. Translation of the emerging data into better treatments and outcome is needed. From two patients with peritoneal mesothelioma, we derived whole genome sequences, RNA expression profiles, and targeted deep sequencing data. Molecular data were made available for translation into a clinical treatment plan. Treatment responses and outcomes were later examined in the context of molecular findings. Molecular studies presented here provide the first reported whole genome sequences of peritoneal mesothelioma. Mutations in known mesothelioma-related genes NF2, CDKN2A, LATS2, amongst others, were identified. Activation of MET-related signaling pathways was demonstrated in both cases. A hypermutated phenotype was observed in one case (434 vs. 18 single nucleotide variants) and was associated with a favourable outcome despite sarcomatoid histology and multifocal disease. This study represents the first report of whole genome analyses of peritoneal mesothelioma, a key step in the understanding and treatment of this disease.
ABSTRACT: In a patient suspected clinically to have Weaver syndrome, we ruled out mutations in EZH2 and NSD1, then identified a previously undescribed de novo mutation in EZH2's partner protein EED. Both proteins are members of the Polycomb Repressive Complex 2 that maintains gene silencing. On the basis of the similarities of the patient's phenotype to Weaver syndrome, which is caused by de novo mutations in EZH2, and on other lines of evidence including mouse Eed hypomorphs, we characterize this mutation as probably pathogenic for a Weaver-like overgrowth syndrome. This is the first report of overgrowth and related phenotypes associated with a constitutional mutation in human EED. Journal of Human Genetics advance online publication, 19 March 2015; doi:10.1038/jhg.2015.26.
Full-text · Article · Mar 2015 · Journal of Human Genetics
ABSTRACT: The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.