ABSTRACT: Improvements in mass spectrometry (MS)-based peptide sequencing provide a new opportunity to determine whether polymorphisms, mutations and splice variants identified in cancer cells are translated. Herein we apply a proteogenomic data integration tool (QUILTS) to illustrate protein variant discovery using whole genome, whole transcriptome and global proteome datasets generated from a pair of luminal and basal-like breast cancer patient-derived xenografts (PDX). The sensitivity of proteogenomic analysis for single nucleotide variant (SNV) expression and novel splice junction (NSJ) detection was probed using multiple MS/MS sample process replicates, each defined here as an independent tandem MS experiment using identical sample material. Despite analysis of over thirty sample process replicates, only about 10% of SNVs (somatic and germline) detected by both DNA and RNA sequencing were observed as peptides. An even smaller proportion of peptides corresponding to NSJ observed by RNA sequencing were detected (<0.1%). Peptides mapping to DNA-detected SNVs without a detectable mRNA transcript were also observed, suggesting that transcriptome coverage was incomplete (~80%). In contrast to germline variants, somatic variants were less likely to be detected at the peptide level in the basal-like tumor than in the luminal tumor, raising the possibility of differential translation or protein degradation effects. In conclusion, this large-scale proteogenomic integration allowed us to determine the degree to which mutations are translated and identified gaps in sequence coverage, thereby benchmarking current technology and progress towards whole cancer proteome and transcriptome analysis.
ABSTRACT: While next-generation sequencing (NGS) has become the primary technology for discovering gene fusions, we are still faced with the challenge of ensuring that causative mutations are not missed while minimizing false positives. Currently, there are many computational tools that predict structural variations (SV) and gene fusions using whole genome (WGS) and transcriptome sequencing (RNA-seq) data separately. However, as both WGS and RNA-seq have their limitations when used independently, we hypothesize that the orthogonal validation from integrating WGS and RNA-seq could generate a sensitive and specific approach for detecting high confidence gene fusion predictions. Fortunately, decreasing NGS costs have resulted in a growing number of patients with available genome and transcriptome sequencing data. Therefore, we developed a gene fusion discovery tool, INTEGRATE, that leverages both RNA-seq and WGS data to reconstruct gene fusion junctions and genomic breakpoints by split-read mapping. To evaluate INTEGRATE, we compared it with eight additional gene fusion discovery tools using the well-characterized breast cell line HCC1395 and peripheral blood lymphocytes derived from the same patient (HCC1395BL). The predictions subsequently underwent a targeted validation leading to the discovery of 131 novel fusions in addition to the seven previously reported fusions. Overall, INTEGRATE missed only 6 of the 138 validated gene fusions and had the highest accuracy of the nine tools evaluated. Additionally, we applied INTEGRATE to 62 breast cancer patients from the TCGA and found multiple recurrent gene fusions including a subset involving estrogen receptor. Taken together, INTEGRATE is a highly sensitive and accurate tool that is freely available for academic use.
ABSTRACT: Cancer cell lines can be useful to model cancer stem cells. Infection with Mycoplasma species is an insidious problem in mammalian cell culture. While investigating stem-like properties in early passage melanoma cell lines, we noted poorly reproducible results from an aliquot of a cell line that was later found to be infected with Mycoplasma hyorhinis. Deliberate infection of aliquots of other early passage melanoma cell lines induced variable and unpredictable effects on expression of putative cancer stem cell markers, clonogenicity, proliferation and global gene expression. Cell lines established in stem cell media (SCM) were equally susceptible. Mycoplasma status is rarely reported in publications using cultured cells to study the cancer stem cell hypothesis. Our work highlights the importance of surveillance for Mycoplasma infection while using any cultured cells to interrogate tumor heterogeneity.
Full-text · Article · Oct 2015 · Stem Cell Reviews and Reports
ABSTRACT: Advances in reconstructing the clonal architecture of tumors greatly enhance our understanding of the molecular events within a patient and their context relative to one another. In the rapidly unfolding era of personalized medicine, the ability to monitor clonal evolution throughout patient care has significant clinical implications for the appropriate development or application of targeted therapies as well as understanding the potential mechanisms driving resistance. In this review, we discuss advances in biotechnology and bioinformatics that improve precision treatment by dissecting clonal evolution, focusing first on the initial discoveries in lymphomas and leukemias followed by the more recent applications to advance our understanding of prostate cancer (PCa). Teaser: The ability to monitor clonal evolution throughout patient care has significant clinical implications for the development and application of personalized therapy.
No preview · Article · Oct 2015 · Drug Discovery Today
ABSTRACT: Tumors are typically sequenced to depths of 75x-100x (exome) or 30x-50x (whole genome). We demonstrate that current sequencing paradigms are inadequate for tumors that are impure, aneuploid, or clonally heterogeneous. To reassess optimal sequencing strategies, we performed ultra-deep (up to ~312x) whole genome sequencing and exome capture (up to ~433x) of a primary acute myeloid leukemia, its subsequent relapse, and a matched normal skin sample. We tested multiple alignment and variant calling algorithms and validated ~200,000 putative SNVs by sequencing them to depths of ~1,000x. Additional targeted sequencing provided over 10,000x coverage and ddPCR assays provided up to ~250,000x sampling of selected sites. We evaluated the effects of different library generation approaches, depth of sequencing, and analysis strategies on the ability to effectively characterize a complex tumor. This dataset, representing the most comprehensively sequenced tumor described to date, will serve as an invaluable community resource (dbGaP: phs000159).
ABSTRACT: In this work, we present the Genome Modeling System (GMS), an analysis information management system capable of executing automated genome analysis pipelines at a massive scale. The GMS framework provides detailed tracking of samples and data coupled with reliable and repeatable analysis pipelines. The GMS also serves as a platform for bioinformatics development, allowing a large team to collaborate on data analysis, or an individual researcher to leverage the work of others effectively within its data management system. Rather than separating ad-hoc analysis from rigorous, reproducible pipelines, the GMS promotes systematic integration between the two. As a demonstration of the GMS, we performed an integrated analysis of whole genome, exome and transcriptome sequencing data from a breast cancer cell line (HCC1395) and matched lymphoblastoid line (HCC1395BL). These data are available for users to test the software, complete tutorials and develop novel GMS pipeline configurations. The GMS is available at https://github.com/genome/gms.
ABSTRACT: A growing number of gene-centric studies have highlighted the emerging significance of lncRNAs in cancer. However, these studies primarily focus on a single cancer type. Therefore, we conducted a pan-cancer analysis of lncRNAs comparing tumor and matched normal expression levels using RNA-Seq data from ~3000 patients in eight solid tumor types. While the majority of differentially expressed lncRNAs display tissue-specific expression, we discovered 229 lncRNAs with outlier or differential expression across multiple cancers, which we refer to as 'onco-lncRNAs'. Due to their consistently altered expression, we hypothesize that these onco-lncRNAs may have conserved oncogenic and tumor suppressive functions across cancers. To address this, we associated the onco-lncRNAs with biological processes based on their co-expressed protein coding genes. To validate our predictions, we experimentally confirmed cell growth dependence on two novel oncogenic lncRNAs, onco-lncRNA-3 and onco-lncRNA-12, and a previously identified lncRNA, CCAT1. Overall, we discovered lncRNAs that may have broad oncogenic and tumor suppressor roles that could significantly advance our understanding of cancer lncRNA biology.
ABSTRACT: In head and neck squamous cell cancer (HNSCC), four intrinsic subtypes (or groups) have been identified, and each one possesses a unique biology that will require specific treatment strategies. We previously reported that mesenchymal (group 2) tumors exhibit reduced levels of Trop2 expression. In this study, we investigated the functional role of Trop2 in HNSCC and found that its loss results in autocrine activation of the EGFR family member ErbB3 via neuregulin-1. Trop2 localizes to both the cell surface and cytosol of HNSCC cells and forms a complex with neuregulin-1, which is predominantly cytosolic. Inactivation of Trop2 increases the concentration of neuregulin-1 at the cell surface, where it is cleaved to activate ErbB3. In primary HNSCC, detection of ErbB3 activation was limited to Trop2 negative tumors. An analysis of the Cancer Genome Atlas (TCGA) HNSCC dataset confirms enrichment for ErbB3 activity in mesenchymal tumors. Notably, Trop2 loss triggers sensitivity to anti-ErbB3 antibodies, which results in reduced proliferation and tumorigenic growth of Trop2 negative HNSCC cells. These results uncover a molecular mechanism by which tumor cells control the amount of cell-surface neuregulin-1 available for cleavage and ErbB3 activation. Moreover, we demonstrate that Trop2 is a potential surrogate biomarker to identify tumors that have ErbB3 activation and may therefore respond to anti-ErbB3 therapeutics.
ABSTRACT: Background: Long intergenic non-coding RNAs (lncRNAs) represent an emerging and under-studied class of transcripts that play a significant role in human cancers. Due to the tissue- and cancer-specific expression patterns observed for many lncRNAs, it is believed that they could serve as ideal diagnostic biomarkers. However, until each tumor type is examined more closely, many of these lncRNAs will remain elusive. Results: Here we characterize the lncRNA landscape in lung cancer using publicly available transcriptome sequencing data from a cohort of 567 adenocarcinoma and squamous cell carcinoma tumors. Through this compendium we identify over three thousand unannotated intergenic transcripts representing novel lncRNAs. Through comparison of both adenocarcinoma and squamous cell carcinomas with matched controls we discover 111 differentially expressed lncRNAs, which we term lung cancer associated lncRNAs (LCALs). A pan-cancer analysis of 324 additional tumor and adjacent normal pairs enables us to identify a subset of lncRNAs that display enriched expression specific to lung cancer as well as a subset that appear to be broadly deregulated across human cancers. Integration of exome sequencing data reveals that expression levels of many LCALs have significant associations with the mutational status of key oncogenes in lung cancer. Functional validation, using both knockdown and overexpression, shows that the most differentially expressed lncRNA, LCAL1, plays a role in cellular proliferation. Conclusions: Our systematic characterization of publicly available transcriptome data provides the foundation for future efforts to understand the role of lung cancer associated lncRNAs, develop novel biomarkers, and improve knowledge of lung tumor biology.
ABSTRACT: Adenocarcinoma of the lung is the leading cause of cancer death worldwide. Here we report molecular profiling of 230 resected lung adenocarcinomas using messenger RNA, microRNA and DNA sequencing integrated with copy number, methylation and proteomic analyses. High rates of somatic mutation were seen (mean 8.9 mutations per megabase). Eighteen genes were statistically significantly mutated, including RIT1 activating mutations and newly described loss-of-function MGA mutations which are mutually exclusive with focal MYC amplification. EGFR mutations were more frequent in female patients, whereas mutations in RBM10 were more common in males. Aberrations in NF1, MET, ERBB2 and RIT1 occurred in 13% of cases and were enriched in samples otherwise lacking an activated oncogene, suggesting a driver role for these events in certain tumours. DNA and mRNA sequence from the same tumour highlighted splicing alterations driven by somatic genomic changes, including exon 14 skipping in MET mRNA in 4% of cases. MAPK a
ABSTRACT: High-throughput sequencing technologies, including RNA-seq, have made it possible to move beyond gene expression analysis to study transcriptional events including alternative splicing and gene fusions. Furthermore, recent studies in cancer have suggested the importance of identifying transcriptionally altered loci as biomarkers for improved prognosis and therapy. While many statistical methods have been proposed for identifying novel transcriptional events with RNA-seq, nearly all rely on contrasting known classes of samples, such as tumor and normal. Few tools exist for the unsupervised discovery of such events without class labels. In this paper, we present SigFuge for identifying genomic loci exhibiting differential transcription patterns across many RNA-seq samples. SigFuge combines clustering with hypothesis testing to identify genes exhibiting alternative splicing, or differences in isoform expression. We apply SigFuge to RNA-seq cohorts of 177 lung and 279 head and neck squamous cell carcinoma samples from the Cancer Genome Atlas, and identify several cases of differential isoform usage including CDKN2A, a tumor suppressor gene known to be inactivated in a majority of lung squamous cell tumors. By not restricting attention to known sample stratifications, SigFuge offers a novel approach to unsupervised screening of genetic loci across RNA-seq cohorts. SigFuge is available as an R package through Bioconductor.
Full-text · Article · Jul 2014 · Nucleic Acids Research
ABSTRACT: The use of massively parallel sequencing for studying RNA expression has greatly enhanced our understanding of the transcriptome through the myriad ways these data can be characterized. In particular, clinical samples provide important insights about RNA expression in health and disease, yet these studies can be complicated by RNA degradation that results from the use of formalin as a clinical preservative and by the limited amounts of RNA often available from these precious samples. In this study we describe the combined use of RNA sequencing with an exome capture selection step to enhance the yield of on-exon sequencing read data when compared with RNA sequencing alone. In particular, the exome capture step preserves the dynamic range of expression, permitting differential comparisons and validation of expressed mutations from limited and FFPE-preserved samples, while reducing the data generation requirement. We conclude that cDNA hybrid capture has the potential to significantly improve transcriptome analysis from low-yield FFPE material.
Full-text · Article · May 2014 · The Journal of molecular diagnostics: JMD
ABSTRACT: Micropapillary carcinoma (MPC) is a rare histological special type of breast cancer, characterized by an aggressive clinical behavior and a pattern of copy number aberrations (CNAs) distinct from that of grade- and estrogen receptor (ER)-matched invasive carcinomas of no special type (IC-NSTs). The aims of this study were to determine whether MPCs are underpinned by a recurrent fusion gene(s) or mutations in 273 genes recurrently mutated in breast cancer. Sixteen MPCs were subjected to microarray-based comparative genomic hybridization (aCGH) analysis and Sequenom Oncocarta mutation analysis. Eight and five MPCs were subjected to targeted capture and RNA sequencing, respectively. aCGH analysis confirmed our previous observations about the repertoire of CNAs of MPCs. Sequencing analysis revealed a spectrum of mutations similar to those of luminal B IC-NSTs, and recurrent mutations affecting mitogen-activated protein kinase family genes and NBPF10. RNA-sequencing analysis identified 17 high-confidence fusion genes, eight of which were validated and two of which were in-frame. No recurrent fusions were identified in an independent series of MPCs and IC-NSTs. Forced expression of in-frame fusion genes (SLC2A1-FAF1 and BCAS4-AURKA) resulted in increased viability of breast cancer cells. In addition, genomic disruption of CDK12 caused by out-of-frame rearrangements was found in one MPC and in 13% of HER2-positive cancers, identified through a reanalysis of publicly available massively parallel sequencing data. In vitro analyses revealed that CDK12 gene disruption results in sensitivity to PARP inhibition and forced expression of wild-type CDK12 in a CDK12-null cell line model resulted in relative resistance to PARP inhibition. Our findings demonstrate that MPCs are neither defined by highly recurrent mutations in the 273 genes tested, nor underpinned by a recurrent fusion gene. Although seemingly private genetic events, some of the fusion transcripts found in MPCs may play a role in maintenance of a malignant phenotype and potentially offer therapeutic opportunities.
Full-text · Article · Apr 2014 · The Journal of Pathology