ABSTRACT: Background
Women with a family history of breast cancer face considerable uncertainty about whether to pursue standard screening, intensive screening, or prophylactic surgery. Accurate and individualized risk-estimation approaches may help these women make more informed decisions. Although highly penetrant genetic variants have been associated with familial breast cancer (FBC) risk, many individuals do not carry these variants, and many carriers never develop breast cancer. Common risk variants have a relatively modest effect on risk and show limited potential for predicting FBC development. As an alternative, we hypothesized that additional genomic data types, such as gene-expression levels, which can reflect genetic and epigenetic variation, could contribute to classifying a person’s risk status. Specifically, we aimed to identify common patterns in gene-expression levels across individuals who develop FBC.
We profiled peripheral blood mononuclear cells from women with a family history of breast cancer (with or without a germline BRCA1/2 variant) and from controls. We used the support vector machines algorithm to differentiate between patients who developed FBC and those who did not. Our study used two independent datasets, a training set of 124 women from Utah (USA) and an external validation (test) set from Ontario (Canada) of 73 women (197 total). We controlled for expression variation associated with clinical, demographic, and treatment variables as well as lymphocyte markers.
Our multigene biomarker provided accurate, individual-level estimates of FBC occurrence for the Utah cohort (AUC = 0.76 [0.67-0.84]). Even at their lower confidence bounds, these accuracy estimates meet or exceed estimates from alternative approaches. Our Ontario cohort resulted in similarly high levels of accuracy (AUC = 0.73 [0.59-0.86]), thus providing external validation of our findings. Individuals deemed to have “high” risk by our model would have an estimated 2.4 times greater odds of developing familial breast cancer than individuals deemed to have “low” risk.
Together, these findings suggest that gene-expression levels in peripheral blood cells reflect genomic variation associated with breast cancer risk and that such data have potential to be used as a non-invasive biomarker for familial breast cancer risk.
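The two summary statistics reported above, AUC and an odds ratio between "high" and "low" risk groups, can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names `auc` and `odds_ratio`, the toy scores, and the 2x2 counts are all hypothetical, and the AUC is computed with the generic rank-based (Mann-Whitney) definition rather than any procedure described in the paper.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case (label 1) receives a
    higher score than a randomly chosen control (label 0); ties count 0.5.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def odds_ratio(high_cases, high_controls, low_cases, low_controls):
    """Odds of disease in the high-risk group divided by odds in the low-risk group."""
    return (high_cases / high_controls) / (low_cases / low_controls)


# Hypothetical classifier scores and outcomes (1 = developed disease).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]

print(auc(toy_scores, toy_labels))   # 13 of 16 case/control pairs ranked correctly
print(odds_ratio(30, 10, 20, 20))    # hypothetical 2x2 counts, not study data
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported cohort values should be read.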
Electronic supplementary material
The online version of this article (doi:10.1186/s12920-015-0145-6) contains supplementary material, which is available to authorized users.
Full-text · Article · Nov 2015 · BMC Medical Genomics
ABSTRACT: Motivation:
The Cancer Genome Atlas (TCGA) RNA-Sequencing data are used widely for research. TCGA provides 'Level 3' data, which have been processed using a pipeline specific to that resource. However, we have found using experimentally derived data that this pipeline produces gene-expression values that vary considerably across biological replicates. In addition, some RNA-Sequencing analysis tools require integer-based read counts, which are not provided with the Level 3 data. As an alternative, we have reprocessed the data for 9264 tumor and 741 normal samples across 24 cancer types using the Rsubread package. We have also collated corresponding clinical data for these samples. We provide these data as a community resource.
We compared TCGA samples processed using either pipeline and found that the Rsubread pipeline produced fewer zero-expression genes and more consistent expression levels across replicate samples than the TCGA pipeline. Additionally, we used a genomic-signature approach to estimate HER2 (ERBB2) activation status for 662 breast-tumor samples and found that the Rsubread data resulted in stronger predictions of HER2 pathway activity. Finally, we used data from both pipelines to classify 575 lung cancer samples based on histological type. This analysis identified various non-coding RNA that may influence lung-cancer histology.
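The two comparison criteria mentioned above (how many genes show zero expression, and how consistent values are across replicates) could be computed along the following lines. This is a rough sketch under stated assumptions, not the published evaluation: the gene names and values are invented, a "zero-expression gene" is assumed here to mean a gene that is zero in every replicate, and consistency is summarized with the coefficient of variation, which may differ from the metric the authors used.

```python
from statistics import mean, pstdev


def count_zero_genes(expr):
    """Number of genes whose value is zero in every replicate (assumed definition)."""
    return sum(1 for values in expr.values() if all(v == 0 for v in values))


def mean_cv(expr):
    """Average coefficient of variation across genes with a positive mean.

    Lower values indicate more consistent expression across replicates.
    """
    cvs = []
    for values in expr.values():
        m = mean(values)
        if m > 0:
            cvs.append(pstdev(values) / m)
    return mean(cvs) if cvs else float("nan")


# Hypothetical expression values for the same three replicates,
# as produced by two different processing pipelines.
pipeline_a = {"GENE1": [10, 12, 11], "GENE2": [0, 0, 0], "GENE3": [5, 9, 1]}
pipeline_b = {"GENE1": [10, 10, 10], "GENE2": [1, 1, 1], "GENE3": [5, 6, 4]}

for name, expr in (("A", pipeline_a), ("B", pipeline_b)):
    print(name, count_zero_genes(expr), round(mean_cv(expr), 3))
```

In this toy data, pipeline B has fewer zero-expression genes and a lower mean coefficient of variation, which is the direction of difference the abstract reports for the Rsubread pipeline.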
Availability and implementation:
The RNA-Sequencing and clinical data can be downloaded from Gene Expression Omnibus (accession number GSE62944). Scripts and code that were used to process and analyze the data are available from https://github.com/srp33/TCGA_RNASeq_Clinical.
Contact: firstname.lastname@example.org or email@example.com
Supplementary information: Supplementary material is available at Bioinformatics online.
ABSTRACT: Although in some cases individual genomic aberrations may drive disease development in isolation, a complex interplay among multiple aberrations is common. Accordingly, we developed Gene Set Omic Analysis (GSOA), a bioinformatics tool that can evaluate multiple types and combinations of omic data at the pathway level. GSOA uses machine learning to identify dysregulated pathways and improves upon other methods because of its ability to decipher complex, multigene patterns. We compare GSOA to alternative methods and demonstrate its ability to identify pathways known to play a role in various cancer phenotypes. Software implementing the GSOA method is freely available from https://bitbucket.org/srp33/gsoa.
Electronic supplementary material
The online version of this article (doi:10.1186/s13073-015-0189-4) contains supplementary material, which is available to authorized users.
ABSTRACT: Better approaches are needed to evaluate a single patient’s drug response at the genomic level. Targeted therapy for signaling pathways in cancer has met with limited success, in part because of the exceedingly interwoven nature of the pathways. In particular, the highly complex RAS network has been challenging to target. Effectively targeting the pathway requires development of techniques that measure global network activity to account for pathway complexity. For this purpose, we used a gene-expression-based biomarker for RAS network activity in non-small cell lung cancer (NSCLC) cells and screened for drugs whose efficacy was significantly correlated with RAS network activity. Results identified EGFR and MEK co-inhibition as the most effective treatment for RAS-active NSCLC amongst a panel of over 360 compounds and fractions. RAS activity was identified in both RAS-mutant and wild-type lines, indicating broad characterization of RAS signaling inclusive of multiple mechanisms of RAS activity, and not solely based on mutation status. Mechanistic studies demonstrated that co-inhibition of EGFR and MEK induced apoptosis and blocked both EGFR-RAS-RAF-MEK-ERK and EGFR-PI3K-AKT-RPS6 nodes simultaneously in RAS-active, but not RAS-inactive, NSCLC. These results provide a comprehensive strategy to personalize treatment of NSCLC based on RAS network dysregulation and provide proof-of-concept of a genomic approach to classify and target complex signaling networks.
No preview · Article · Oct 2014 · Molecular Oncology
ABSTRACT: Triple-negative breast cancer (TNBC) is aggressive and lacks targeted therapies. Phosphatidylinositide 3-kinase (PI3K) / mammalian target of rapamycin (mTOR) pathways are frequently activated in TNBC patient tumors at the genome, gene expression and protein levels, and mTOR inhibitors have been shown to inhibit growth in TNBC cell lines. We describe a panel of patient-derived xenografts representing multiple TNBC subtypes and use them to test preclinical drug efficacy of two mTOR inhibitors, sirolimus (rapamycin) and temsirolimus (CCI-779).
We generated a panel of seven patient-derived orthotopic xenografts from six primary TNBC tumors and one metastasis. Patient tumors and corresponding xenografts were compared by histology, immunohistochemistry, array comparative genomic hybridization (aCGH) and phosphatidylinositol-4,5-bisphosphate 3-kinase, catalytic subunit alpha (PIK3CA) sequencing; TNBC subtypes were determined. Using a previously published logistic regression approach, we generated a rapamycin response signature from Connectivity Map gene expression data and used it to predict rapamycin sensitivity in 1401 human breast cancers of different intrinsic subtypes, prompting in vivo testing of mTOR inhibitors and doxorubicin in our TNBC xenografts.
Patient-derived xenografts recapitulated histology, biomarker expression and global genomic features of patient tumors. Two primary tumors had PIK3CA coding mutations, and 5/6 primary tumors showed flanking intron single nucleotide polymorphisms (SNPs) with conservation of sequence variations between primary tumors and xenografts, even on subsequent xenograft passages. Gene expression profiling showed that our models represent at least four of six TNBC subtypes. The rapamycin response signature predicted sensitivity for 94% of basal-like breast cancers in a large dataset. Drug testing of mTOR inhibitors in our xenografts showed 77 to 99% growth inhibition, significantly more than doxorubicin; protein phosphorylation studies indicated constitutive activation of the mTOR pathway that decreased with treatment. However, no tumor was completely eradicated.
A panel of patient-derived xenograft models covering a spectrum of TNBC subtypes was generated that histologically and genomically matched original patient tumors. Consistent with in silico predictions, mTOR inhibitor testing in our TNBC xenografts showed significant tumor growth inhibition in all, suggesting that mTOR inhibitors can be effective in TNBC, but will require use with additional therapies, warranting investigation of optimal drug combinations.
Full-text · Article · Apr 2014 · Breast cancer research: BCR
ABSTRACT: Portraying high-throughput genomics research as a wild frontier, Andrea Bild and colleagues use caricatures to highlight common pitfalls in genomic research and provide recommendations for navigating this terrain.
ABSTRACT: Over the past two decades, many biotechnology platforms have been developed for high-throughput gene expression profiling. However, because each platform is subject to technology-specific biases and produces distinct raw-data distributions, researchers have experienced difficulty in integrating data across platforms. Data integration is crucial to data-generating consortiums, researchers transitioning to newer profiling technologies, and individuals seeking to aggregate data across experiments. We address this need with our Universal exPression Code (UPC) approach, which corrects for platform-specific background noise using models that account for the genomic base composition and length of target regions; this approach also uses a mixture model to estimate whether a gene is active in a particular profiling sample. The latter produces standardized UPC values on a zero-to-one scale, so that they can be interpreted consistently, irrespective of profiling technology, thus enabling downstream analysis pipelines to be developed in a platform-agnostic manner. The UPC method can be applied to one- and two-channel expression microarrays and to next-generation sequencing data (RNA sequencing). Furthermore, UPCs are derived using information from within a given sample only; no ancillary samples are required at processing time. Thus, UPCs are suitable for personalized-medicine workflows where samples must be processed individually rather than in batches. In a variety of analyses and comparisons, UPCs perform comparably to other methods designed specifically for microarrays or RNA sequencing in most settings. Software for calculating UPCs is freely available at www.bioconductor.org/packages/release/bioc/html/SCAN.UPC.html.
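To make the idea of a mixture-model value on a zero-to-one scale concrete, the sketch below computes the posterior probability that a measurement belongs to an "active" component rather than a "background" component. It is only an illustration of the general technique and is not the published UPC model or the SCAN.UPC package API: the function names, the choice of two Gaussian components, and every parameter value are assumptions, and the parameters are fixed by hand here rather than estimated from the sample.

```python
import math


def log_normal_pdf(x, mu, sigma):
    """Log density of a Normal(mu, sigma) distribution at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def active_probability(x, background=(4.0, 1.0), active=(8.0, 1.0), prior_active=0.5):
    """Posterior probability that x came from the 'active' component.

    background and active are hypothetical (mean, standard deviation) pairs
    on a log-intensity scale; prior_active is the assumed mixing proportion.
    """
    log_odds = (
        math.log(prior_active / (1.0 - prior_active))
        + log_normal_pdf(x, *active)
        - log_normal_pdf(x, *background)
    )
    return sigmoid(log_odds)


# Hypothetical background-corrected log intensities for four genes.
for value in (3.0, 5.0, 6.0, 10.0):
    print(value, round(active_probability(value), 4))
```

Because the output is a probability, values from different platforms can be read on the same scale, which is the property the abstract highlights. Working with the log-odds rather than the raw densities avoids a division by zero when both densities underflow.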
No preview · Article · Oct 2013 · Proceedings of the National Academy of Sciences
ABSTRACT: Alterations in epigenetic marks, including methylation or acetylation, are common in human cancers. For many epigenetic pathways, however, direct measures of activity are unknown, making their role in various cancers difficult to assess. Gene expression signatures facilitate the examination of patterns of epigenetic pathway activation across and within human cancer types, allowing better understanding of the relationships between these pathways.
We used Bayesian regression to generate gene expression signatures from normal epithelial cells before and after epigenetic pathway activation. Signatures were applied to datasets from TCGA, GEO, CaArray, ArrayExpress, and the cancer cell line encyclopedia. For TCGA data, signature results were correlated with copy number variation and DNA methylation changes. GSEA was used to identify biologic pathways related to the signatures.
We developed and validated signatures reflecting downstream effects of enhancer of zeste homolog 2 (EZH2), histone deacetylase (HDAC) 1, HDAC4, sirtuin 1 (SIRT1), and DNA methyltransferase 2 (DNMT2). By applying these signatures to data from cancer cell lines and tumors in large public repositories, we identify those cancers that have the highest and lowest activation of each of these pathways. Highest EZH2 activation is seen in neuroblastoma, hepatocellular carcinoma, small cell lung cancer, and melanoma, while highest HDAC activity is seen in pharyngeal cancer, kidney cancer, and pancreatic cancer. Across all datasets studied, activation of both EZH2 and HDAC4 is significantly underrepresented. Using breast cancer and glioblastoma as examples to examine intrinsic subtypes of particular cancers, EZH2 activation was highest in luminal breast cancers and proneural glioblastomas, while HDAC4 activation was highest in basal breast cancer and mesenchymal glioblastoma. EZH2 and HDAC4 activation are associated with particular chromosome abnormalities: EZH2 activation with aberrations in genes from the TGF and phosphatidylinositol pathways and HDAC4 activation with aberrations in inflammatory and chemokine related genes.
Gene expression patterns can reveal the activation level of epigenetic pathways. Epigenetic pathways define biologically relevant subsets of human cancers. EZH2 activation and HDAC4 activation correlate with growth factor signaling and inflammation, respectively, and represent two distinct states for cancer cells. This understanding may allow us to identify targetable drivers in these cancer subsets.
Preview · Article · Sep 2013 · BMC Medical Genomics
ABSTRACT: RATIONALE: Molecular phenotyping of COPD has been impeded in part by the difficulty in obtaining lung tissue samples from individuals with impaired lung function. OBJECTIVES: We sought to determine whether COPD-associated processes are reflected in gene-expression profiles of bronchial airway epithelial cells obtained via bronchoscopy. METHODS: Gene expression profiling of bronchial brushings obtained from 238 current and former smokers with and without COPD was performed using Affymetrix Human Gene 1.0 ST Arrays. MEASUREMENTS AND MAIN RESULTS: We identified 98 genes whose expression levels were associated with COPD status, FEV1% predicted, and FEV1/FVC. In silico analysis identified ATF4 as a potential transcriptional regulator of genes with COPD-associated airway expression, and ATF4 overexpression in airway epithelial cells in vitro recapitulates COPD-associated gene expression changes. Genes with COPD-associated expression in the bronchial airway epithelium had similarly altered expression profiles in prior studies performed on small-airway epithelium and lung parenchyma, suggesting that transcriptomic alterations in the bronchial airway epithelium reflect molecular events found at more distal sites of disease activity. Many of the airway COPD-associated gene expression changes revert toward baseline following therapy with the inhaled corticosteroid fluticasone in independent cohorts. CONCLUSIONS: Our findings demonstrate a molecular field of injury throughout the bronchial airway of active and former smokers with COPD that may be driven in part by ATF4 and is modifiable with therapy. Bronchial airway epithelium may therefore ultimately serve as a relatively accessible tissue in which to measure biomarkers of disease activity for guiding clinical management of COPD.
Full-text · Article · Mar 2013 · American Journal of Respiratory and Critical Care Medicine
ABSTRACT: Cigarette smoke produces a molecular "field of injury" in epithelial cells lining the respiratory tract. However, the specific signaling pathways that are altered in the airway of smokers and the signaling processes responsible for the transition from smoking-induced airway damage to lung cancer remain unknown. In this study, we use a genomic approach to study the signaling processes associated with tobacco smoke exposure and lung cancer. First, we developed and validated pathway-specific gene expression signatures in bronchial airway epithelium that reflect activation of signaling pathways relevant to tobacco exposure, including ATM, BCL2, GPX1, NOS2, IKBKB, and SIRT1. Using these profiles and four independent gene expression datasets, we found that SIRT1 activity is significantly up-regulated in cytologically normal airway epithelial cells from active smokers compared to non-smokers. In contrast, this activity is strikingly down-regulated in non-small cell lung cancer. This pattern of signaling modulation was unique to SIRT1, and down-regulation of SIRT1 activity is confined to tumors from smokers. Decreased activity of SIRT1 was validated using genomic analyses of mouse models of lung cancer and biochemical testing of SIRT1 activity in patient lung tumors. Together, our findings indicate a role of SIRT1 in response to smoke and a potential role in repressing lung cancer. Further, our findings suggest that the airway gene-expression signatures derived in this study can provide novel insights into signaling pathways altered in the "field of injury" induced by tobacco smoke and thus may impact strategies for prevention of tobacco-related lung cancer.
ABSTRACT: Gene-expression microarrays allow researchers to characterize biological phenomena in a high-throughput fashion but are subject to technological biases and inevitable variabilities that arise during sample collection and processing. Normalization techniques aim to correct such biases. Most existing methods require multiple samples to be processed in aggregate; consequently, each sample's output is influenced by other samples processed jointly. However, in personalized-medicine workflows, samples may arrive serially, so renormalizing all samples upon each new arrival would be impractical. We have developed Single Channel Array Normalization (SCAN), a single-sample technique that models the effects of probe-nucleotide composition on fluorescence intensity and corrects for such effects, dramatically increasing the signal-to-noise ratio within individual samples while decreasing variation across samples. In various benchmark comparisons, we show that SCAN performs as well as or better than competing methods yet has no dependence on external reference samples and can be applied to any single-channel microarray platform.
ABSTRACT: We leverage genomic and biochemical data to identify synergistic drug regimens for breast cancer. In order to study the mechanism of the histone deacetylase (HDAC) inhibitors valproic acid (VPA) and suberoylanilide hydroxamic acid (SAHA) in breast cancer, we generated and validated genomic profiles of drug response using a series of breast cancer cell lines sensitive to each drug. These genomic profiles were then used to model drug response in human breast tumors and show significant correlation between VPA and SAHA response profiles in multiple breast tumor data sets, highlighting their similar mechanism of action. The genes deregulated by VPA and SAHA converge on the cell cycle pathway (Bayes factor 5.21 and 5.94, respectively; P-value $10^{-8.6}$ and $10^{-9}$, respectively). In particular, VPA and SAHA upregulate key cyclin-dependent kinase (CDK) inhibitors. In two independent datasets, cancer cells treated with CDK inhibitors have similar gene expression profile changes to the cellular response to HDAC inhibitors. Together, these results led us to hypothesize that VPA and SAHA may interact synergistically with CDK inhibitors such as PD-033299. Experiments show that HDAC and CDK inhibitors have statistically significant synergy in both breast cancer cell lines and primary 3-dimensional cultures of cells from pleural effusions of patients. Therefore, synergistic relationships between HDAC and CDK inhibitors may provide an effective combinatorial regimen for breast cancer. Importantly, these studies provide an example of how genomic analysis of drug-response profiles can be used to design rational drug combinations for cancer treatment. The Pharmacogenomics Journal advance online publication, 15 November 2011; doi:10.1038/tpj.2011.48.
Full-text · Article · Nov 2011 · The Pharmacogenomics Journal
ABSTRACT: Unlike traditional chemotherapy, targeted cancer therapies are expected to work in only a subset of people with a particular cancer. However, biomarkers of response are not always known before clinical trial initiation. We present MATCH (Merging genomic and pharmacologic Analyses for Therapy CHoice), an algorithm for using genome-wide gene expression data to identify and validate a genomic biomarker of sensitivity (see Figure 1). Our proof-of-principle example is valproic acid (VPA), but we also show that an estrogen blocking drug currently used for breast cancer and a B-RAF inhibitor in trials for melanoma give predictions that correspond to their clinical uses.
We use genome-wide gene expression data from treated and untreated samples from the Connectivity Map to generate a VPA response signature. We validate that the VPA signature can identify treated and untreated cells in an independent data set of normal cells and in independent samples from the Connectivity Map. The AUC for the ROC curve is 0.86. We then apply the VPA signature to publicly available data sets from a panel of cancer cell lines and from primary tumor and normal tissue samples. These data suggest that there is a subset of women with breast cancer who will be sensitive to VPA. Finally, we validate that our predictions correlate with sensitivity to VPA in breast cancer cell lines grown in two-dimensional culture, primary breast tumor samples grown in three-dimensional culture, and in vivo mouse breast cancer xenografts. Together, these studies show that MATCH can identify cancer patients most likely to respond to a specific drug treatment.
Full-text · Article · Jul 2011 · Molecular Systems Biology
ABSTRACT: To the Editor: We would like to retract our article, "A Genomic Strategy to Refine Prognosis in Early-Stage Non-Small-Cell Lung Cancer,"(1) which was published in the Journal on August 10, 2006. Using a sample set from a study by the American College of Surgeons Oncology Group (ACOSOG) and a collection of samples from a study by the Cancer and Leukemia Group B (CALGB), we have tried and failed to reproduce results supporting the validation of the lung metagene model described in the article. We deeply regret the effect of this action on the work of other investigators.
No preview · Article · Mar 2011 · New England Journal of Medicine