ABSTRACT: There is an urgent need for better therapeutics in head and neck squamous cell cancer (HNSCC) to improve survival and decrease treatment morbidity. Recent advances in high-throughput drug screening techniques and next-generation sequencing have identified new therapeutic targets in other cancer types, but an HNSCC-specific study has not yet been carried out. We have exploited data from two large-scale cell line projects to clearly describe the mutational and copy number status of HNSCC cell lines and identify candidate drugs with elevated efficacy in HNSCC.
The genetic landscape of 42 HNSCC cell lines, including mutational and copy number data from studies by Garnett et al. and Barretina et al., was analyzed. Data from Garnett et al. were interrogated for relationships between HNSCC cells versus the entire cell line pool using one- and two-way analyses of variance (ANOVAs). As only seven HNSCC cell lines were tested with drugs by Barretina et al., a similar analysis was not carried out.
Recurrent mutations in human papillomavirus (HPV)-negative patient tumors were confirmed in HNSCC cell lines; however, additional recurrent, cell line-specific mutations were identified. Four drugs, Bosutinib, Docetaxel, BIBW2992, and Gefitinib, were found via multiple-test corrected ANOVA to have lower IC50 values, suggesting higher drug sensitivity, in HNSCC lines versus non-HNSCC lines. Furthermore, the PI3K inhibitor AZD6482 demonstrated significantly higher activity (as measured by the IC50) in HNSCC cell lines harbouring PIK3CA mutations versus those that did not.
HNSCC-specific reanalysis of large-scale drug screening studies has identified candidate drugs that may be of therapeutic benefit and provided insights into strategies to target PIK3CA mutant tumors. PIK3CA mutations may represent a predictive biomarker for response to PI3K inhibitors. A large-scale study focused on HNSCC cell lines and including HPV-positive lines is necessary and has the potential to accelerate the development of improved therapeutics for patients suffering with head and neck cancer. This strategy can potentially be used as a template for drug discovery in any cancer type.
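The comparison of IC50 values between HNSCC and non-HNSCC lines described in this abstract can be illustrated with a minimal sketch. The function names, the toy values, and the choice of Benjamini-Hochberg as the multiple-test correction are all assumptions; the abstract does not state which correction was used. Only the F statistic is computed here; a p-value would come from the F distribution (for example via `scipy.stats.f_oneway`).

```python
# Hypothetical sketch: one-way ANOVA F statistic plus Benjamini-Hochberg
# adjustment. The BH correction is an assumption; the abstract only says
# "multiple-test corrected".

def anova_f(*groups):
    """Return the one-way ANOVA F statistic for two or more groups."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy (made-up) log IC50 values for one drug
hnscc_log_ic50 = [1.0, 2.0, 3.0]
other_log_ic50 = [4.0, 5.0, 6.0]
f_stat = anova_f(hnscc_log_ic50, other_log_ic50)

# Toy (made-up) per-drug p-values to adjust across drugs
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

In practice one such test would be run per drug, and the adjustment applied across the resulting p-values.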
ABSTRACT: Microarrays have revolutionized breast cancer (BC) research by enabling studies of gene expression on a transcriptome-wide scale. Recently, RNA-Sequencing (RNA-Seq) has emerged as an alternative for precise readouts of the transcriptome. To date, no study has compared the ability of the two technologies to quantify clinically relevant individual genes and microarray-derived gene expression signatures (GES) in a set of BC samples encompassing the known molecular BC subtypes. To accomplish this, the RNA from 57 BCs representing the four main molecular subtypes (triple negative, HER2 positive, luminal A, luminal B) was profiled with Affymetrix HG-U133 Plus 2.0 chips and sequenced using the Illumina HiSeq 2000 platform. The correlations of three clinically relevant BC genes, six molecular subtype classifiers, and a selection of 21 GES were evaluated.
16,097 genes common to the two platforms were retained for downstream analysis. Gene-wise comparison of microarray and RNA-Seq data revealed that 52% had a Spearman's correlation coefficient greater than 0.7, with highly correlated genes displaying significantly higher expression levels. We found excellent correlation between microarray and RNA-Seq for the estrogen receptor (ER; rs = 0.973; 95% CI: 0.971-0.975), progesterone receptor (PgR; rs = 0.95; 0.947-0.954), and human epidermal growth factor receptor 2 (HER2; rs = 0.918; 0.912-0.923), while a few discordances between ER and PgR quantified by immunohistochemistry and RNA-Seq/microarray were observed. All the subtype classifiers evaluated agreed well (Cohen's kappa coefficients > 0.8) and all the proliferation-based GES showed excellent Spearman correlations between microarray and RNA-Seq (all rs > 0.965). Immune-, stroma- and pathway-based GES showed a lower correlation relative to prognostic signatures (all rs > 0.6).
To our knowledge, this is the first study to report a systematic comparison of RNA-Seq to microarray for the evaluation of single genes and GES clinically relevant to BC. According to our results, the vast majority of single gene biomarkers and well-established GES can be reliably evaluated using the RNA-Seq technology.
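The two agreement measures reported in this abstract, Spearman's correlation for per-gene expression and Cohen's kappa for subtype calls, can be sketched as follows. This is an illustrative standard-library version with made-up inputs, not the study's own code; the function names are hypothetical. Library equivalents exist, for example `scipy.stats.spearmanr` and `sklearn.metrics.cohen_kappa_score`.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two label assignments."""
    n = len(labels_a)
    observed = sum(1 for u, v in zip(labels_a, labels_b) if u == v) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Toy (made-up) expression of one gene across four samples on each platform
rho = spearman([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])

# Toy (made-up) subtype calls from the two platforms
kappa = cohen_kappa(["LumA", "LumA", "HER2", "TN"],
                    ["LumA", "LumA", "HER2", "HER2"])
```

Note that `cohen_kappa` is undefined when expected agreement equals 1 (both raters use a single identical label); the sketch does not guard against that case.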
ABSTRACT: In this paper, we shed light on approaches that are currently used to infer networks from gene expression data with respect to their biological meaning. As we will show, the biological interpretation of these networks depends on the chosen theoretical perspective. For this reason, we distinguish a statistical perspective from a mathematical modeling perspective and elaborate their differences and implications. Our results indicate the imperative need for a genomic network ontology in order to avoid increasing confusion about the biological interpretation of inferred networks, which can be further increased by approaches that integrate multiple data sets or data types.
ABSTRACT: Human cancers exhibit strong phenotypic differences that can be visualized noninvasively by medical imaging. Radiomics refers to the comprehensive quantification of tumour phenotypes by applying a large number of quantitative image features. Here we present a radiomic analysis of 440 features quantifying tumour image intensity, shape and texture, which are extracted from computed tomography data of 1,019 patients with lung or head-and-neck cancer. We find that a large number of radiomic features have prognostic power in independent data sets of lung and head-and-neck cancer patients, many of which were not identified as significant before. Radiogenomics analysis reveals that a prognostic radiomic signature, capturing intratumour heterogeneity, is associated with underlying gene-expression patterns. These data suggest that radiomics identifies a general prognostic phenotype existing in both lung and head-and-neck cancer. This may have a clinical impact as imaging is routinely used in clinical practice, providing an unprecedented opportunity to improve decision-support in cancer treatment at low cost.
ABSTRACT: Cancer is a complex disease that has proven to be difficult to understand on the single-gene level. For this reason, a functional elucidation needs to take interactions among genes on a systems level into account. In this study, we infer a colon cancer network from a large-scale gene expression data set by using the method BC3Net. We provide a structural and a functional analysis of this network and also connect its molecular interaction structure with the chromosomal locations of the genes, enabling the definition of cis- and trans-interactions. Furthermore, we investigate the interaction of genes that can be found in close neighborhoods on the chromosomes to gain insight into regulatory mechanisms. To our knowledge, this is the first study analyzing the genome-scale colon cancer network.
ABSTRACT: Although many methods have been developed for inference of biological networks, the validation of the resulting models has largely remained an unsolved problem. Here we present a framework for quantitative assessment of inferred gene interaction networks using knock-down data from cell line experiments. Using this framework we are able to show that network inference based on integration of prior knowledge derived from the biomedical literature with genomic data significantly improves the quality of inferred networks relative to other approaches. Our results also suggest that cell line experiments can be used to quantitatively assess the quality of networks inferred from tumor samples.
ABSTRACT: Ovarian cancer is the fifth most common cause of cancer deaths in women in the United States. Numerous gene signatures of patient prognosis have been proposed, but diverse data and methods make these difficult to compare or use in a clinically meaningful way. We sought to identify successful published prognostic gene signatures through systematic validation using public data.
A systematic review identified 14 prognostic models for late-stage ovarian cancer. For each, we evaluated its 1) reimplementation as described by the original study, 2) performance for prognosis of overall survival in independent data, and 3) performance compared with random gene signatures. We compared and ranked models by validation in 10 published datasets comprising 1251 primarily high-grade, late-stage serous ovarian cancer patients. All tests of statistical significance were two-sided.
Twelve published models had 95% confidence intervals of the C-index that did not include the null value of 0.5; eight outperformed 97.5% of signatures including the same number of randomly selected genes and trained on the same data. The four top-ranked models achieved overall validation C-indices of 0.56 to 0.60 and shared anticorrelation with expression of immune response pathways. Most models demonstrated lower accuracy in new datasets than in validation sets presented in their publication.
This analysis provides definitive support for a handful of prognostic models but also confirms that these require improvement to be of clinical value. This work addresses outstanding controversies in the ovarian cancer literature and provides a reproducible framework for meta-analytic evaluation of gene signatures.
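The C-index used to rank the prognostic models measures how often a model assigns the higher risk score to the patient who dies sooner, with 0.5 corresponding to random ordering. The following is a minimal sketch of Harrell's concordance index under simplifying assumptions (pairs with tied survival times are skipped, and the function name and toy data are hypothetical); it is not the evaluation code of the study.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times:       observed follow-up time per patient
    events:      1 if death was observed, 0 if censored
    risk_scores: higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only if the patient with the shorter
            # time (i) had an observed event; tied times are skipped.
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy (made-up) cohort: the third patient is censored at 15 months
times = [5, 10, 15]
events = [1, 1, 0]
c_perfect = concordance_index(times, events, [3.0, 2.0, 1.0])
c_reversed = concordance_index(times, events, [1.0, 2.0, 3.0])
c_constant = concordance_index(times, events, [1.0, 1.0, 1.0])
```

A constant score yields 0.5, which is why a confidence interval excluding 0.5 is the minimal bar for a model to be considered informative.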
ABSTRACT: Our previous studies revealed an increase in alternative splicing (AS) of multiple RNAs in leukemic cells from patients with acute myeloid leukemia (AML) compared to CD34+ bone marrow cells from normal donors (NDs). Aberrantly spliced genes included a number of oncogenes, tumor suppressor genes, and genes involved in regulation of apoptosis, cell cycle, and cell differentiation. Among the most commonly mis-spliced genes (> 70% of AML patients) were two genes, NOTCH2 and FLT3, that encode myeloid cell surface proteins. The splice-variants of NOTCH2 and FLT3 resulted from complete or partial exon skipping and utilization of cryptic splice sites. Longitudinal analyses suggested that aberrant splicing of NOTCH2 and FLT3 correlated with disease status. Correlation analyses between splice-variants of these genes and clinical features of patients showed an association between NOTCH2 splice-variants and overall survival of patients. Our results suggest that NOTCH2 and FLT3 mis-splicing is a common characteristic of AML and has the potential to generate transcripts encoding proteins with altered function. Thus, splice-variants of these genes might provide disease markers and targets for novel therapeutics.
ABSTRACT: Recent technologies have made it cost-effective to collect diverse types of genome-wide data. Computational methods are needed to combine these data to create a comprehensive view of a given disease or a biological process. Similarity network fusion (SNF) solves this problem by constructing networks of samples (e.g., patients) for each available data type and then efficiently fusing these into one network that represents the full spectrum of underlying data. For example, to create a comprehensive view of a disease given a cohort of patients, SNF computes and fuses patient similarity networks obtained from each of their data types separately, taking advantage of the complementarity in the data. We used SNF to combine mRNA expression, DNA methylation and microRNA (miRNA) expression data for five cancer data sets. SNF substantially outperforms single data type analysis and established integrative approaches when identifying cancer subtypes and is effective for predicting survival.
ABSTRACT: Due to advances in the acquisition and analysis of medical imaging, it is currently possible to quantify the tumor phenotype. The emerging field of Radiomics addresses this issue by converting medical images into minable data by extracting a large number of quantitative imaging features. One of the main challenges of Radiomics is tumor segmentation. Whereas manual delineation is time consuming and prone to inter-observer variability, it has been shown that semi-automated approaches are fast and reduce inter-observer variability. In this study, a semiautomatic region growing volumetric segmentation algorithm, implemented in the free and publicly available 3D-Slicer platform, was investigated in terms of its robustness for quantitative imaging feature extraction. Fifty-six 3D-radiomic features, quantifying phenotypic differences based on tumor intensity, shape and texture, were extracted from the computed tomography images of twenty lung cancer patients. These radiomic features were derived from the 3D-tumor volumes defined by three independent observers twice using 3D-Slicer, and compared to manual slice-by-slice delineations of five independent physicians in terms of intra-class correlation coefficient (ICC) and feature range. Radiomic features extracted from 3D-Slicer segmentations had significantly higher reproducibility (ICC = 0.85±0.15, p = 0.0009) compared to the features extracted from the manual segmentations (ICC = 0.77±0.17). Furthermore, we found that features extracted from 3D-Slicer segmentations were more robust, as the range was significantly smaller across observers (p = 3.819e-07), and overlapping with the feature ranges extracted from manual contouring (boundary lower: p = 0.007, higher: p = 5.863e-06). Our results show that 3D-Slicer segmented tumor volumes provide a better alternative to the manual delineation for feature quantification, as they yield more reproducible imaging descriptors. Therefore, 3D-Slicer can be employed for quantitative image feature extraction and image data mining research in large patient cohorts.
PLoS ONE 01/2014; 9(7):e102107. · 3.53 Impact Factor
ABSTRACT: In this study, we infer the breast cancer gene regulatory network from gene expression data. This network is obtained from the application of the BC3Net inference algorithm to a large-scale gene expression data set consisting of 351 patient samples. In order to elucidate the functional relevance of the inferred network, we perform a Gene Ontology (GO) analysis for its structural components. Our analysis reveals that the most significant GO terms we find for the breast cancer network represent functional modules of biological processes that are described by known cancer hallmarks, including translation, immune response, cell cycle, organelle fission, mitosis, cell adhesion, RNA processing, RNA splicing and response to wounding. Furthermore, by using a curated list of census cancer genes, we find an enrichment in these functional modules. Finally, we study cooperative effects of chromosomes based on information of interacting genes in the breast cancer network. We find that chromosome 21 is most coactive with other chromosomes. To our knowledge, this is the first study investigating the genome-scale breast cancer network.
ABSTRACT: Gene set enrichment analysis (GSEA) associates gene sets and phenotypes; its use is predicated on the choice of a pre-defined collection of sets. The de facto standard implementation of GSEA provides seven collections, yet there are no guidelines for the choice of collections, and the impact of such choice, if any, is unknown. Here we compare each of the standard gene set collections in the context of a large dataset of drug response in human cancer cell lines. We define and test a new collection based on gene co-expression in cancer cell lines to compare the performance of the standard collections to an externally derived cell line based collection. The results show that GSEA findings vary significantly depending on the collection chosen for analysis. Henceforth, collections should be carefully selected and reported in studies that leverage GSEA.
ABSTRACT: When inferring networks from high-throughput genomic data, one of the main challenges is the subsequent validation of these networks. In the best-case scenario, the true network is partially known from previous research results published in structured databases or research articles. Traditionally, inferred networks are validated against these known interactions. Whenever the recovery rate is gauged to be high enough, subsequent high scoring but unknown inferred interactions are deemed good candidates for further experimental validation. Therefore, such a validation framework strongly depends on the quantity and quality of published interactions and presents serious pitfalls: (1) availability of these known interactions for the studied problem might be sparse; (2) quantitatively comparing different inference algorithms is not trivial; and (3) the use of these known interactions for validation prevents their integration in the inference procedure. The latter is particularly relevant as it has recently been shown that integration of priors during network inference significantly improves the quality of inferred networks. To overcome these problems when validating inferred networks, we recently proposed a data-driven validation framework based on single gene knock-down experiments. Using this framework, we were able to demonstrate the benefits of integrating prior knowledge and expression data. In this paper we used this framework to assess the quality of different sources of prior knowledge on their own and in combination with different genomic data sets in colorectal cancer. We observed that most prior sources lead to significant F-scores. Furthermore, their integration with genomic data leads to a significant increase in F-scores, especially for priors extracted from full text PubMed articles, known co-expression modules and genetic interactions. Lastly, we observed that the results are consistent for three different data sets: experimental knock-down data and two human tumor data sets.
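The F-score reported in this abstract is the harmonic mean of precision and recall. A minimal sketch follows, assuming predictions and reference are plain sets of gene identifiers; how the framework itself derives these two sets from knock-down data is not described here, so the inputs and the function name are hypothetical.

```python
def f_score(predicted, reference):
    """Harmonic mean of precision and recall for two collections of items."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0  # also covers empty inputs
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Toy (made-up) gene sets: two of three predictions are in the reference
score = f_score({"A", "B", "C"}, {"B", "C", "D"})
```

Whether an observed F-score is "significant" requires a null distribution, for example from randomly drawn gene sets of the same size; that step is not shown.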
ABSTRACT: Validated biomarkers predictive of response/resistance to anthracyclines in breast cancer are currently lacking. The neoadjuvant Trial of Principle (TOP) study, in which patients with estrogen receptor (ER)–negative tumors were treated with anthracycline (epirubicin) monotherapy, was specifically designed to evaluate the predictive value of topoisomerase II-alpha (TOP2A) and develop a gene expression signature to identify those patients who do not benefit from anthracyclines. Here we describe in detail the contents and quality controls for the gene expression and clinical data associated with the study published by Desmedt and colleagues in the Journal of Clinical Oncology in 2011 (Desmedt et al., 2011). We also provide R code to easily access the data and perform the quality controls and basic analyses relevant to this dataset.
ABSTRACT: Two large-scale pharmacogenomic studies were published recently in this journal. Genomic data are well correlated between studies; however, the measured drug response data are highly discordant. Although the source of inconsistencies remains uncertain, it has potential implications for using these outcome measures to assess gene-drug associations or select potential anticancer drugs on the basis of their reported results.
ABSTRACT: Despite new treatments, acute myeloid leukemia (AML) remains an incurable disease. More effective drug design requires an expanded view of the molecular complexity that underlies AML. Alternative splicing (AS) of RNA is used by normal cells to generate protein diversity. Growing evidence indicates that aberrant splicing of genes plays a key role in cancer. We investigated genome-wide splicing abnormalities in AML and based on these abnormalities we aimed to identify novel potential biomarkers and therapeutic targets.
We used genome-wide AS screening to investigate AS abnormalities in two independent AML patient cohorts (DFCI and UHN) and normal donors (NDs). Selected splicing events were confirmed through cloning and sequencing analysis, and then validated in 193 AML patients.
Our results show that ~29% of expressed genes genome-wide were differentially and recurrently spliced in AML patients compared to ND bone marrow CD34+ cells. Results were reproducible in two independent AML cohorts. In both cohorts, annotation analyses indicated similar proportions of differentially spliced genes encoding several oncogenes, tumor suppressor proteins, splicing factors and heterogeneous-nuclear-ribonucleoproteins, proteins involved in apoptosis, cell proliferation, and spliceosome assembly. Our findings are consistent with reports for other malignancies and indicate that AML-specific aberrations in splicing mechanisms are a hallmark of AML pathogenesis.
Overall, our results suggest that aberrant splicing is a common characteristic for AML. Our findings also suggest that splice variant transcripts that are the result of splicing aberrations create novel disease markers and provide potential targets for small molecules or antibody therapeutics for this disease.
Clinical Cancer Research 11/2013; · 7.84 Impact Factor
ABSTRACT: Diabetes risk increases significantly with age and correlates with lower oxidative capacity in muscle. Decreased expression of peroxisome proliferator-activated receptor-gamma coactivator-1alpha (Pgc-1α) and target gene pathways involved in mitochondrial oxidative phosphorylation are associated with muscle insulin resistance, but a causative role has not been established. We sought to determine whether a decline in Pgc-1α and oxidative gene expression occurs during aging and potentiates the development of age-associated insulin resistance. Muscle-specific Pgc-1α knock-out (MKO) mice and wild-type littermate controls were aged for two years. Genetic signatures of skeletal muscle (microarray and mRNA expression) and metabolic profiles (glucose homeostasis, mitochondrial metabolism, body composition, lipids, and indirect calorimetry) of mice were compared at 3, 12, and 24 months of age. Microarray and gene set enrichment analysis highlighted decreased function of the electron transport chain as characteristic of both aging muscle and loss of Pgc-1α expression. Despite significant reductions in oxidative gene expression and succinate dehydrogenase activity, young mice lacking Pgc-1α in muscle had lower fasting glucose and insulin. Consistent with loss of oxidative capacity during aging, Pgc-1α and Pgc-1β expression were reduced in aged wild-type mouse muscle. Interestingly, the combination of age and loss of muscle Pgc-1α expression impaired glucose tolerance and led to increased fat mass, insulin resistance, and inflammatory markers in white adipose and liver tissues. Therefore, loss of Pgc-1α expression and decreased mitochondrial oxidative capacity contribute to worsening glucose tolerance and chronic systemic inflammation associated with aging.
AJP Endocrinology and Metabolism 11/2013; · 4.51 Impact Factor
ABSTRACT: Little is known about the functions of chromosome Y (chrY) genes beyond their effects on sex and reproduction. In hearts, postpubertal testosterone affects the size of cells and the expression of genes differently in male C57BL/6J than in their C57.Y(A) counterparts, where the original chrY has been substituted with that from A/J mice. We further compared the 2 strains to better understand how chrY polymorphisms may affect cardiac properties, the latter being sexually dimorphic but unrelated to sex and reproduction. Genomic regions showing occupancy with androgen receptors (ARs) were identified in adult male hearts from both strains by chromatin immunoprecipitation. AR chromatin immunoprecipitation peaks (showing significant enrichment for consensus AR binding sites) were mostly strain specific. Measurements of anogenital distances in male pups showed that the biologic effects of perinatal androgens were greater in C57BL/6J than in C57.Y(A). Although perinatal endocrine manipulations showed that these differences contributed to the strain-specific differences in the response of adult cardiac cells to testosterone, the amounts of androgens produced by fetal testes were not different in each strain. Nonetheless, chrY polymorphisms associated in newborn pups' hearts with strain-specific differences in genomic regions showing either AR occupancy, accessible chromatin sites, or histone H3K4me3 marks, as well as with differential expression of 2 chrY-encoded histone demethylases. In conclusion, the effects of chrY on adult cardiac phenotypes appeared to result from an interaction of this chromosome with the organizational programming effects exerted by the neonatal testosterone surge and show several characteristics of being mediated by an epigenetic remodeling of chromatin.