ABSTRACT: Background:
Prognosis of patients with colorectal cancer liver metastasis (CRCLM) is estimated based on clinicopathological models. Stratifying patients based on tumor biology may have additional value.
Tissue micro-arrays (TMAs), containing resected CRCLM and corresponding primary tumors from a multi-institutional cohort of 507 patients, were immunohistochemically stained for 18 candidate biomarkers. Cross-validated hazard rate ratios (HRRs) for overall survival (OS) and the proportion of HRRs with opposite effect (P(HRR < 1) or P(HRR > 1)) were calculated. A classifier was constructed by classification and regression tree (CART) analysis and its prognostic value determined by permutation analysis. Correlations between protein expression in primary tumor-CRCLM pairs were calculated.
Based on their putative prognostic value, EGFR (P(HRR < 1) = .02), AURKA (P(HRR < 1) = .02), VEGFA (P(HRR < 1) = .02), PTGS2 (P(HRR < 1) = .01), SLC2A1 (P(HRR > 1) < .01), HIF1α (P(HRR > 1) = .06), KCNQ1 (P(HRR > 1) = .09), CEA (P(HRR > 1) = .05) and MMP9 (P(HRR < 1) = .07) were included in the CART analysis (n = 201). The resulting classifier was based on AURKA, PTGS2 and MMP9 expression and was associated with OS (HRR 2.79, p < .001), also after multivariate analysis (HRR 3.57, p < .001). The prognostic value of the biomarker-based classifier was superior to the clinicopathological model (p = .001). Prognostic value was highest for colon cancer patients (HRR 5.71, p < .001) and patients not treated with systemic therapy (HRR 3.48, p < .01). Classification based on protein expression in primary tumors could be based on AURKA expression only (HRR 2.59, p = .04).
A classifier was generated for patients with CRCLM with improved prognostic value compared to the standard clinicopathological prognostic parameters, which may aid selection of patients who may benefit from adjuvant systemic therapy.
ABSTRACT: Reconstructing a gene network from high-throughput molecular data is often a
challenging task, as the number of parameters to estimate is easily much larger
than the sample size. A conventional remedy is to regularize or penalize the
model likelihood. In network models, this is often done locally in the
neighbourhood of each node or gene. However, estimation of the many
regularization parameters is often difficult and can result in large
statistical uncertainties. In this paper we propose to combine local
regularization with global shrinkage of the regularization parameters to borrow
strength between genes and improve inference. We employ a simple Bayesian model
with non-sparse, conjugate priors to facilitate the use of fast variational
approximations to posteriors. We discuss empirical Bayes estimation of
hyper-parameters of the priors, and propose a novel approach to rank-based
posterior thresholding. Using extensive model- and data-based simulations, we
demonstrate that the proposed inference strategy outperforms popular (sparse)
methods, yields more stable edges, and is more reproducible.
ABSTRACT: MicroRNAs (miRs) have been recognized as promising biomarkers. It is unknown to what extent tumor-derived miRs are differentially expressed between primary colorectal cancers (pCRCs) and metastatic lesions, and to what extent the expression profiles of tumor tissue differ from the surrounding normal tissue. Next-generation sequencing (NGS) of 220 fresh-frozen samples, including paired primary and metastatic tumor tissue and non-tumorous tissue from 38 patients, revealed expression of 2245 known unique mature miRs and 515 novel candidate miRs. Unsupervised clustering of miR expression profiles of pCRC tissue with paired metastases did not separate the two entities, whereas unsupervised clustering of miR expression profiles of pCRC with normal colorectal mucosa demonstrated complete separation of the tumor samples from their paired normal mucosa. Two hundred and twenty-two miRs differentiated both pCRC and metastases from normal tissue samples (false discovery rate (FDR) <0.05). The highest expressed tumor-specific miRs were miR-21 and miR-92a, both previously described to be involved in CRC with potential as circulating biomarkers for early detection. Only eight miRs, 0.5% of the analysed miR transcriptome, were differentially expressed between pCRC and the corresponding metastases (FDR <0.1), consisting of five known miRs (miR-320b, miR-320d, miR-3117, miR-1246 and miR-663b) and three novel candidate miRs (chr 1-2552-5p, chr 8-20656-5p and chr 10-25333-3p). These results indicate that previously unrecognized candidate miRs expressed in advanced CRC were identified using NGS. In addition, miR expression profiles of pCRC and metastatic lesions are highly comparable and may be of similar predictive value for prognosis or response to treatment in patients with advanced CRC.
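The FDR thresholds quoted in this abstract (<0.05 and <0.1) can be illustrated with a minimal sketch. The abstract does not say which FDR procedure was used; the Benjamini-Hochberg adjustment below is one common choice and is an assumption, as are the function names and the toy p-values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the adjusted values
    # stay monotone in the rank of the raw p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_features(names, p_values, fdr=0.05):
    """Names of features whose adjusted p-value is below the FDR threshold."""
    adjusted = benjamini_hochberg(p_values)
    return [name for name, q in zip(names, adjusted) if q < fdr]


if __name__ == "__main__":
    # Hypothetical feature names and p-values, for illustration only.
    names = ["miR-a", "miR-b", "miR-c", "miR-d"]
    p_values = [0.01, 0.04, 0.03, 0.005]
    print(select_features(names, p_values, fdr=0.03))
```

A feature is reported only if its adjusted value falls below the chosen threshold, so a looser threshold (such as 0.1 for the primary versus metastasis comparison) admits more candidates at the price of more expected false positives.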
ABSTRACT: Introduction:
Survival of patients after resection of colorectal cancer liver metastasis (CRCLM) is 36%-58%. Positron emission tomography (PET) tracers, imaging the expression of prognostic biomarkers, may contribute to assign appropriate management to individual patients. Aurora kinase A (AURKA) expression is associated with survival of patients after CRCLM resection.
We synthesized [(3)H]alisertib and [(11)C]alisertib, starting from [(3)H]methyl nosylate and [(11)C]methyl iodide, respectively. We measured in vitro uptake of [(3)H]alisertib in cancer cells with high (Caco2), moderate (A431, HCT116, SW480) and low (MKN45) AURKA expression, before and after siRNA-mediated AURKA downmodulation, as well as after inhibition of P-glycoprotein (P-gp) activity. We measured in vivo uptake and biodistribution of [(11)C]alisertib in nude mice, xenografted with A431, HCT116 or MKN45 cells, or P-gp knockout mice.
[(3)H]Alisertib was synthesized with an overall yield of 42% and [(11)C]alisertib with an overall yield of 23%±9% (radiochemical purity ≥99%). Uptake of [(3)H]alisertib in Caco2 cells was higher than in A431 cells (P=.02) and higher than in SW480, HCT116 and MKN45 cells (P<.01). Uptake in A431 cells was higher than in SW480, HCT116 and MKN45 cells (P<.01). Downmodulation of AURKA expression reduced [(3)H]alisertib uptake in Caco2 cells (P<.01). P-gp inhibition increased [(3)H]alisertib uptake in Caco2 (P<.01) and MKN45 (P<.01) cells. In vivo stability of [(11)C]alisertib 90 min post-injection was 94.7%±1.3% and tumor-to-background ratios were 2.3±0.8 (A431), 1.6±0.5 (HCT116) and 1.9±0.5 (MKN45). In brains of P-gp knockout mice [(11)C]alisertib uptake was increased compared to uptake in wild-type mice (P<.01).
Conclusions:
Radiolabeled alisertib can be synthesized and may have potential for the imaging of AURKA, particularly when AURKA expression is high. However, the exact mechanisms underlying alisertib accumulation need further investigation.
Advances in knowledge and implications for patient care:
Radiolabeled alisertib may be used for non-invasively measuring AURKA protein expression and to stratify patients for treatment accordingly.
No preview · Article · Oct 2015 · Nuclear Medicine and Biology
ABSTRACT: Background:
Cancer is caused by somatic DNA alterations such as gene point mutations, DNA copy number aberrations (CNA) and structural variants (SVs). Genome-wide analyses of SVs in large sample series with well-documented clinical information are still scarce. Consequently, the impact of SVs on carcinogenesis and patient outcome remains poorly understood. This study aimed to perform a systematic analysis of genes that are affected by CNA-associated chromosomal breaks in colorectal cancer (CRC) and to determine the clinical relevance of recurrent breakpoint genes.
Primary CRC samples of patients with metastatic disease from CAIRO and CAIRO2 clinical trials were previously characterized by array-comparative genomic hybridization. These data were now used to determine the prevalence of CNA-associated chromosomal breaks within genes across 352 CRC samples. In addition, mutation status of the commonly affected APC, TP53, KRAS, PIK3CA, FBXW7, SMAD4, BRAF and NRAS genes was determined for 204 CRC samples by targeted massive parallel sequencing. Clinical relevance was assessed upon stratification of patients based on gene mutations and gene breakpoints that were observed in >3% of CRC cases.
In total, 748 genes were identified that were recurrently affected by chromosomal breaks (FDR <0.1). MACROD2 was affected in 41% of CRC samples and another 169 genes showed breakpoints in >3% of cases, indicating that prevalence of gene breakpoints is comparable to the prevalence of well-known gene point mutations. Patient stratification based on gene breakpoints and point mutations revealed one CRC subtype with very poor prognosis.
We conclude that CNA-associated chromosomal breaks within genes represent a highly prevalent and clinically relevant subset of SVs in CRC.
ABSTRACT: Background
In genomics, hierarchical clustering (HC) is a popular method for grouping similar samples based on a distance measure. HC algorithms do not actually create clusters, but compute a hierarchical representation of the data set. Usually, a fixed height on the HC tree is used, and each contiguous branch of samples below that height is considered a separate cluster. Due to the fixed-height cutting, those clusters may not unravel significant functional coherence hidden deeper in the tree. Besides that, most existing approaches do not make use of available clinical information to guide cluster extraction from the HC. Thus, the identified subgroups may be difficult to interpret in relation to that information.
Results
We develop a novel framework for decomposing the HC tree into clusters by semi-supervised piecewise snipping. The framework, called guided piecewise snipping, utilizes both molecular data and clinical information to decompose the HC tree into clusters. It cuts the given HC tree at variable heights to find a partition (a set of non-overlapping clusters) which does not only represent a structure deemed to underlie the data from which the HC tree is derived, but is also maximally consistent with the supplied clinical data. Moreover, the approach does not require the user to specify the number of clusters prior to the analysis. Extensive results on simulated and multiple medical data sets show that our approach consistently produces more meaningful clusters than the standard fixed-height cut and/or non-guided approaches.
Conclusions
The guided piecewise snipping approach features several novelties and advantages over existing approaches. The proposed algorithm is generic, and can be combined with other algorithms that operate on detected clusters. This approach represents an advancement in several regards: (1) a piecewise tree snipping framework that efficiently extracts clusters by snipping the HC tree possibly at variable heights while preserving the HC tree structure; (2) a flexible implementation allowing a variety of data types for both building and snipping the HC tree, including patient follow-up data like survival as auxiliary information. The data sets and R code are provided as supplementary files. The proposed method is available from Bioconductor as the R-package HCsnip.
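To make the fixed-height baseline concrete, here is a minimal sketch of cutting a single-linkage tree at one height. It is not the HCsnip algorithm: it covers only the standard cut that the abstract contrasts with, restricted to one-dimensional data so that no clustering library is needed. The function name and the toy values are hypothetical.

```python
def fixed_height_cut(values, height):
    """Clusters from cutting a 1-D single-linkage tree at one fixed height.

    In one dimension, single linkage merges neighbours in sorted order and
    each merge height equals the gap between neighbours, so cutting at
    `height` splits the data wherever a gap exceeds it.
    Expects a non-empty list of numbers.
    """
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > height:
            clusters.append([cur])      # gap too large: start a new cluster
        else:
            clusters[-1].append(cur)    # merged below the cut: same cluster
    return clusters


if __name__ == "__main__":
    data = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0, 9.05, 9.6]
    print(len(fixed_height_cut(data, 1.0)))  # coarse cut
    print(len(fixed_height_cut(data, 0.3)))  # finer cut, applied everywhere
```

Because one height is applied to the whole tree, refining one branch also refines every other branch. A variable-height decomposition, as described in the abstract, lifts exactly that restriction.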
ABSTRACT: In order to identify somatic focal copy number aberrations (CNAs) in cancer specimens and to distinguish them from germ-line copy number variations (CNVs), we developed the software package FocalCall. FocalCall enables user-defined size cutoffs to recognize focal aberrations and builds on established array comparative genomic hybridization segmentation and calling algorithms. To distinguish CNAs from CNVs, the algorithm uses matched patient normal signals as references or, if this is not available, a list with known CNVs in a population. Furthermore, FocalCall differentiates between homozygous and heterozygous deletions as well as between gains and amplifications and is applicable to high-resolution array and sequencing data.
Full-text · Article · Dec 2014 · Cancer informatics
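The two filters described in this abstract, a size cutoff for focality and exclusion of known germ-line CNVs, can be sketched as follows. This is not FocalCall's implementation or API. The `Segment` type, the integer call coding, the 3 Mb default cutoff and the "any overlap" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    chrom: str
    start: int  # first base pair of the segment
    end: int    # last base pair of the segment (inclusive)
    call: int   # assumed coding: -2 homozygous deletion, -1 loss,
                # 0 normal, 1 gain, 2 amplification


def overlaps(a, b):
    """True if two segments share at least one base pair."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def focal_somatic(segments, known_cnvs, max_size=3_000_000):
    """Aberrant segments of at most max_size bp that overlap no known CNV."""
    focal = []
    for seg in segments:
        size = seg.end - seg.start + 1
        if seg.call == 0 or size > max_size:
            continue  # copy-neutral, or too large to count as focal
        if any(overlaps(seg, cnv) for cnv in known_cnvs):
            continue  # likely germ-line variation rather than somatic
        focal.append(seg)
    return focal
```

Treating any overlap with a known CNV as disqualifying is deliberately crude. A real analysis would more likely require a minimum overlap fraction, or use a matched normal sample when one is available.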
ABSTRACT: Response to drug therapy in individual colorectal cancer (CRC) patients is associated with tumour biology. Here we describe the genomic landscape of tumour samples of a homogeneous well-annotated series of patients with metastatic CRC (mCRC) of two phase III clinical trials, CAIRO and CAIRO2. DNA copy number aberrations of 349 patients are determined. Within three treatment arms, 194 chromosomal subregions are associated with progression-free survival (PFS; uncorrected single-test P-values <0.005). These subregions are filtered for effect on messenger RNA expression, using an independent data set from The Cancer Genome Atlas which returned 171 genes. Three chromosomal regions are associated with a significant difference in PFS between treatment arms with or without irinotecan. One of these regions, 6q16.1-q21, correlates in vitro with sensitivity to SN-38, the active metabolite of irinotecan. This genomic landscape of mCRC reveals a number of DNA copy number aberrations associated with response to drug therapy.
Full-text · Article · Nov 2014 · Nature Communications
ABSTRACT: For many high-dimensional studies, additional information on the variables,
like (genomic) annotation or external p-values, is available. In the context of
binary and continuous prediction, we develop a method for adaptive
group-regularized (logistic) ridge regression, which makes structural use of
such 'co-data'. Here, 'groups' refer to a partition of the variables according
to the co-data. We derive an empirical Bayes estimate of group-specific
penalties, which possesses several nice properties: i) it is analytical; ii) it
adapts to the informativeness of the co-data for the data at hand; iii) only
one global penalty parameter requires tuning by cross-validation. In addition,
the method allows use of multiple types of co-data at little extra
We show that the group-specific penalties may lead to a larger distinction
between 'near-zero' and relatively large regression parameters, which
facilitates post-hoc variable selection. The method, termed 'GRridge', is
implemented in an easy-to-use R-package. It is demonstrated on two cancer
genomics studies, which both concern the discrimination of precancerous
cervical lesions from normal cervix tissues using methylation microarray data.
For both examples, GRridge clearly improves the predictive performance of
ordinary ridge regression. In addition, we show that for the second study the
relatively good predictive performance is maintained when selecting only 35
Preview · Article · Nov 2014 · Statistics in Medicine
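The idea of group-specific ridge penalties can be sketched for a continuous response. This is not the GRridge package or its API. In the abstract the group multipliers are estimated by empirical Bayes, whereas here they are simply passed in, and all names are hypothetical. The estimate solves (XᵀX + diag(λ·m_g(j))) β = Xᵀy, where g(j) is the co-data group of variable j.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def group_ridge(X, y, groups, global_penalty, multipliers):
    """Ridge estimate with penalty global_penalty * multipliers[groups[j]]
    on coefficient j; X is a list of rows, groups maps column -> group."""
    p = len(X[0])
    xtx = [[sum(row[j] * row[k] for row in X) for k in range(p)]
           for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(X, y)) for j in range(p)]
    for j in range(p):
        xtx[j][j] += global_penalty * multipliers[groups[j]]
    return solve(xtx, xty)
```

With all multipliers equal to 1 this reduces to ordinary ridge regression. A multiplier above 1 shrinks the coefficients of a less informative group more strongly, which is how near-zero and relatively large coefficients come to be more clearly separated.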
ABSTRACT: Background
To determine which changes in the host cell genome are crucial for cervical carcinogenesis, a longitudinal in vitro model system of HPV-transformed keratinocytes was profiled in a genome-wide manner. Four cell lines affected with either HPV16 or HPV18 were assayed at 8 sequential time points for gene expression (mRNA) and gene copy number (DNA) using high-resolution microarrays. Available methods for temporal differential expression analysis are not designed for integrative genomic studies.
Here, we present a method that allows for the identification of differential gene expression associated with DNA copy number changes over time. The temporal variation in gene expression is described by a generalized linear mixed model employing low-rank thin-plate splines. Model parameters are estimated with an empirical Bayes procedure, which exploits integrated nested Laplace approximation for fast computation. Iteratively, posteriors of hyperparameters and model parameters are estimated. The empirical Bayes procedure shrinks multiple dispersion-related parameters. Shrinkage leads to more stable estimates of the model parameters, better control of false positives and improvement of reproducibility. In addition, to make estimates of the DNA copy number more stable, model parameters are also estimated in a multivariate way using triplets of features, imposing a spatial prior for the copy number effect.
With the proposed method for analysis of time-course multilevel molecular data, more profound insight may be gained through the identification of temporal differential expression induced by DNA copy number abnormalities. In particular, in the analysis of an integrative oncogenomics study with a time-course set-up our method finds genes previously reported to be involved in cervical carcinogenesis. Furthermore, the proposed method yields improvements in sensitivity, specificity and reproducibility compared to existing methods. Finally, the proposed method is able to handle count (RNAseq) data from time course experiments as is shown on a real data set.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2105-15-327) contains supplementary material, which is available to authorized users.
Full-text · Article · Oct 2014 · BMC Bioinformatics
ABSTRACT: Background
The disease course of patients with diffuse low-grade glioma is notoriously unpredictable. Temporal and spatially distinct samples may provide insight into the evolution of clinically relevant copy number aberrations (CNAs). The purpose of this study is to identify CNAs that are indicative of aggressive tumor behaviour and can thereby complement the prognostically favorable 1p/19q co-deletion.
Results
Genome-wide, 50 base pair single-end, sequencing was performed to detect CNAs in a clinically well-characterized cohort of 98 formalin-fixed paraffin-embedded low-grade gliomas. CNAs are correlated with overall survival as an endpoint. Seventy-five additional samples from spatially distinct regions and paired recurrent tumors of the discovery cohort were analysed to interrogate the intratumoral heterogeneity and spatial evolution. Loss of 10q25.2-qter is a frequent subclonal event and significantly correlates with an unfavorable prognosis. A significant correlation is furthermore observed in a validation set of 126 and confirmation set of 184 patients. Loss of 10q25.2-qter arises in a longitudinal manner in paired recurrent tumor specimens, whereas the prognostically favorable 1p/19q co-deletion is the only CNA that is stable across spatial regions and recurrent tumors.
Conclusions
CNAs in low-grade gliomas display extensive intratumoral heterogeneity. Distal loss of 10q is a late onset event and a marker for reduced overall survival in low-grade glioma patients. Intratumoral heterogeneity and higher frequencies of distal 10q loss in recurrences suggest this event is involved in outgrowth to the recurrent tumor.
ABSTRACT: Detection of DNA copy number aberrations by shallow whole-genome sequencing (WGS) faces many challenges including lack of completion and errors in the human reference genome, repetitive sequences, polymorphisms, variable sample quality, and biases in the sequencing procedures. Formalin-fixed paraffin-embedded (FFPE) archival material, the analysis of which is important for studies of cancer, presents particular analytical difficulties due to degradation of the DNA and frequent lack of matched reference samples. We present a robust, cost-effective WGS method for DNA copy number analysis that addresses these challenges more successfully than currently available procedures. In practice, very useful profiles can be obtained with ~0.1x genome coverage. We improve on previous methods by, first, implementing a combined correction for sequence mappability and GC content and, second, applying this procedure to sequence data from the 1000 Genomes Project in order to develop a blacklist of problematic genome regions. A small subset of these blacklisted regions was previously identified by ENCODE, but the vast majority are novel unappreciated problematic regions. Our procedures are implemented in a pipeline called QDNAseq. We have analyzed over 1,000 samples, most of which were obtained from the fixed tissue archives of over 25 institutions. We demonstrate that for most samples our sequencing and analysis procedures yield genome profiles with noise levels near the statistical limit imposed by read counting. The described procedures also provide better correction of artifacts introduced by low DNA quality than prior approaches, and better copy number data than high-resolution microarrays at substantially lower cost.
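The "statistical limit imposed by read counting" can be made concrete with a back-of-the-envelope sketch. It assumes read counts per bin are Poisson distributed, so the relative standard deviation of a bin with mean count N is 1/sqrt(N). Only the ~0.1x coverage comes from the abstract; the 15 kb bin size, the 50 bp read length and the function names are illustrative assumptions.

```python
import math


def expected_reads_per_bin(coverage, bin_size, read_length):
    """Average number of reads per bin at the given genome coverage."""
    return coverage * bin_size / read_length


def poisson_relative_sd(reads_per_bin):
    """Relative standard deviation of a Poisson count with this mean."""
    return 1.0 / math.sqrt(reads_per_bin)


if __name__ == "__main__":
    # Assumed settings: 0.1x coverage, 15 kb bins, 50 bp reads.
    n = expected_reads_per_bin(0.1, 15_000, 50)
    print(n, poisson_relative_sd(n))
```

Under these assumptions a bin holds about 30 reads, so even a perfectly corrected profile would show bin-to-bin noise of roughly 18%. Larger bins or deeper sequencing lower that floor in proportion to the square root of the read count.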
ABSTRACT: Creatine transporter (SLC6A8) deficiency is the most common cause of cerebral creatine syndromes, and is characterized by depletion of creatine in the brain. Manifestations of this X-linked disorder include intellectual disability, speech/language impairment, behavior abnormalities, and seizures. At the moment no effective treatment is available. In order to investigate the molecular pathophysiology of this disorder we performed RNA sequencing on fibroblasts derived from patients. The transcriptomes of fibroblast cells from eight unrelated individuals with SLC6A8 deficiency and three wild type controls were sequenced. SLC6A8 mutations with different effects on the protein product resulted in different gene expression profiles. Differential gene expression analysis followed by gene ontology term enrichment analysis revealed that especially the expression of genes encoding components of the extracellular matrix and cytoskeleton, such as collagens, keratins, integrins, and cadherins, is altered in SLC6A8 deficiency. This suggests an important novel role for creatine in the structural development and maintenance of cells. It is likely that the (extracellular) structure of brain cells is also impaired in SLC6A8 deficient patients, and future studies are necessary to confirm this and to reveal the true functions of creatine in the brain.
ABSTRACT: Complex designs are common in (observational) clinical studies. Sequencing data for such studies are produced more and more often, implying challenges for the analysis, such as excess of zeros, presence of random effects and multi-parameter inference. Moreover, when sample sizes are small, inference is likely to be too liberal when, in a Bayesian setting, applying an inappropriate prior, or to lack power when not carefully borrowing information across features.
We show on microRNA sequencing data from a clinical cancer study how our software ShrinkBayes tackles the aforementioned challenges. In addition, we illustrate its comparatively good performance on multi-parameter inference for groups using a data-based simulation. Finally, in the small sample size setting, we demonstrate its high power and improved FDR estimation by use of Gaussian mixture priors that include a point mass.
ShrinkBayes is a versatile software package for the analysis of count-based sequencing data, which is particularly useful for studies with small sample sizes or complex designs.
ABSTRACT: Through integration of genomic data from multiple sources, we may obtain a more accurate and complete picture of the molecular mechanisms underlying tumorigenesis. We discuss the integration of DNA copy number and mRNA gene expression data from an observational integrative genomics study involving cancer patients. The two molecular levels involved are linked through the central dogma of molecular biology. DNA copy number aberrations abound in the cancer cell. Here we investigate how these aberrations affect gene expression levels within a pathway using observational integrative genomics data of cancer patients. In particular, we aim to identify differential edges between regulatory networks of two groups involving these molecular levels. Motivated by the rate equations, the regulatory mechanism between DNA copy number aberrations and gene expression levels within a pathway is modeled by a simultaneous-equations model, for the one- and two-group case. The latter facilitates the identification of differential interactions between the two groups. Model parameters are estimated by penalized least squares using the lasso (L1) penalty to obtain a sparse pathway topology. Simulations show that the inclusion of DNA copy number data benefits the discovery of gene-gene interactions. In addition, the simulations reveal that cis-effects tend to be over-estimated in a univariate (single gene) analysis. In the application to real data from integrative oncogenomic studies we show that inclusion of prior information on the regulatory network architecture benefits the reproducibility of all edges. Furthermore, analyses of the TP53 and TGFb signaling pathways between ER+ and ER- samples from an integrative genomics breast cancer study identify reproducible differential regulatory patterns that are corroborated by existing literature.
No preview · Article · Feb 2014 · Statistical Applications in Genetics and Molecular Biology
ABSTRACT: This paper presents the R/Bioconductor package stepwiseCM, which classifies cancer samples using two heterogeneous data sets in an efficient way. The algorithm is able to capture the distinct classification power of two given data types without actually combining them. This package is suited to classification problems where two different types of data sets on the same samples are available. One of these data types has measurements on all samples and the other one has measurements on some samples. The former is easy to collect and/or relatively cheap (eg, clinical covariates) compared to the latter (high-dimensional data, eg, gene expression). One additional application for which stepwiseCM has proven to be useful as well is the combination of two high-dimensional data types, eg, DNA copy number and mRNA expression. The package includes functions to project the neighborhood information in one data space to the other to determine a potential group of samples that are likely to benefit most from measuring the second type of covariates. The two heterogeneous data spaces are connected by indirect mapping. The crucial difference between the stepwise classification strategy implemented in this package and the existing packages is that our approach aims to be cost-efficient by avoiding measuring additional covariates, which might be expensive or patient-unfriendly, for a potentially large subgroup of individuals. Moreover, for these individuals, diagnostic test results would be quickly available, which may lead to reduced waiting times and hence lower the patients' distress. The improvement described remedies the key limitations of existing packages, and facilitates the use of the stepwiseCM package in diverse applications.