Benjamin Haibe-Kains

University Health Network, Toronto, Ontario, Canada

Publications (100) · 587.92 Total impact

  •
    ABSTRACT: Large-scale pharmacogenomic high-throughput screening (HTS) studies hold great potential for generating robust genomic predictors of drug response. Two recent large-scale HTS studies have reported results of such screens, revealing several known and novel drug sensitivities and biomarkers. Subsequent evaluation, however, found only moderate interlaboratory concordance in the drug response phenotypes, possibly due to differences in the experimental protocols used in the two studies. This highlights the need for community-wide implementation of standardized assays for measuring drug response phenotypes so that the full potential of HTS is realized. We suggest that the path forward is to establish best practices and standardization of the critical steps in these assays through a collective effort to ensure that the data produced from large-scale screens would not only be of high intrastudy consistency but could also be replicated and compared successfully across multiple laboratories. Cancer Res; 74(15); 1-8. ©2014 AACR.
    Cancer research. 07/2014;
  •
    ABSTRACT: Cancer is a complex disease that has proven to be difficult to understand on the single-gene level. For this reason a functional elucidation needs to take interactions among genes on a systems-level into account. In this study, we infer a colon cancer network from a large-scale gene expression data set by using the method BC3Net. We provide a structural and a functional analysis of this network and also connect its molecular interaction structure with the chromosomal locations of the genes enabling the definition of cis- and trans-interactions. Furthermore, we investigate the interaction of genes that can be found in close neighborhoods on the chromosomes to gain insight into regulatory mechanisms. To our knowledge this is the first study analyzing the genome-scale colon cancer network.
    BMC Bioinformatics 05/2014; 15(Suppl 6):S6. · 3.02 Impact Factor
  •
    ABSTRACT: Ovarian cancer is the fifth most common cause of cancer deaths in women in the United States. Numerous gene signatures of patient prognosis have been proposed, but diverse data and methods make these difficult to compare or use in a clinically meaningful way. We sought to identify successful published prognostic gene signatures through systematic validation using public data. A systematic review identified 14 prognostic models for late-stage ovarian cancer. For each, we evaluated its 1) reimplementation as described by the original study, 2) performance for prognosis of overall survival in independent data, and 3) performance compared with random gene signatures. We compared and ranked models by validation in 10 published datasets comprising 1251 primarily high-grade, late-stage serous ovarian cancer patients. All tests of statistical significance were two-sided. Twelve published models had 95% confidence intervals of the C-index that did not include the null value of 0.5; eight outperformed 97.5% of signatures including the same number of randomly selected genes and trained on the same data. The four top-ranked models achieved overall validation C-indices of 0.56 to 0.60 and shared anticorrelation with expression of immune response pathways. Most models demonstrated lower accuracy in new datasets than in validation sets presented in their publication. This analysis provides definitive support for a handful of prognostic models but also confirms that these require improvement to be of clinical value. This work addresses outstanding controversies in the ovarian cancer literature and provides a reproducible framework for meta-analytic evaluation of gene signatures.
    CancerSpectrum Knowledge Environment 04/2014; · 14.07 Impact Factor
  •
    ABSTRACT: Our previous studies revealed an increase in alternative splicing (AS) of multiple RNAs in leukemic cells from patients with acute myeloid leukemia (AML) compared to CD34+ bone marrow cells from normal donors (NDs). Aberrantly spliced genes included a number of oncogenes, tumor suppressor genes, and genes involved in regulation of apoptosis, cell cycle, and cell differentiation. Among the most commonly mis-spliced genes (> 70% of AML patients) were two genes, NOTCH2 and FLT3, that encode myeloid cell surface proteins. The splice-variants of NOTCH2 and FLT3 resulted from complete or partial exon skipping and utilization of cryptic splice sites. Longitudinal analyses suggested that aberrant splicing of NOTCH2 and FLT3 correlated with disease status. Correlation analyses between splice-variants of these genes and clinical features of patients showed an association between NOTCH2 splice-variants and overall survival of patients. Our results suggest that NOTCH2 and FLT3 mis-splicing is a common characteristic of AML and has the potential to generate transcripts encoding proteins with altered function. Thus, splice-variants of these genes might provide disease markers and targets for novel therapeutics.
    Blood 02/2014; · 9.78 Impact Factor
  •
    ABSTRACT: Recent technologies have made it cost-effective to collect diverse types of genome-wide data. Computational methods are needed to combine these data to create a comprehensive view of a given disease or a biological process. Similarity network fusion (SNF) solves this problem by constructing networks of samples (e.g., patients) for each available data type and then efficiently fusing these into one network that represents the full spectrum of underlying data. For example, to create a comprehensive view of a disease given a cohort of patients, SNF computes and fuses patient similarity networks obtained from each of their data types separately, taking advantage of the complementarity in the data. We used SNF to combine mRNA expression, DNA methylation and microRNA (miRNA) expression data for five cancer data sets. SNF substantially outperforms single data type analysis and established integrative approaches when identifying cancer subtypes and is effective for predicting survival.
    Nature Methods 01/2014; · 23.57 Impact Factor
  •
    ABSTRACT: Although many methods have been developed for inference of biological networks, the validation of the resulting models has largely remained an unsolved problem. Here we present a framework for quantitative assessment of inferred gene interaction networks using knock-down data from cell line experiments. Using this framework we are able to show that network inference based on integration of prior knowledge derived from the biomedical literature with genomic data significantly improves the quality of inferred networks relative to other approaches. Our results also suggest that cell line experiments can be used to quantitatively assess the quality of networks inferred from tumor samples.
    Genomics 01/2014; · 3.01 Impact Factor
  •
    ABSTRACT: Due to advances in the acquisition and analysis of medical imaging, it is currently possible to quantify the tumor phenotype. The emerging field of Radiomics addresses this issue by converting medical images into minable data by extracting a large number of quantitative imaging features. One of the main challenges of Radiomics is tumor segmentation. Where manual delineation is time consuming and prone to inter-observer variability, it has been shown that semi-automated approaches are fast and reduce inter-observer variability. In this study, a semiautomatic region growing volumetric segmentation algorithm, implemented in the free and publicly available 3D-Slicer platform, was investigated in terms of its robustness for quantitative imaging feature extraction. Fifty-six 3D-radiomic features, quantifying phenotypic differences based on tumor intensity, shape and texture, were extracted from the computed tomography images of twenty lung cancer patients. These radiomic features were derived from the 3D-tumor volumes defined by three independent observers twice using 3D-Slicer, and compared to manual slice-by-slice delineations of five independent physicians in terms of intra-class correlation coefficient (ICC) and feature range. Radiomic features extracted from 3D-Slicer segmentations had significantly higher reproducibility (ICC = 0.85±0.15, p = 0.0009) compared to the features extracted from the manual segmentations (ICC = 0.77±0.17). Furthermore, we found that features extracted from 3D-Slicer segmentations were more robust, as the range was significantly smaller across observers (p = 3.819e-07), and overlapping with the feature ranges extracted from manual contouring (boundary lower: p = 0.007, higher: p = 5.863e-06). Our results show that 3D-Slicer segmented tumor volumes provide a better alternative to the manual delineation for feature quantification, as they yield more reproducible imaging descriptors. 
    Therefore, 3D-Slicer can be employed for quantitative image feature extraction and image data mining research in large patient cohorts.
    PLoS ONE 01/2014; 9(7):e102107. · 3.53 Impact Factor
  • Benjamin Haibe-Kains, Frank Emmert-Streib
    Frontiers in Genetics 01/2014; 5:221.
  •
    ABSTRACT: Human cancers exhibit strong phenotypic differences that can be visualized noninvasively by medical imaging. Radiomics refers to the comprehensive quantification of tumour phenotypes by applying a large number of quantitative image features. Here we present a radiomic analysis of 440 features quantifying tumour image intensity, shape and texture, which are extracted from computed tomography data of 1,019 patients with lung or head-and-neck cancer. We find that a large number of radiomic features have prognostic power in independent data sets of lung and head-and-neck cancer patients, many of which were not identified as significant before. Radiogenomics analysis reveals that a prognostic radiomic signature, capturing intratumour heterogeneity, is associated with underlying gene-expression patterns. These data suggest that radiomics identifies a general prognostic phenotype existing in both lung and head-and-neck cancer. This may have a clinical impact as imaging is routinely used in clinical practice, providing an unprecedented opportunity to improve decision-support in cancer treatment at low cost.
    Nature Communications 01/2014; 5:4006. · 10.02 Impact Factor
  •
    ABSTRACT: In this study, we infer the breast cancer gene regulatory network from gene expression data. This network is obtained from the application of the BC3Net inference algorithm to a large-scale gene expression data set consisting of 351 patient samples. In order to elucidate the functional relevance of the inferred network, we perform a Gene Ontology (GO) analysis for its structural components. Our analysis reveals that the most significant GO-terms we find for the breast cancer network represent functional modules of biological processes that are described by known cancer hallmarks, including translation, immune response, cell cycle, organelle fission, mitosis, cell adhesion, RNA processing, RNA splicing and response to wounding. Furthermore, by using a curated list of census cancer genes, we find an enrichment in these functional modules. Finally, we study cooperative effects of chromosomes based on information of interacting genes in the breast cancer network. We find that chromosome 21 is most coactive with other chromosomes. To our knowledge this is the first study investigating the genome-scale breast cancer network.
    Frontiers in Genetics 01/2014; 5:15.
  •
    ABSTRACT: Gene set enrichment analysis (GSEA) associates gene sets and phenotypes; its use is predicated on the choice of a pre-defined collection of sets. The de facto standard implementation of GSEA provides seven collections, yet there are no guidelines for the choice of collections and the impact of such choice, if any, is unknown. Here we compare each of the standard gene set collections in the context of a large dataset of drug response in human cancer cell lines. We define and test a new collection based on gene co-expression in cancer cell lines to compare the performance of the standard collections to an externally derived cell line based collection. The results show that GSEA findings vary significantly depending on the collection chosen for analysis. Henceforth, collections should be carefully selected and reported in studies that leverage GSEA.
    Scientific Reports 01/2014; 4:4092. · 5.08 Impact Factor
  •
    ABSTRACT: When inferring networks from high-throughput genomic data, one of the main challenges is the subsequent validation of these networks. In the best case scenario, the true network is partially known from previous research results published in structured databases or research articles. Traditionally, inferred networks are validated against these known interactions. Whenever the recovery rate is gauged to be high enough, subsequent high scoring but unknown inferred interactions are deemed good candidates for further experimental validation. Therefore such a validation framework strongly depends on the quantity and quality of published interactions and presents serious pitfalls: (1) availability of these known interactions for the studied problem might be sparse; (2) quantitatively comparing different inference algorithms is not trivial; and (3) the use of these known interactions for validation prevents their integration in the inference procedure. The latter is particularly relevant as it has recently been shown that integration of priors during network inference significantly improves the quality of inferred networks. To overcome these problems when validating inferred networks, we recently proposed a data-driven validation framework based on single gene knock-down experiments. Using this framework, we were able to demonstrate the benefits of integrating prior knowledge and expression data. In this paper we used this framework to assess the quality of different sources of prior knowledge on their own and in combination with different genomic data sets in colorectal cancer. We observed that most prior sources lead to significant F-scores. Furthermore, their integration with genomic data leads to a significant increase in F-scores, especially for priors extracted from full text PubMed articles, known co-expression modules and genetic interactions.
    Lastly, we observed that the results are consistent for three different data sets: experimental knock-down data and two human tumor data sets.
    Frontiers in Genetics 01/2014; 5:177.
  • Frank Emmert-Streib, Matthias Dehmer, Benjamin Haibe-Kains
    ABSTRACT: In this paper, we shed light on approaches that are currently used to infer networks from gene expression data with respect to their biological meaning. As we will show, the biological interpretation of these networks depends on the chosen theoretical perspective. For this reason, we distinguish a statistical perspective from a mathematical modeling perspective and elaborate their differences and implications. Our results indicate the imperative need for a genomic network ontology in order to avoid increasing confusion about the biological interpretation of inferred networks, which can be further aggravated by approaches that integrate multiple data sets or data types.
    Frontiers in Genetics 01/2014; 5:299.
  •
    ABSTRACT: Despite new treatments, acute myeloid leukemia (AML) remains an incurable disease. More effective drug design requires an expanded view of the molecular complexity that underlies AML. Alternative splicing (AS) of RNA is used by normal cells to generate protein diversity. Growing evidence indicates that aberrant splicing of genes plays a key role in cancer. We investigated genome-wide splicing abnormalities in AML and based on these abnormalities we aimed to identify novel potential biomarkers and therapeutic targets. We used genome-wide AS screening to investigate AS abnormalities in two independent AML patient cohorts (DFCI and UHN) and normal donors (NDs). Selected splicing events were confirmed through cloning and sequencing analysis, and then validated in 193 AML patients. Our results show that ~29% of expressed genes genome-wide were differentially and recurrently spliced in AML patients compared to ND bone marrow CD34+ cells. Results were reproducible in two independent AML cohorts. In both cohorts, annotation analyses indicated similar proportions of differentially spliced genes encoding several oncogenes, tumor suppressor proteins, splicing factors and heterogeneous-nuclear-ribonucleoproteins, proteins involved in apoptosis, cell proliferation, and spliceosome assembly. Our findings are consistent with reports for other malignancies and indicate that AML-specific aberrations in splicing mechanisms are a hallmark of AML pathogenesis. Overall, our results suggest that aberrant splicing is a common characteristic of AML. Our findings also suggest that splice variant transcripts that are the result of splicing aberrations create novel disease markers and provide potential targets for small molecules or antibody therapeutics for this disease.
    Clinical Cancer Research 11/2013; · 7.84 Impact Factor
  •
    ABSTRACT: Two large-scale pharmacogenomic studies were published recently in this journal. Genomic data are well correlated between studies; however, the measured drug response data are highly discordant. Although the source of inconsistencies remains uncertain, it has potential implications for using these outcome measures to assess gene-drug associations or select potential anticancer drugs on the basis of their reported results.
    Nature 11/2013; · 38.60 Impact Factor
  •
    ABSTRACT: Diabetes risk increases significantly with age and correlates with lower oxidative capacity in muscle. Decreased expression of peroxisome proliferator-activated receptor-gamma coactivator-1alpha (Pgc-1α) and target gene pathways involved in mitochondrial oxidative phosphorylation are associated with muscle insulin resistance, but a causative role has not been established. We sought to determine whether a decline in Pgc-1α and oxidative gene expression occurs during aging and potentiates the development of age-associated insulin resistance. Muscle-specific Pgc-1α knock-out (MKO) mice and wild-type littermate controls were aged for two years. Genetic signatures of skeletal muscle (microarray and mRNA expression) and metabolic profiles (glucose homeostasis, mitochondrial metabolism, body composition, lipids, and indirect calorimetry) of mice were compared at 3, 12, and 24 months of age. Microarray and gene set enrichment analysis highlighted decreased function of the electron transport chain as characteristic of both aging muscle and loss of Pgc-1α expression. Despite significant reductions in oxidative gene expression and succinate dehydrogenase activity, young mice lacking Pgc-1α in muscle had lower fasting glucose and insulin. Consistent with loss of oxidative capacity during aging, Pgc-1α and Pgc-1β expression were reduced in aged wild-type mouse muscle. Interestingly, the combination of age and loss of muscle Pgc-1α expression impaired glucose tolerance and led to increased fat mass, insulin resistance, and inflammatory markers in white adipose and liver tissues. Therefore, loss of Pgc-1α expression and decreased mitochondrial oxidative capacity contribute to worsening glucose tolerance and chronic systemic inflammation associated with aging.
    AJP Endocrinology and Metabolism 11/2013; · 4.51 Impact Factor
  •
    ABSTRACT: Little is known about the functions of chromosome Y (chrY) genes beyond their effects on sex and reproduction. In hearts, postpubertal testosterone affects the size of cells and the expression of genes differently in male C57BL/6J than in their C57.Y(A) counterparts, where the original chrY has been substituted with that from A/J mice. We further compared the 2 strains to better understand how chrY polymorphisms may affect cardiac properties, the latter being sexually dimorphic but unrelated to sex and reproduction. Genomic regions showing occupancy with androgen receptors (ARs) were identified in adult male hearts from both strains by chromatin immunoprecipitation. AR chromatin immunoprecipitation peaks (showing significant enrichment for consensus AR binding sites) were mostly strain specific. Measurements of anogenital distances in male pups showed that the biologic effects of perinatal androgens were greater in C57BL/6J than in C57.Y(A). Although perinatal endocrine manipulations showed that these differences contributed to the strain-specific differences in the response of adult cardiac cells to testosterone, the amounts of androgens produced by fetal testes were not different in each strain. Nonetheless, chrY polymorphisms associated in newborn pups' hearts with strain-specific differences in genomic regions showing either AR occupancy, accessible chromatin sites, or histone H3K4me3 marks, as well as with differential expression of 2 chrY-encoded histone demethylases. In conclusion, the effects of chrY on adult cardiac phenotypes appeared to result from an interaction of this chromosome with the organizational programming effects exerted by the neonatal testosterone surge and show several characteristics of being mediated by an epigenetic remodeling of chromatin.
    Endocrinology 10/2013; · 4.72 Impact Factor
  •
    ABSTRACT: We examined if a combination of proliferation markers and estrogen receptor (ER) activity could predict early versus late relapses in ER-positive breast cancer and inform the choice and length of adjuvant endocrine therapy. Baseline Affymetrix gene-expression profiles from ER-positive patients who received no systemic therapy (n = 559) or adjuvant tamoxifen for 5 years (cohort-1: n = 683, cohort-2: n = 282) and from 58 patients treated with neoadjuvant letrozole for 3 months (gene-expression available at baseline, 14 and 90 days) were analyzed. A proliferation score based on the expression of mitotic kinases (MKS) and an ER-related score (ERS) adopted from Oncotype DX® were calculated. The same analysis was performed using the Genomic Grade Index as proliferation marker and the luminal gene score from the PAM50 classifier as measure of estrogen-related genes. Median values were used to define low and high marker groups and four combinations were created. Relapses were grouped into time cohorts of 0–2.5, 0–5, >5–10 years. In the overall 10-year period, the proportional hazards assumption was violated for several biomarker groups, indicating time-dependent effects. In tamoxifen-treated patients Low-MKS/Low-ERS cancers had continuously increasing risk of relapse that was higher after 5 years than Low-MKS/High-ERS cancers [0 to 10 years, HR 3.36; p = 0.013]. High-MKS/High-ERS cancers had low risk of early relapse [0 to 2.5 years, HR 0.13; p = 0.0006], but high risk of late relapse which was higher than in the High-MKS/Low-ERS group [after 5 years, HR 3.86; p = 0.007]. The High-MKS/Low-ERS subset had most of the early relapses [0 to 2.5 years, HR 6.53; p < 0.0001] especially in node negative tumors and showed minimal response to neoadjuvant letrozole. These findings were qualitatively confirmed in a smaller independent cohort of tamoxifen-treated patients. Using different biomarkers provided similar results.
    Early relapses are highest in highly proliferative/low-ERS cancers, in particular in node negative tumors. Relapses occurring after 5 years of adjuvant tamoxifen are highest among the highly-proliferative/high-ERS tumors although their risk of recurrence is modest in the first 5 years on tamoxifen. These tumors could be the best candidates for extended endocrine therapy.
    Breast cancer research: BCR 09/2013; 15(5):R86. · 5.87 Impact Factor
  •
    ABSTRACT: Estrogen receptor (ER) and progesterone receptor (PR) testing are performed in the evaluation of breast cancer. While the clinical utility of ER as a predictive biomarker to identify patients likely to benefit from hormonal therapy is well-established, the added value of PR is less well-defined. The primary goals of our study were to assess the distribution, inter-assay reproducibility, and prognostic significance of breast cancer subtypes defined by patterns of ER and PR expression. We integrated gene expression microarray (GEM) and clinico-pathologic data from 20 published studies to determine the frequency (n = 4,111) and inter-assay reproducibility (n = 1,752) of ER/PR subtypes (ER+/PR+, ER+/PR-neg, ER-neg/PR-neg, ER-neg/PR+). To extend our findings, we utilized a cohort of patients from the Nurses' Health Study (NHS) with ER/PR data recorded in the medical record and assessed on tissue microarrays (n = 2,011). In both data-sets, we assessed the association of ER and PR expression with survival. In a genomewide analysis, PR was among the least variable genes in ER-negative breast cancer. The ER-neg/PR+ subtype was rare (~1-4%) and showed no significant reproducibility (Kappa = 0.02 and 0.06, in the GEM and NHS datasets, respectively). The vast majority of patients classified as ER-neg/PR+ in the medical record (97% and 94%, in the GEM and NHS data-sets) were re-classified by a second method. In the GEM data set (n = 2,731), PR mRNA expression was associated with prognosis in ER+ breast cancer (Adjusted P < 0.001), but not in ER-negative breast cancer (Adjusted P = 0.21). PR protein expression did not contribute significant prognostic information to multivariate models considering ER and other standard clinico-pathologic features in the GEM or NHS datasets. ER-neg/PR+ breast cancer is not a reproducible subtype. 
    PR expression is not associated with prognosis in ER-negative breast cancer, and PR does not contribute significant independent prognostic information to multivariate models considering ER and other standard clinico-pathologic factors. Given that PR provides no clinically actionable information in ER+ breast cancer, these findings question the utility of routine PR testing in breast cancer.
    Breast cancer research: BCR 08/2013; 15(4):R68. · 5.87 Impact Factor
  •
    ABSTRACT: Feature selection is one of the main challenges in analyzing high-throughput genomic data. Minimum redundancy maximum relevance (mRMR) is a particularly fast feature selection method for finding a set of both relevant and complementary features. Here we describe the mRMRe R package, in which the mRMR technique is extended by using an ensemble approach in order to better explore the feature space and build more robust predictors. To deal with the computational complexity of the ensemble approach, the main functions of the package are implemented and parallelized in C using the OpenMP API. Our ensemble mRMR implementations outperform the classical mRMR approach in terms of prediction accuracy. They identify genes more relevant to the biological context and may lead to richer biological interpretations. The parallelized functions included in the package show significant gains in terms of run-time speed when compared to previously released packages. The R package mRMRe is available on CRAN and is provided open source under the Artistic-2.0 License. The code used to generate all the results reported in this application note is available from Supplementary File 1. Contact: bhaibeka@ircm.qc.ca. Supplementary information is available at Bioinformatics online.
    Bioinformatics 07/2013; · 5.47 Impact Factor

Publication Stats

3k Citations
587.92 Total Impact Points

Institutions

  • 2014
    • University Health Network
      Toronto, Ontario, Canada
  • 2013–2014
    • Université de Montréal
      Montréal, Quebec, Canada
    • Beth Israel Deaconess Medical Center
      Boston, Massachusetts, United States
  • 2011–2014
    • University of Toronto
      • Department of Medical Biophysics
      • Department of Medicine
      Toronto, Ontario, Canada
  • 2008–2013
    • Institut Jules Bordet
      Bruxelles, Brussels Capital Region, Belgium
    • Microarrays
      Huntsville, Alabama, United States
  • 2005–2013
    • Université Libre de Bruxelles
      • Bordet Institute
      • Laboratory of Experimental Hematology
      Bruxelles, Brussels Capital Region, Belgium
  • 2011–2012
    • Dana-Farber Cancer Institute
      • Department of Biostatistics and Computational Biology
      Boston, Massachusetts, United States
  • 2010
    • Harvard Medical School
      Boston, Massachusetts, United States
    • Peter MacCallum Cancer Centre
      • Molecular Oncology Laboratory
      Melbourne, Victoria, Australia
  • 2004–2008
    • Vrije Universiteit Brussel
      Bruxelles, Brussels Capital Region, Belgium