Article

Robust unmixing of tumor states in array comparative genomic hybridization data

Computer Science Department, Carnegie Mellon University, Pittsburgh PA 15213, USA.
Bioinformatics (Impact Factor: 4.62). 06/2010; 26(12):i106-14. DOI: 10.1093/bioinformatics/btq213
Source: PubMed

ABSTRACT Tumorigenesis is an evolutionary process by which tumor cells acquire sequences of mutations leading to increased growth, invasiveness and eventually metastasis. It is hoped that by identifying the common patterns of mutations underlying major cancer sub-types, we can better understand the molecular basis of tumor development and identify new diagnostics and therapeutic targets. This goal has motivated several attempts to apply evolutionary tree reconstruction methods to assays of tumor state. Inference of tumor evolution is in principle aided by the fact that tumors are heterogeneous, retaining remnant populations from different stages of their development, along with contaminating healthy cell populations. In practice, though, this heterogeneity complicates interpretation of tumor data because distinct cell types are conflated by common methods for assaying the tumor state. We previously proposed a method to computationally infer cell populations from measures of tumor-wide gene expression through a geometric interpretation of mixture type separation, but this approach handles noisy and outlier data poorly.
In the present work, we propose a new method to perform tumor mixture separation efficiently and robustly with respect to experimental error. The method builds on the prior geometric approach but uses a novel objective function that allows robust fits and greatly reduces sensitivity to noise and outliers. We further develop an efficient gradient-based method to optimize this 'soft geometric unmixing' objective for tumor DNA copy numbers measured by array comparative genomic hybridization (aCGH). We show, on a combination of semi-synthetic and real data, that the method yields fast and accurate separation of tumor states.
We have presented a novel objective function and optimization method for the robust separation of tumor sub-types from aCGH data and have shown that the method provides fast, accurate reconstruction of tumor states from mixed samples. Better solutions to this problem can be expected to improve our ability to accurately identify genetic abnormalities in primary tumor samples and to infer patterns of tumor evolution.
Supplementary data are available at Bioinformatics online.
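The abstract's central point, that a robust objective makes mixture separation far less sensitive to outliers, can be illustrated with a deliberately simplified sketch. This is not the paper's 'soft geometric unmixing' objective, which the abstract does not spell out. It assumes instead that the tumor and normal copy-number profiles are already known, and estimates only the tumor fraction of one mixed sample, comparing an ordinary least-squares fit with a Huber-loss fit obtained by gradient descent. All function names, parameter values and data below are hypothetical.

```python
# Illustrative only: a stand-in for robust fitting, NOT the paper's objective.
# Assumption: component profiles are known; only the mixing fraction is fit.


def mix(fraction, tumor, normal):
    """Noise-free mixed profile for a given fraction of tumor cells."""
    return [fraction * t + (1.0 - fraction) * n for t, n in zip(tumor, normal)]


def least_squares_fraction(sample, tumor, normal):
    """Closed-form minimizer of the summed squared residuals."""
    d = [t - n for t, n in zip(tumor, normal)]
    r = [x - n for x, n in zip(sample, normal)]
    return sum(di * ri for di, ri in zip(d, r)) / sum(di * di for di in d)


def huber_fraction(sample, tumor, normal, delta=0.1, steps=500, lr=0.5):
    """Gradient descent on a Huber loss, kept inside the interval [0, 1]."""
    d = [t - n for t, n in zip(tumor, normal)]
    r = [x - n for x, n in zip(sample, normal)]
    f = 0.5  # starting guess
    for _ in range(steps):
        grad = 0.0
        for di, ri in zip(d, r):
            resid = f * di - ri
            # Huber derivative: the residual clipped to [-delta, delta],
            # so a single large outlier has bounded influence.
            grad += max(-delta, min(delta, resid)) * di
        f -= lr * grad / len(d)
        f = max(0.0, min(1.0, f))
    return f


# Hypothetical copy numbers at eight probes.
normal = [2.0] * 8
tumor = [2.0, 4.0, 4.0, 1.0, 2.0, 3.0, 1.0, 2.0]
true_fraction = 0.6

sample = mix(true_fraction, tumor, normal)
sample[1] += 3.0  # one grossly mis-measured probe

ls_estimate = least_squares_fraction(sample, tumor, normal)
huber_estimate = huber_fraction(sample, tumor, normal)
print("least squares:", round(ls_estimate, 3))
print("Huber:", round(huber_estimate, 3))
```

With this toy data, the single corrupted probe pulls the unconstrained least-squares estimate above 1 (roughly 1.15), whereas the Huber estimate stays near the true value of 0.6 (roughly 0.63). The full problem addressed in the paper is harder, since the component profiles themselves must be inferred from the mixed samples.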

Available from: Stanley E Shackney, Jul 10, 2014
  • Source
    ABSTRACT: Tumorigenesis can in principle result from many combinations of mutations, but only a few roughly equivalent sequences of mutations, or "progression pathways," seem to account for most human tumors. Phylogenetics provides a promising way to identify common progression pathways and markers of those pathways. This approach, however, can be confounded by the high heterogeneity within and between tumors, which makes it difficult to identify conserved progression stages or organize them into robust progression pathways. To tackle this problem, we previously developed methods for inferring progression stages from heterogeneous tumor profiles through computational unmixing. In this paper, we develop a novel pipeline for building trees of tumor evolution from the unmixed tumor data. The pipeline implements a statistical approach for identifying robust progression markers from unmixed tumor data and calling those markers in inferred cell states. The result is a set of phylogenetic characters and their assignments in progression states to which we apply maximum parsimony phylogenetic inference to infer tumor progression pathways. We demonstrate the full pipeline on simulated and real comparative genomic hybridization (CGH) data, validating its effectiveness and making novel predictions of major progression pathways and ancestral cell states in breast cancers.
    BioMed Research International 05/2012; 2012:797812. DOI:10.1155/2012/797812 · 2.71 Impact Factor
  • Source
    ABSTRACT: Array-based genotyping platforms have in recent years become established as a valuable tool for the characterization of genomic alterations in cancer. The analysis of tumor samples, however, presents challenges for data analysis and interpretation. For example, tumor samples are often admixed with nonaberrant cells that define the tumor microenvironment, such as infiltrating lymphocytes and fibroblasts, or vasculature. Furthermore, tumors often comprise subclones harboring divergent aberrations that are acquired subsequent to the tumor-initiating event. The combined analysis of both genotype and copy number status obtained by array-based genotyping platforms provides opportunities to address these challenges. In this chapter, we present the basic principles of current array-based genotyping platforms and how they can be used to infer genotype and copy number for acquired genomic alterations. We describe how these techniques can be used to resolve tumor ploidy, normal cell admixture, and subclonality. We also exemplify how genotyping techniques can be applied in tumor studies to elucidate the hierarchy among tumor clones and thus provide means to study clonal expansion and tumor evolution.
    Advances in Cancer Research 01/2011; 112:151-82. DOI:10.1016/B978-0-12-387688-1.00006-5 · 4.26 Impact Factor
  • Source
    ABSTRACT: Background: Tumour samples containing distinct sub-populations of cancer and normal cells present challenges in the development of reproducible biomarkers, as these biomarkers are based on bulk signals from mixed tumour profiles. ISOpure is the only mRNA computational purification method to date that does not require a paired tumour-normal sample, provides a personalized cancer profile for each patient, and has been tested on clinical data. Replacing mixed tumour profiles with ISOpure-preprocessed cancer profiles led to better prognostic gene signatures for lung and prostate cancer. Results: To simplify the integration of ISOpure into standard R-based bioinformatics analysis pipelines, the algorithm has been implemented as an R package. The ISOpureR package performs analogously to the original code in estimating the fraction of cancer cells and the patient cancer mRNA abundance profile from tumour samples in four cancer datasets. Conclusions: The ISOpureR package estimates the fraction of cancer cells and personalized patient cancer mRNA abundance profile from a mixed tumour profile. This open-source R implementation enables integration into existing computational pipelines, as well as easy testing, modification and extension of the model. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0597-x) contains supplementary material, which is available to authorized users.
    BMC Bioinformatics 05/2015; 16(1). DOI:10.1186/s12859-015-0597-x · 2.67 Impact Factor