Robust unmixing of tumor states in array comparative genomic hybridization data

Computer Science Department, Carnegie Mellon University, Pittsburgh PA 15213, USA.
Bioinformatics (Impact Factor: 4.98). 06/2010; 26(12):i106-14. DOI: 10.1093/bioinformatics/btq213
Source: PubMed


Tumorigenesis is an evolutionary process by which tumor cells acquire sequences of mutations leading to increased growth, invasiveness and eventually metastasis. It is hoped that by identifying the common patterns of mutations underlying major cancer sub-types, we can better understand the molecular basis of tumor development and identify new diagnostics and therapeutic targets. This goal has motivated several attempts to apply evolutionary tree reconstruction methods to assays of tumor state. Inference of tumor evolution is in principle aided by the fact that tumors are heterogeneous, retaining remnant populations from different stages of their development together with contaminating healthy cell populations. In practice, though, this heterogeneity complicates interpretation of tumor data because distinct cell types are conflated by common methods for assaying the tumor state. We previously proposed a method to computationally infer cell populations from measures of tumor-wide gene expression through a geometric interpretation of mixture type separation, but this approach deals poorly with noise and outliers in the data.
In the present work, we propose a new method to perform tumor mixture separation efficiently and in a manner robust to experimental error. The method builds on the prior geometric approach but uses a novel objective function that allows for robust fits, greatly reducing sensitivity to noise and outliers. We further develop an efficient gradient-based method to optimize this 'soft geometric unmixing' objective for measurements of tumor DNA copy numbers assessed by array comparative genomic hybridization (aCGH). We show, on a combination of semi-synthetic and real data, that the method yields fast and accurate separation of tumor states.
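The contrast between a hard geometric fit and a soft one can be illustrated with a small sketch. It is not the objective function from the paper, whose exact form is not given in this abstract. It assumes, purely for illustration, that samples are points in a low-dimensional space, that the pure cell states are the vertices of a simplex, and that a fit is scored as simplex volume plus a penalty on how far each point falls outside the simplex. The names `barycentric`, `simplex_volume`, `soft_objective` and the weight `gamma` are hypothetical.

```python
import math
import numpy as np

def barycentric(points, vertices):
    """Barycentric coordinates of points (n, d) w.r.t. a simplex (d+1, d).

    Each returned row sums to 1 and can be read as mixture fractions;
    a negative entry means the point lies outside the simplex.
    """
    d = vertices.shape[1]
    # Append a row of ones so the coordinates are constrained to sum to 1.
    A = np.vstack([vertices.T, np.ones(d + 1)])
    B = np.vstack([points.T, np.ones(points.shape[0])])
    return np.linalg.solve(A, B).T

def simplex_volume(vertices):
    """Volume of a d-dimensional simplex given its d+1 vertices."""
    d = vertices.shape[1]
    edges = vertices[1:] - vertices[0]
    return abs(np.linalg.det(edges)) / math.factorial(d)

def soft_objective(points, vertices, gamma=1.0):
    """Illustrative score (an assumption, not the paper's formula):
    simplex volume plus gamma times the total amount by which
    barycentric coordinates go negative."""
    coords = barycentric(points, vertices)
    violation = np.clip(-coords, 0.0, None).sum()
    return simplex_volume(vertices) + gamma * violation

# Three well-behaved samples and one outlier at (2, 2).
points = np.array([[0.25, 0.25], [0.5, 0.2], [0.1, 0.7], [2.0, 2.0]])

# A tight triangle that ignores the outlier, and a wide one that encloses it.
tight = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
wide = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])

print("tight:", soft_objective(points, tight))
print("wide: ", soft_objective(points, wide))
```

Under a hard constraint that every sample must lie inside the simplex, the single outlier would force the wide triangle. With the soft penalty and `gamma=1.0`, the tight triangle scores lower, so one outlier no longer dictates the inferred components. The choice of `gamma` governs that trade-off and is an assumption of this sketch.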
We have presented a novel objective function and optimization method for the robust separation of tumor sub-types from aCGH data and have shown that the method provides fast, accurate reconstruction of tumor states from mixed samples. Better solutions to this problem can be expected to improve our ability to accurately identify genetic abnormalities in primary tumor samples and to infer patterns of tumor evolution.
Supplementary data are available at Bioinformatics online.



Available from: Stanley E Shackney, Jul 10, 2014
  • "The primary results below are based on components previously determined in Tolliver et al. [17] by the PCA-based method, although the improved method is applied to develop components from simulated data and from a secondary breast cancer data set to provide additional points of comparison."
    ABSTRACT: Tumorigenesis can in principle result from many combinations of mutations, but only a few roughly equivalent sequences of mutations, or "progression pathways," seem to account for most human tumors. Phylogenetics provides a promising way to identify common progression pathways and markers of those pathways. This approach, however, can be confounded by the high heterogeneity within and between tumors, which makes it difficult to identify conserved progression stages or organize them into robust progression pathways. To tackle this problem, we previously developed methods for inferring progression stages from heterogeneous tumor profiles through computational unmixing. In this paper, we develop a novel pipeline for building trees of tumor evolution from the unmixed tumor data. The pipeline implements a statistical approach for identifying robust progression markers from unmixed tumor data and calling those markers in inferred cell states. The result is a set of phylogenetic characters and their assignments in progression states to which we apply maximum parsimony phylogenetic inference to infer tumor progression pathways. We demonstrate the full pipeline on simulated and real comparative genomic hybridization (CGH) data, validating its effectiveness and making novel predictions of major progression pathways and ancestral cell states in breast cancers.
    BioMed Research International 05/2012; 2012(1):797812. DOI:10.1155/2012/797812 · 2.71 Impact Factor

  • ABSTRACT: Phylogenetics, or the inference of evolutionary trees, is one of the oldest and most intensively studied topics in computational biology. Yet it remains a vibrant area of research, in part because advances in our ability to gather data for phylogenetic inference continue to create novel and more challenging variants of the phylogeny problem. In this talk, I will discuss a particular challenge underlying some important phylogenetic problems in the genomic era: reconstructing evolutionary histories from samples of heterogeneous populations, each of which may contain contributions from multiple evolutionary stages or pathways.
    Bioinformatics Research and Applications - 7th International Symposium, ISBRA 2011, Changsha, China, May 27-29, 2011. Proceedings; 01/2011