Robust unmixing of tumor states in array comparative genomic hybridization data

Computer Science Department, Carnegie Mellon University, Pittsburgh PA 15213, USA.
Bioinformatics (Impact Factor: 4.62). 06/2010; 26(12):i106-14. DOI: 10.1093/bioinformatics/btq213
Source: PubMed

ABSTRACT Tumorigenesis is an evolutionary process by which tumor cells acquire sequences of mutations leading to increased growth, invasiveness and eventually metastasis. It is hoped that by identifying the common patterns of mutations underlying major cancer sub-types, we can better understand the molecular basis of tumor development and identify new diagnostics and therapeutic targets. This goal has motivated several attempts to apply evolutionary tree reconstruction methods to assays of tumor state. Inference of tumor evolution is in principle aided by the fact that tumors are heterogeneous, retaining remnant populations from different stages of their development alongside contaminating healthy cell populations. In practice, though, this heterogeneity complicates interpretation of tumor data because distinct cell types are conflated by common methods for assaying the tumor state. We previously proposed a method to computationally infer cell populations from measures of tumor-wide gene expression through a geometric interpretation of mixture type separation, but this approach deals poorly with noisy and outlier data.
In the present work, we propose a new method to perform tumor mixture separation efficiently and robustly with respect to experimental error. The method builds on the prior geometric approach but uses a novel objective function that allows for robust fits and greatly reduces sensitivity to noise and outliers. We further develop an efficient gradient-based method to optimize this 'soft geometric unmixing' objective for measurements of tumor DNA copy numbers assessed by array comparative genomic hybridization (aCGH). We show, on a combination of semi-synthetic and real data, that the method yields fast and accurate separation of tumor states.
We have presented a novel objective function and optimization method for the robust separation of tumor sub-types from aCGH data and have shown that the method provides fast, accurate reconstruction of tumor states from mixed samples. Better solutions to this problem can be expected to improve our ability to accurately identify genetic abnormalities in primary tumor samples and to infer patterns of tumor evolution.
Supplementary data are available at Bioinformatics online.
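
The abstract does not state the 'soft geometric unmixing' objective, so the following is only a minimal sketch of the mixture model that geometric unmixing rests on, not the authors' method. It assumes that each mixed sample is a convex combination of a few pure cell-state profiles, and it solves the easier sub-problem of recovering the mixture fractions when those profiles are already known. The function names, the random profiles, and the fractions are all hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def estimate_fractions(sample, components, steps=2000):
    """Projected gradient descent for
    min_f ||components.T @ f - sample||^2 with f on the simplex."""
    k = components.shape[0]
    f = np.full(k, 1.0 / k)                   # start from the uniform mixture
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    lr = 1.0 / (2.0 * np.linalg.norm(components @ components.T, 2))
    for _ in range(steps):
        grad = 2.0 * components @ (components.T @ f - sample)
        f = project_to_simplex(f - lr * grad)
    return f

rng = np.random.default_rng(0)
components = rng.normal(size=(3, 50))         # 3 hypothetical pure profiles, 50 probes
true_f = np.array([0.6, 0.3, 0.1])            # hypothetical mixture fractions
sample = true_f @ components                  # noiseless mixed sample
est_f = estimate_fractions(sample, components)
```

In the unmixing problem the abstract describes, the pure profiles are themselves unknown and must be inferred together with the fractions from noisy data, which is where a robust objective matters; this sketch uses a plain squared-error fit and noiseless data.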

  • ABSTRACT: Several cancer types consist of multiple, genetically and phenotypically distinct subpopulations. The underlying mechanism for this intra-tumoral heterogeneity can be explained by the clonal evolution model, whereby growth-advantageous mutations cause the expansion of cancer cell subclones. The recurrent phenotype of many cancers may be a consequence of these coexisting subpopulations responding unequally to therapies. Methods to computationally infer tumor evolution and subpopulation diversity are emerging, and they hold the promise to improve the understanding of the genetic and molecular determinants of recurrence. To address cellular subpopulation dynamics within human tumors, we developed a bioinformatic method, EXPANDS. It estimates the proportion of cells harboring specific mutations in a tumor. By modeling cellular frequencies as probability distributions, EXPANDS predicts mutations that accumulate in a cell prior to its clonal expansion. We assessed the performance of EXPANDS on one whole-genome-sequenced breast cancer and performed subpopulation (SP) analyses on 118 Glioblastoma multiforme (GBM) samples obtained from TCGA. Our results characterize the extent of subclonal diversity in primary GBM and subpopulation dynamics upon recurrence, and provide a set of candidate genes mutated in the most well-adapted subpopulations. In summary, EXPANDS predicts tumor purity and subclonal composition from sequencing data. EXPANDS is available for download at (matlab version - used in this manuscript) and (R version). Supplementary data are available at Bioinformatics online.
    Bioinformatics 10/2013; 30(1). DOI:10.1093/bioinformatics/btt622 · 4.62 Impact Factor
  • ABSTRACT: Regulated changes in gene expression underlie many biological processes, but globally profiling cell-to-cell variations in transcriptional regulation is problematic when measuring single cells. Transcriptome-wide identification of regulatory heterogeneities can be robustly achieved by randomly collecting small numbers of cells followed by statistical analysis. However, this stochastic-profiling approach blurs out the expression states of the individual cells in each pooled sample. Here, we show that the underlying distribution of single-cell regulatory states can be deconvolved from stochastic-profiling data through maximum-likelihood inference. Guided by the mechanisms of transcriptional regulation, we formulated plausible mixture models for cell-to-cell regulatory heterogeneity and maximized the resulting likelihood functions to infer model parameters. Inferences were validated both computationally and experimentally for different mixture models, which included regulatory states for multicellular function that were occupied by as few as 1 in 40 cells of the population. Importantly, when the method was extended to programs of heterogeneously coexpressed transcripts, we found that population-level inferences were much more accurate with pooled samples than with one-cell samples when the extent of sampling was limited. Our deconvolution method provides a means to quantify the heterogeneous regulation of molecular states efficiently and gain a deeper understanding of the heterogeneous execution of cell decisions.
    Proceedings of the National Academy of Sciences 01/2014; 111(5). DOI:10.1073/pnas.1311647111 · 9.81 Impact Factor
  • ABSTRACT: We present the Pareto task inference method (ParTI) for inferring biological tasks from high-dimensional biological data. Data are described as a polytope, and features maximally enriched closest to the vertices (or archetypes) allow identification of the tasks the vertices represent. We demonstrate that human breast tumors and mouse tissues are well described by tetrahedrons in gene expression space, with specific tumor types and biological functions enriched at each of the vertices, suggesting four key tasks.
    Nature Methods 01/2015; DOI:10.1038/nmeth.3254 · 25.95 Impact Factor
