Article

BACOM: in silico detection of genomic deletion types and correction of normal cell contamination in copy number data.

Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA.
Bioinformatics (Impact Factor: 5.47). 06/2011; 27(11):1473-80. DOI: 10.1093/bioinformatics/btr183
Source: PubMed

ABSTRACT Identification of somatic DNA copy number alterations (CNAs) and significant consensus events (SCEs) in cancer genomes is a main task in discovering potential cancer-driving genes such as oncogenes and tumor suppressors. The recent development of SNP array technology has facilitated studies on copy number changes at a genome-wide scale with high resolution. However, existing copy number analysis methods are oblivious to normal cell contamination and cannot distinguish between contributions of cancerous and normal cells to the measured copy number signals. This contamination could significantly confound downstream analysis of CNAs and affect the power to detect SCEs in clinical samples.
We report here a statistically principled in silico approach, Bayesian Analysis of COpy number Mixtures (BACOM), to accurately estimate genomic deletion type and normal tissue contamination, and accordingly recover the true copy number profile in cancer cells. We tested the proposed method on two simulated datasets, two prostate cancer datasets and The Cancer Genome Atlas high-grade ovarian dataset, and obtained very promising results supported by the ground truth and biological plausibility. Moreover, based on a large number of comparative simulation studies, the proposed method gives significantly improved power to detect SCEs after in silico correction of normal tissue contamination. We develop a cross-platform open-source Java application that implements the whole pipeline of copy number analysis of heterogeneous cancer tissues including relevant processing steps. We also provide an R interface, bacomR, for running BACOM within the R environment, making it straightforward to include in existing data pipelines.
The cross-platform, stand-alone Java application, BACOM, the R interface, bacomR, all source code and the simulation data used in this article are freely available at authors' web site: http://www.cbil.ece.vt.edu/software.htm.

0 Bookmarks
 · 
203 Views
  • [Show abstract] [Hide abstract]
    ABSTRACT: Detection and quantification of the absolute DNA copy number alterations (CNAs) in tumor cells is challenging because the DNA specimen is extracted from a mixture of tumor and normal stromal cells. Estimates of tumor purity and ploidy are necessary to correctly infer copy number, and ploidy may itself be a prognostic factor in cancer progression. As deep sequencing of the exome or genome has become routine for characterization of tumor samples, in this work we aim to develop a simple and robust algorithm to infer purity, ploidy and absolute copy numbers in whole numbers for tumor cells from sequencing data. A simulation study shows that estimates have reasonable accuracy, and that the algorithm is robust against the presence of segmentation errors and subclonal populations. We validated our algorithm against a panel of cell lines with experimentally determined ploidy. We also compared our algorithm to the well established SNP array based method called ABSOLUTE on three sets of tumors of different types. Our method had good performance on these four benchmark datasets for both purity and ploidy estimates, and may offer a simple solution to CNA quantification for cancer sequencing projects. Availability: The R package absCNseq is available from http://biostats.mcc.ucsd.edu/files/absCNseq_1.0.tar.gz. lebao@ucsd.edu, kmesser@ucsd.edu.
    Bioinformatics 01/2014; · 5.47 Impact Factor
  • [Show abstract] [Hide abstract]
    ABSTRACT: Solid tumors are heterogeneous tissues composed of a mixture of cancer and normal cells, which complicates the interpretation of their molecular profiles. Furthermore, tissue architecture is generally not reflected in molecular assays, rendering this rich information underused. To address these challenges, we developed a computational approach based on standard hematoxylin and eosin-stained tissue sections and demonstrated its power in a discovery and validation cohort of 323 and 241 breast tumors, respectively. To deconvolute cellular heterogeneity and detect subtle genomic aberrations, we introduced an algorithm based on tumor cellularity to increase the comparability of copy number profiles between samples. We next devised a predictor for survival in estrogen receptor-negative breast cancer that integrated both image-based and gene expression analyses and significantly outperformed classifiers that use single data types, such as microarray expression signatures. Image processing also allowed us to describe and validate an independent prognostic factor based on quantitative analysis of spatial patterns between stromal cells, which are not detectable by molecular assays. Our quantitative, image-based method could benefit any large-scale cancer study by refining and complementing molecular assays of tumor samples.
    Science translational medicine 10/2012; 4(157):157ra143. · 10.76 Impact Factor
  • [Show abstract] [Hide abstract]
    ABSTRACT: Accurate identification of significant aberrations in cancers (AISAIC) is a systematic effort to discover potential cancer-driving genes such as oncogenes and tumor suppressors. Two major confounding factors against this goal are the normal cell contamination and random background aberrations in tumor samples. We describe a Java AISAIC package that provides comprehensive analytic functions and graphic user interface for integrating two statistically-principled in silico approaches to address the aforementioned challenges in DNA copy number analyses. In addition, the package provides a command-line interface for users with scripting and programming needs to incorporate or extend AISAIC to their customized analysis pipelines. This open source multiplatform software offers several attractive features: (1) it implements a user friendly complete pipeline from processing raw data to reporting analytic results; (2) it detects deletion types directly from copy number signals using a Bayes hypothesis test; (3) it estimates the fraction of normal contamination for each sample; (4) it produces unbiased null distribution of random background alterations by iterative aberration-exclusive permutations; and (5) it identifies significant consensus regions and the percentage of homozygous/hemizygous deletions across multiple samples. AISAIC also provides users with a parallel computing option to leverage ubiquitous multi-core machines. AISAIC is available as a Java application, with a user's guide and source code, at https://code.google.com/p/aisaic/. Available at Bioinformatics online.
    Bioinformatics 11/2013; · 5.47 Impact Factor

Full-text

View
0 Downloads
Available from