BACOM: in silico detection of genomic deletion types and correction of normal cell contamination in copy number data.

Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA.
Bioinformatics (Impact Factor: 5.47). 06/2011; 27(11):1473-80. DOI: 10.1093/bioinformatics/btr183
Source: PubMed

ABSTRACT: Identification of somatic DNA copy number alterations (CNAs) and significant consensus events (SCEs) in cancer genomes is a key task in discovering potential cancer-driving genes such as oncogenes and tumor suppressors. The recent development of SNP array technology has made genome-wide, high-resolution studies of copy number changes possible. However, existing copy number analysis methods are oblivious to normal cell contamination and cannot distinguish the contributions of cancerous and normal cells to the measured copy number signals. This contamination can significantly confound downstream analysis of CNAs and reduce the power to detect SCEs in clinical samples.
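To illustrate why contamination confounds the measured signal, consider a simplified linear mixture model. This model is our assumption and is not necessarily the exact model BACOM uses. Normal cells are diploid, a fraction α of the sampled cells are normal, and a segment has a single clonal tumor copy number c:

```latex
\begin{aligned}
\hat{c} &= 2\alpha + (1-\alpha)\,c, \qquad 0 \le \alpha < 1, \\
\text{homozygous deletion } (c = 0):\quad \hat{c} &= 2\alpha, \\
\text{hemizygous deletion } (c = 1):\quad \hat{c} &= 2\alpha + (1-\alpha) = 1 + \alpha.
\end{aligned}
```

For example, a homozygous deletion in a sample with α = 0.5 gives ĉ = 1.0. This is exactly the value a hemizygous deletion gives in a pure tumor sample (α = 0). The measured copy number alone therefore cannot separate deletion type from contamination level.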
We report here a statistically principled in silico approach, Bayesian Analysis of COpy number Mixtures (BACOM), to accurately estimate genomic deletion type and normal tissue contamination, and accordingly recover the true copy number profile in cancer cells. We tested the proposed method on two simulated datasets, two prostate cancer datasets and The Cancer Genome Atlas high-grade ovarian dataset, and obtained very promising results, supported by the ground truth and by biological plausibility. Moreover, a large number of comparative simulation studies show that in silico correction of normal tissue contamination gives significantly improved power to detect SCEs. We developed a cross-platform open-source Java application that implements the whole pipeline of copy number analysis of heterogeneous cancer tissues, including the relevant processing steps. We also provide an R interface, bacomR, for running BACOM within the R environment, making it straightforward to include in existing data pipelines.
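The following minimal Python sketch makes the estimate-and-correct idea concrete. It relies on a simplified linear mixture model, which is our assumption rather than BACOM's published Bayesian estimator:

* Normal cells are diploid.
* A fraction alpha of the sampled cells are normal.
* Each segment has one clonal tumor copy number c.

Under these assumptions, the observed copy number is 2·alpha + (1 - alpha)·c. The caller supplies the deletion type here, whereas BACOM infers it statistically. All function names (`alpha_from_deletion`, `estimate_alpha`, `correct_profile`) are hypothetical, and the input values are toy numbers.

```python
"""Hypothetical sketch: contamination estimate and correction under a linear mixture.

Assumes diploid normal cells and one clonal tumor copy number per segment.
This is NOT BACOM's Bayesian estimator; the deletion type is supplied by the
caller instead of being inferred from the data.
"""
from statistics import median
from typing import List

NORMAL_CN = 2.0  # assumed copy number of normal (diploid) cells


def alpha_from_deletion(observed_cn: float, deletion_type: str) -> float:
    """Normal-cell fraction implied by one deletion segment."""
    if deletion_type == "homozygous":    # tumor CN 0: observed = 2 * alpha
        alpha = observed_cn / NORMAL_CN
    elif deletion_type == "hemizygous":  # tumor CN 1: observed = 1 + alpha
        alpha = observed_cn - 1.0
    else:
        raise ValueError(f"unknown deletion type: {deletion_type!r}")
    return min(max(alpha, 0.0), 1.0)  # clip noisy values into [0, 1]


def estimate_alpha(deletions: List[float], deletion_type: str) -> float:
    """Pool several deletion segments of the same type with a median."""
    return median(alpha_from_deletion(c, deletion_type) for c in deletions)


def correct_profile(observed: List[float], alpha: float) -> List[float]:
    """Invert observed = 2*alpha + (1 - alpha)*c to recover tumor copy numbers."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    return [(c - NORMAL_CN * alpha) / (1.0 - alpha) for c in observed]


if __name__ == "__main__":
    # Toy hemizygous-deletion segments from a single hypothetical sample.
    alpha = estimate_alpha([1.38, 1.40, 1.43], "hemizygous")
    print(round(alpha, 2))  # 0.4
    corrected = correct_profile([0.8, 1.4, 2.0, 2.6], alpha)
    print([round(c, 2) for c in corrected])  # [0.0, 1.0, 2.0, 3.0]
```

The sketch shows why the deletion type matters. The same observed value of 1.4 implies alpha = 0.4 if the deletion is hemizygous, but alpha = 0.7 if it is homozygous. A wrong deletion call would therefore distort every corrected segment. The median is only a simple, robust pooling choice for this illustration.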
The cross-platform stand-alone Java application BACOM, the R interface bacomR, all source code and the simulation data used in this article are freely available at the authors' web site:

  • ABSTRACT: We propose MethylPurify, a statistical algorithm that uses regions whose bisulfite reads show discordant methylation levels to infer tumor purity from tumor samples alone. MethylPurify can identify differentially methylated regions (DMRs) from individual tumor methylome samples, without genomic variation information or prior knowledge from other datasets. In simulations with mixed bisulfite reads from cancer and normal cell lines, MethylPurify correctly inferred tumor purity and identified over 96% of the DMRs. On patient data, MethylPurify gave satisfactory DMR calls from tumor methylome samples alone and revealed potential DMRs that tumor-to-normal comparison had missed because of tumor heterogeneity.
    Genome Biology. 08/2014; 15(8):419.
  • ABSTRACT: Detection and quantification of absolute DNA copy number alterations (CNAs) in tumor cells are challenging because the DNA specimen is extracted from a mixture of tumor and normal stromal cells. Estimates of tumor purity and ploidy are necessary to infer copy number correctly, and ploidy may itself be a prognostic factor in cancer progression. As deep sequencing of the exome or genome has become routine for characterizing tumor samples, in this work we aim to develop a simple and robust algorithm to infer purity, ploidy and integer absolute copy numbers of tumor cells from sequencing data. A simulation study shows that the estimates have reasonable accuracy and that the algorithm is robust to segmentation errors and subclonal populations. We validated our algorithm against a panel of cell lines with experimentally determined ploidy. We also compared our algorithm with the well-established SNP array-based method ABSOLUTE on three sets of tumors of different types. Our method performed well on these four benchmark datasets for both purity and ploidy estimates, and may offer a simple solution to CNA quantification for cancer sequencing projects. Availability: The R package absCNseq is available from,
    Bioinformatics (Impact Factor: 5.47). 01/2014.
  • ABSTRACT: Accurate identification of significant aberrations in cancers (AISAIC) is a systematic effort to discover potential cancer-driving genes such as oncogenes and tumor suppressors. Two major factors confounding this effort are normal cell contamination and random background aberrations in tumor samples. We describe a Java AISAIC package that provides comprehensive analytic functions and a graphical user interface, integrating two statistically principled in silico approaches to address these challenges in DNA copy number analyses. In addition, the package provides a command-line interface for users with scripting and programming needs to incorporate or extend AISAIC in their customized analysis pipelines. This open-source multiplatform software offers several attractive features: (1) it implements a user-friendly complete pipeline from processing raw data to reporting analytic results; (2) it detects deletion types directly from copy number signals using a Bayes hypothesis test; (3) it estimates the fraction of normal contamination for each sample; (4) it produces an unbiased null distribution of random background alterations by iterative aberration-exclusive permutations; and (5) it identifies significant consensus regions and the percentage of homozygous/hemizygous deletions across multiple samples. AISAIC also provides users with a parallel computing option to leverage ubiquitous multi-core machines. AISAIC is available as a Java application, with a user's guide and source code, at Available at Bioinformatics online.
    Bioinformatics (Impact Factor: 5.47). 11/2013.
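The absCNseq abstract above describes inferring purity, ploidy and integer absolute copy numbers from sequencing data. The Python sketch below shows only the final conversion step, assuming purity and ploidy have already been estimated. It uses a generic mixture relation that is commonly used for this purpose; it is not code from the absCNseq package. The other assumptions are:

* Normal cells are diploid.
* p is the tumor purity.
* P is the tumor ploidy.
* A segment's copy ratio R is measured relative to the sample average.

Under these assumptions, R = (p·C + 2(1 - p)) / (p·P + 2(1 - p)). The function names are hypothetical, and the inputs are toy values.

```python
"""Hypothetical sketch: integer tumor copy number from a relative copy ratio.

Assumes diploid normal cells, a known tumor purity p and tumor ploidy P, and
the generic relation  R = (p*C + 2*(1 - p)) / (p*P + 2*(1 - p)).
This is not absCNseq's implementation; estimating p and P is not shown.
"""
from typing import List


def expected_ratio(c: int, purity: float, ploidy: float) -> float:
    """Copy ratio (segment vs. sample average) predicted for tumor copy number c."""
    normal = 2.0 * (1.0 - purity)  # DNA contributed by normal cells
    return (purity * c + normal) / (purity * ploidy + normal)


def absolute_copy_number(ratio: float, purity: float, ploidy: float) -> int:
    """Invert expected_ratio and round to the nearest non-negative integer."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must lie in (0, 1]")
    normal = 2.0 * (1.0 - purity)
    c = (ratio * (purity * ploidy + normal) - normal) / purity
    return max(0, round(c))


def absolute_profile(ratios: List[float], purity: float, ploidy: float) -> List[int]:
    """Convert a list of segment copy ratios to integer tumor copy numbers."""
    return [absolute_copy_number(r, purity, ploidy) for r in ratios]


if __name__ == "__main__":
    # Toy example: 60% tumor purity, diploid tumor ploidy.
    ratios = [expected_ratio(c, 0.6, 2.0) for c in (0, 1, 2, 3, 4)]
    print(absolute_profile(ratios, 0.6, 2.0))  # [0, 1, 2, 3, 4]
```

Rounding to the nearest integer reflects the abstract's goal of whole-number copy numbers. Because the relation is linear in C, errors in the purity or ploidy estimate shift every converted segment systematically rather than randomly.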
