A novel SNP analysis method to detect copy number alterations with an unbiased reference signal directly from tumor samples

Department of Pharmacology & Chemical Biology, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA.
BMC Medical Genomics (Impact Factor: 2.87). 01/2011; 4(1):14. DOI: 10.1186/1755-8794-4-14
Source: PubMed


Genomic instability in cancer leads to abnormal genome copy number alterations (CNA) as a mechanism underlying tumorigenesis. Using microarrays and other technologies, tumor CNA are detected by comparing tumor sample CN to normal reference sample CN. While advances in microarray technology have improved detection of copy number alterations, the increase in the number of measured signals, noise from array probes, variations in signal-to-noise ratio across batches, and disparity across laboratories significantly limit the accurate identification of CNA regions when comparing tumor and normal samples.
To address these limitations, we designed a novel "Virtual Normal" (VN) algorithm, which constructs an unbiased reference signal directly from the test samples within an experiment, using any publicly available normal reference set as a baseline and thus eliminating the need for an in-lab normal reference set.
The algorithm was tested using an optimal, paired tumor/normal data set as well as previously uncharacterized pediatric malignant gliomas for which a normal reference set was not available. Using Affymetrix 250K Sty microarrays, we demonstrated improved signal-to-noise ratio and detected significant copy number alterations using the VN algorithm that were validated by independent PCR analysis of the target CNA regions.
We developed and validated an algorithm to provide a virtual normal reference signal directly from tumor samples and minimize noise in the derivation of the raw CN signal. The algorithm reduces the variability of assays performed across different reagent and array batches, methods of sample preservation, multiple personnel, and among different laboratories. This approach may be valuable when matched normal samples are unavailable or the paired normal specimens have been subjected to variations in methods of preservation.
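The abstract does not spell out the steps of the VN algorithm, so the following is only a toy sketch of the general idea of deriving a reference signal from the test samples themselves and computing log2 ratios against it. The per-probe median, the function names `virtual_reference` and `log2_ratios`, and the intensity values are all assumptions made for illustration; they are not the published method.

```python
import math
from statistics import median


def virtual_reference(samples):
    """Hypothetical stand-in for a reference signal: per-probe median across samples.

    `samples` is a list of equal-length lists of probe intensities.
    """
    n_probes = len(samples[0])
    return [median(sample[i] for sample in samples) for i in range(n_probes)]


def log2_ratios(sample, reference):
    """Log2 ratio of one sample against a reference, probe by probe."""
    return [math.log2(x / r) for x, r in zip(sample, reference)]


# Made-up intensities: 3 tumor samples x 4 probes.
tumors = [
    [100.0, 200.0, 100.0, 50.0],   # gain at probe 2, loss at probe 4
    [100.0, 100.0, 100.0, 100.0],
    [100.0, 100.0, 100.0, 100.0],
]

reference = virtual_reference(tumors)
ratios = log2_ratios(tumors[0], reference)
print(ratios)
```

A median is used here because an alteration present in only a minority of samples at a given probe does not shift it; whether the VN algorithm relies on a similar property is not stated in the abstract.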

  • Source
    ABSTRACT: Lung cancer is the leading cause of cancer deaths in the world. The most common type of lung cancer is lung adenocarcinoma (AC). The genetic mechanisms of the early stages and progression of lung AC are poorly understood. There is currently no clinically applicable gene test for early diagnosis or for assessing AC aggressiveness. Among the major reasons for the lack of reliable diagnostic biomarkers are the extraordinary heterogeneity of the cancer cells, complex and poorly understood interactions of the AC cells with adjacent tissue and the immune system, gene variation across patient cohorts, measurement variability, small sample sizes and sub-optimal analytical methods. We suggest that gene expression profiling of the primary tumours and adjacent tissues (PT-AT), handled with a rational statistical and bioinformatics strategy of biomarker prediction and validation, could provide significant progress in the identification of clinical biomarkers of AC. To minimise sample-to-sample variability, repeated multivariate measurements in the same object (organ or tissue, e.g. PT-AT in lung) across patients should be designed, but prediction and validation on the genome scale with a small sample size is a great methodological challenge. To analyse PT-AT relationships efficiently in the statistical modelling, we propose an Extreme Class Discrimination (ECD) feature selection method that identifies a sub-set of the most discriminative variables (e.g. expressed genes). Our method consists of a paired Cross-normalization (CN) step followed by a modified sign Wilcoxon test with multivariate adjustment carried out for each variable. Using an Affymetrix U133A microarray paired dataset of 27 AC patients, we reviewed the global reprogramming of the transcriptome in human lung AC tissue versus normal lung tissue, which is associated with about 2,300 genes discriminating the tissues with 100% accuracy. Cluster analysis applied to these genes resulted in four distinct gene groups, which we classified as associated with (i) mitotic cell cycle genes up-regulated in lung AC, (ii) silenced/suppressed genes specific for normal lung tissue, (iii) cell communication and cell motility and (iv) immune system features. The genes related to mutagenesis, specific lung cancers, early stage of AC development, tumour aggressiveness and metabolic pathway alterations and adaptations of cancer cells are strongly enriched in the AC PT-AT discriminative gene set. Two AC diagnostic biomarkers, SPP1 and CENPA, were successfully validated on an RT-PCR tissue array. The ECD method was systematically compared with several alternative methods and performed better; it was also validated by comparing the predicted gene set with a literature meta-signature. We developed a method that identifies and selects highly discriminative variables from high-dimensional data spaces of potential biomarkers based on a statistical analysis of paired samples when the number of samples is small. This method provides superior selection in comparison to conventional methods and can be widely used in different applications. Our method revealed at least 2,300 patho-biologically essential genes associated with the global transcriptional reprogramming of human lung epithelial cells and lung AC aggressiveness. This gene set includes many previously published AC biomarkers reflecting inherent disease complexity and specifies the mechanisms of carcinogenesis in lung AC. SPP1, CENPA and many other PT-AT discriminative genes could be considered prospective diagnostic and prognostic biomarkers of lung AC.
    Full-text · Article · Nov 2011 · BMC Genomics
  • Source
    ABSTRACT: Copy number variant (CNV) analysis was performed on renal cell carcinoma (RCC) specimens (chromophobe, clear cell, oncocytoma, papillary type 1, and papillary type 2) using high-resolution arrays (1.85 million probes). The RCC samples exhibited diverse genomic changes within and across tumor types, ranging from 106 CNV segments in a clear-cell specimen to 2238 in a papillary type 2 specimen. Despite this heterogeneity, distinct CNV segments were common within each tumor classification: chromophobe (seven segments), clear cell (three segments), oncocytoma (nine segments), and papillary type 2 (two segments). Shared segments ranged from a 6.1-kb deletion (oncocytomas) to a 208.3-kb deletion (chromophobes). Among the common tumor type-specific variations, those in chromophobes, clear-cell tumors, and oncocytomas were composed exclusively of noncoding DNA. No CNV regions were common to papillary type 1 specimens, although there were 12 amplifications and 12 deletions in five of six samples. Three microRNAs and 12 mRNA genes had ≥98% of their coding region contained within CNV regions, including multiple gene families (chromophobe: amylases 1A, 1B, and 1C; oncocytoma: general transcription factors 2H2, 2B, 2C, and 2D). Deletions of genes involved in histone modification and chromatin remodeling affected individual subtypes (clear cell: SFMBT and SETD2; papillary type 2: BAZ1A) and the collective RCC group (KDM4C). The genomic amplifications/deletions identified herein represent potential diagnostic and/or prognostic biomarkers.
    Full-text · Article · Apr 2012 · American Journal Of Pathology