-
ABSTRACT: Germline determinants of gene expression in tumors are infrequently studied due to the complexity of transcript regulation caused by somatically acquired alterations. We performed expression quantitative trait locus (eQTL)-based analyses using the multi-level information provided in The Cancer Genome Atlas (TCGA). Of the factors we measured, cis-acting eQTLs accounted for 1.2% of the total variation of tumor gene expression, while somatic copy-number alteration and CpG methylation accounted for 7.3% and 3.3%, respectively. eQTL analyses of 15 previously reported breast cancer risk loci resulted in the discovery of three variants that are significantly associated with transcript levels (false discovery rate [FDR] < 0.1). Our trans-based analysis identified three additional risk loci that act through ESR1, MYC, and KLF4. These findings provide a more comprehensive picture of gene expression determinants in breast cancer as well as insights into the underlying biology of breast cancer risk loci.
Cell 01/2013; 152(3):633-41. · 32.40 Impact Factor
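The FDR < 0.1 threshold mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which FDR procedure was used; the Benjamini-Hochberg step-up procedure below is an assumption, and the function name `benjamini_hochberg` and the p-values in the usage note are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.1):
    """Return a list of booleans marking which tests pass at the given FDR.

    Assumption: Benjamini-Hochberg step-up procedure; the abstract only
    states the threshold (FDR < 0.1), not the method.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    # Every test ranked at or below max_rank is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant
```

For example, `benjamini_hochberg([0.001, 0.04, 0.03, 0.5, 0.8])` marks the first three (made-up) tests as significant at the default FDR of 0.1.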
-
ABSTRACT: Breast cancer genome-wide association studies have pinpointed dozens of variants associated with breast cancer pathogenesis. The majority of risk variants, however, are located outside of known protein-coding regions. Therefore, identifying which genes the risk variants are acting through presents an important challenge. Variants that are associated with mRNA transcript levels are referred to as expression quantitative trait loci (eQTLs). Many studies have demonstrated that eQTL-based strategies provide a direct way to connect a trait-associated locus with its candidate target gene. Performing eQTL-based analyses in human samples is complicated because of the heterogeneous nature of human tissue. We addressed this issue by devising a method to computationally infer the fraction of cell types in normal human breast tissues. We then applied this method to 13 known breast cancer risk loci, which we hypothesized were eQTLs. For each risk locus, we took all known transcripts within a 2 Mb interval and performed an eQTL analysis in 100 reduction mammoplasty cases. A total of 18 significant associations were discovered (eight in the epithelial compartment and 10 in the stromal compartment). This study highlights the ability to perform large-scale eQTL studies in heterogeneous tissues.
Philosophical Transactions of The Royal Society B Biological Sciences 01/2013; 368(1620):20120363. · 6.40 Impact Factor
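As a rough illustration of what "associated with mRNA transcript levels" means, the sketch below regresses expression on genotype dosage for a single variant-transcript pair. It is a minimal example under stated assumptions, not the study's method: additive dosage coding (0, 1, 2 copies of an allele), ordinary least squares, and the function name `eqtl_slope_r2` are all assumptions, and it does not model cell-type fractions.

```python
def eqtl_slope_r2(dosages, expression):
    """Least-squares slope and R^2 of expression regressed on genotype dosage.

    Assumptions (not from the abstract): additive dosage coding (0/1/2)
    and a simple linear model with no covariates.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_g = sum(dosages) / n
    mean_e = sum(expression) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    syy = sum((e - mean_e) ** 2 for e in expression)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or expression has no variation")
    slope = sxy / sxx                 # expression change per allele copy
    r2 = (sxy * sxy) / (sxx * syy)    # fraction of expression variance explained
    return slope, r2
```

The R^2 value is the fraction of expression variance explained by the variant in this toy model; a real analysis would add covariates and a significance test.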
-
ABSTRACT: During tumor initiation and progression, cancer cells acquire a selective advantage, allowing them to outcompete their normal counterparts. Identification of the genetic changes that underlie these tumor-acquired traits can provide deeper insights into the biology of tumorigenesis. Regions of copy number alterations and germline DNA variants are some of the elements subject to selection during tumor evolution. Integrated examination of inherited variation and somatic alterations holds the potential to reveal specific nucleotide alleles that a tumor "prefers" to have amplified. Next-generation sequencing of tumor and matched normal tissues provides a high-resolution platform to identify and analyze such somatic amplicons. Within an amplicon, examination of informative (e.g., heterozygous) sites deviating from a 1:1 ratio may suggest selection of that allele. A naive approach examines the reads for each heterozygous site in isolation; however, this ignores valuable linkage information available across sites. We, therefore, present a novel hidden Markov model-based method, Haplotype Amplification in Tumor Sequences (HATS), that analyzes tumor and normal sequence data, along with training data for phasing purposes, to infer amplified alleles and haplotypes in regions of copy number gain. Our method is designed to handle rare variants and biases in read data. We assess the performance of HATS using simulated amplified regions generated from varying copy number and coverage levels, followed by amplicons in real data. We demonstrate that HATS infers the amplified alleles more accurately than does the naive approach, especially at low to intermediate coverage levels and in cases (including high coverage) possessing stromal contamination or allelic bias.
Genome Research 11/2011; 22(2):362-74. · 13.61 Impact Factor
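The "naive approach" in the abstract, looking at each heterozygous site in isolation for deviation from a 1:1 allele ratio, might be sketched as an exact two-sided binomial test on read counts. This is not HATS and not necessarily the baseline the authors used; the choice of test, the function name `allelic_imbalance_p`, and the read counts in the usage note are assumptions for illustration.

```python
from math import comb


def allelic_imbalance_p(ref_reads, alt_reads):
    """Exact two-sided binomial p-value for deviation from a 1:1 allele ratio.

    Assumption: reads at one heterozygous site are independent draws with
    probability 0.5 per allele under the null. Each site is tested in
    isolation, so linkage between sites is ignored.
    """
    if ref_reads < 0 or alt_reads < 0:
        raise ValueError("read counts must be non-negative")
    n = ref_reads + alt_reads
    observed = comb(n, ref_reads)
    # Under p = 0.5 every outcome has probability comb(n, i) / 2**n, so
    # "at least as extreme" means comb(n, i) <= comb(n, observed count).
    # Integer arithmetic avoids floating-point ties.
    extreme = sum(comb(n, i) for i in range(n + 1) if comb(n, i) <= observed)
    return extreme / 2 ** n
```

For instance, `allelic_imbalance_p(10, 0)` returns 2/1024 (about 0.002), while a balanced site such as `allelic_imbalance_p(5, 5)` returns 1.0. Because each site is tested alone, power drops at low coverage, which is the limitation the abstract attributes to the naive approach.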
-
ABSTRACT: Cancer is a disease driven by a combination of inherited risk alleles coupled with the acquisition of somatic mutations, including amplification and deletion of genomic DNA. Potential relationships between the inherited and somatic aspects of the disease have only rarely been examined on a genome-wide level. Applying a novel integrative analysis of SNP and copy number measurements, we queried the tumor and normal-tissue genomes of 178 glioblastoma patients from The Cancer Genome Atlas project for preferentially amplified alleles, under the hypothesis that oncogenic germline variants will be selectively amplified in the tumor environment. Selected alleles are revealed by allelic imbalance in amplification across samples. This general approach is based on genetic principles and provides a method for identifying important tumor-related alleles. We find that SNP alleles that are most significantly overrepresented in amplicons tend to occur in genes involved in the regulation of kinase and transferase activity, and many of these genes are known contributors to gliomagenesis. The analysis also implicates variants in synapse genes. By incorporating gene expression data, we demonstrate synergy between preferential allelic amplification and expression in DOCK4 and EGFR. Our results support the notion that combining germline and tumor genetic data can identify regions relevant to cancer biology.
PLoS Genetics 09/2010; 6(9). · 8.69 Impact Factor
-
ABSTRACT: MOTIVATION: Somatic amplification of particular genomic regions and selection of cellular lineages with such amplifications drive tumor development. However, pinpointing genes under such selection has been difficult due to the large span of these regions. Our recently developed method, the amplification distortion test (ADT), identifies specific nucleotide alleles and haplotypes that confer better survival for tumor cells when somatically amplified. In this work, we focus on evaluating ADT's power to detect such causal variants across a variety of tumor dataset scenarios. RESULTS: To this end, we generated multiple parameter-based, synthetic datasets, derived from real data, that contain somatic copy number aberrations (CNAs) of various lengths and frequencies over germline single nucleotide polymorphisms (SNPs) genome-wide. Gold-standard causal sub-regions were assigned within these CNAs, followed by an assessment of ADT's ability to detect these sub-regions. Results indicate that ADT possesses high sensitivity and specificity in large sample sizes across most parameter cases, including those that more closely reflect existing SNP and CNA cancer data.
Bioinformatics 02/2010; 26(4):518-28. · 5.47 Impact Factor
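Sensitivity and specificity against gold-standard sub-regions, as reported in the abstract, can be computed as in the sketch below. How the authors defined a "detected" sub-region is not stated here, so the per-region boolean labels and the function name `sensitivity_specificity` are assumptions for illustration.

```python
def sensitivity_specificity(is_causal, is_detected):
    """Compute (sensitivity, specificity) from paired boolean labels.

    Assumption: each candidate sub-region has one gold-standard label
    (is_causal) and one call from the method under test (is_detected).
    """
    if len(is_causal) != len(is_detected):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(is_causal, is_detected))
    tp = sum(1 for truth, call in pairs if truth and call)
    fn = sum(1 for truth, call in pairs if truth and not call)
    tn = sum(1 for truth, call in pairs if not truth and not call)
    fp = sum(1 for truth, call in pairs if not truth and call)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one causal and one non-causal region")
    sensitivity = tp / (tp + fn)  # fraction of causal regions detected
    specificity = tn / (tn + fp)  # fraction of non-causal regions left uncalled
    return sensitivity, specificity
```

With made-up labels `[True, True, False, False, False]` and calls `[True, False, False, True, False]`, the function returns a sensitivity of 0.5 and a specificity of 2/3.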