ABSTRACT: Asian nonsmoking populations have a higher incidence of lung cancer compared to their European counterparts. There is a long-standing hypothesis that the increase of lung cancer in Asian never-smokers is due to environmental factors such as second-hand smoke. We analyzed whole-genome sequencing of 30 Asian lung cancers. Unsupervised clustering of mutational signatures separated the patients into two clusters: one containing all the never-smokers and one containing only smokers or ex-smokers. Nearly one-third of the ex-smokers and smokers were classified with the never-smoker-like cluster. The somatic variant profiles of Asian lung cancers were similar to those of European origin, with G.C>T.A being predominant in smokers. We found EGFR and TP53 to be the most frequently mutated genes, with mutations in 50% and 27% of individuals, respectively. Among the 16 never-smokers, 69% had an EGFR mutation compared to 29% of 14 smokers/ex-smokers. Asian never-smokers had lung cancer signatures distinct from the smoker signature and their mutation profiles were similar to those of European never-smokers. The profiles of Asian and European smokers are also similar. Taken together, these results suggest that the same mutational mechanisms underlie the etiology for both ethnic groups. Thus, the high incidence of lung cancer in Asian never-smokers seems unlikely to be due to second-hand smoke or other carcinogens that cause oxidative DNA damage, implying that routine EGFR testing is warranted in the Asian population regardless of smoking status.
Cancer Research 09/2014; 74(21). DOI:10.1158/0008-5472.CAN-13-3195 · 9.33 Impact Factor
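The EGFR percentages quoted in the abstract above can be cross-checked with a few lines of arithmetic. This is only a consistency sketch: the variable names are illustrative, and rounding the percentages back to whole patients is an assumption, not a count reported in the source.

```python
# Group sizes reported in the abstract.
never_smokers = 16
smokers_or_ex = 14

# Convert the reported percentages back to whole patients (assumed rounding).
egfr_never = round(0.69 * never_smokers)    # 11.04 -> 11
egfr_smokers = round(0.29 * smokers_or_ex)  # 4.06 -> 4

total = never_smokers + smokers_or_ex
egfr_overall = (egfr_never + egfr_smokers) / total

print(f"EGFR-mutated: {egfr_never + egfr_smokers}/{total} = {egfr_overall:.0%}")
# prints "EGFR-mutated: 15/30 = 50%"
```

The implied 15 of 30 patients matches the overall 50% EGFR mutation frequency stated in the abstract.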
ABSTRACT: Genome-wide association studies have identified several genomic regions that are associated with stroke risk, but these provide an explanation for only a small fraction of familial stroke aggregation. Genotype by environment interactions may contribute further to such an explanation. The Women's Health Initiative (WHI) clinical trial found increased stroke risk with postmenopausal hormone therapy (HT) and provides an efficient setting for evaluating genotype-HT interaction on stroke risk.
We examined HT by genotype interactions for 392 SNPs selected from candidate gene studies, and 2,371 SNPs associated with changes in blood protein concentrations after hormone therapy, in analyses that included 2,045 postmenopausal women who developed stroke during WHI clinical trial and observational study follow-up and one-to-one matched controls. A two-stage procedure was implemented where SNPs passing the first stage screening based on marginal association with stroke risk were tested in the second stage for interaction with HT using case-only analysis.
The two-stage procedure identified two SNPs, rs2154299 and rs12194855, in the coagulation factor XIII subunit A (F13A1) region and two SNPs, rs630431 and rs560892, in the proprotein convertase subtilisin kexin 9 (PCSK9) region, with an estimated false discovery rate <0.05 based on interaction tests. Further analyses showed significant stroke risk interaction between these F13A1 SNPs and estrogen plus progestin (E+P) treatment for ischemic stroke and for ischemic and hemorrhagic stroke combined, and suggested interactions between PCSK9 SNPs and either E+P or estrogen-alone treatment.
Genotype by environment interaction information may help to define genomic regions relevant to stroke risk. Two-stage analysis among postmenopausal women generates novel hypotheses concerning the F13A1 and PCSK9 genomic regions and the effects of hormonal exposures on postmenopausal stroke risk for subsequent independent validation.
Genome Medicine 07/2012; 4(7):57. DOI:10.1186/gm358 · 5.34 Impact Factor
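The case-only interaction test mentioned in the stroke abstract above can be illustrated with a 2x2 table built from cases alone. This is a minimal sketch, not the authors' model: the function name `case_only_interaction` and all counts are hypothetical, and it relies on the standard case-only assumption that genotype and treatment are independent in the source population (plausible under randomized assignment), so that the genotype-by-treatment odds ratio among cases estimates the multiplicative interaction.

```python
import math

def case_only_interaction(a, b, c, d):
    """Interaction odds ratio and Wald 95% CI from a 2x2 table of cases only.

    a: treated carriers,   b: treated non-carriers,
    c: untreated carriers, d: untreated non-carriers.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio (Woolf's formula).
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts for illustration; not taken from the study.
or_est, lower, upper = case_only_interaction(a=120, b=80, c=90, d=110)
print(f"interaction OR = {or_est:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An odds ratio near 1 would indicate that the treatment effect does not vary with genotype on the multiplicative scale.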
ABSTRACT: Hepatitis B virus (HBV) infection is a leading risk factor for hepatocellular carcinoma (HCC). HBV integration into the host genome has been reported, but its scale, impact and contribution to HCC development are not clear. Here, we sequenced the tumor and nontumor genomes (>80× coverage) and transcriptomes of four HCC patients and identified 255 HBV integration sites. Increased sequencing to 240× coverage revealed a proportionally higher number of integration sites. Clonal expansion of HBV-integrated hepatocytes was found specifically in tumor samples. We observe a diverse collection of genomic perturbations near viral integration sites, including direct gene disruption, viral promoter-driven human transcription, viral-human transcript fusion, and DNA copy number alteration. Thus, we report the most comprehensive characterization of HBV integration in hepatocellular carcinoma patients. Such widespread random viral integration will likely increase carcinogenic opportunities in HBV-infected individuals.
Genome Research 02/2012; 22(4):593-601. DOI:10.1101/gr.133926.111 · 14.63 Impact Factor
ABSTRACT: Unchained base reads on self-assembling DNA nanoarrays have recently emerged as a promising approach to low-cost, high-quality resequencing of human genomes. Because of unique characteristics of these mated pair reads, existing computational methods for resequencing assembly, such as those based on map-consensus calling, are not adequate for accurate variant calling. We describe novel computational methods developed for accurate calling of SNPs and short substitutions and indels (<100 bp); the same methods apply to evaluation of hypothesized larger, structural variations. We use an optimization process that iteratively adjusts the genome sequence to maximize its a posteriori probability given the observed reads. For each candidate sequence, this probability is computed using Bayesian statistics with a simple read generation model and simplifying assumptions that make the problem computationally tractable. The optimization process iteratively applies one-base substitutions, insertions, and deletions until convergence is achieved to an optimum diploid sequence. A local de novo assembly procedure that generalizes approaches based on De Bruijn graphs is used to seed the optimization process in order to reduce the chance of converging to local optima. Finally, a correlation-based filter is applied to reduce the false positive rate caused by the presence of repetitive regions in the reference genome.
Journal of computational biology: a journal of computational molecular cell biology 12/2011; 19(3):279-92. DOI:10.1089/cmb.2011.0201 · 1.74 Impact Factor
ABSTRACT: Genome-wide association studies have identified several genomic regions that are associated with breast cancer risk, but these provide an explanation for only a small fraction of familial breast cancer aggregation. Genotype by environment interactions may contribute further to such an explanation, and may help to refine the genomic regions of interest.
We examined genotypes for 4,988 SNPs, selected from recent genome-wide studies, and four randomized hormonal and dietary interventions among 2,166 women who developed invasive breast cancer during the intervention phase of the Women's Health Initiative (WHI) clinical trial (1993 to 2005), and one-to-one matched controls. These SNPs derive from 3,224 genomic regions having pairwise squared correlation (r²) between adjacent regions less than 0.2. Breast cancer and SNP associations were identified using a test statistic that combined evidence of overall association with evidence for SNPs by intervention interaction.
The combined 'main effect' and interaction test led to a focus on two genomic regions, the fibroblast growth factor receptor two (FGFR2) and the mitochondrial ribosomal protein S30 (MRPS30) regions. The ranking of SNPs by significance level, based on this combined test, was rather different from that based on the main effect alone, and drew attention to the vicinities of rs3750817 in FGFR2 and rs7705343 in MRPS30. Specifically, rs7705343 was included with several FGFR2 SNPs in a group of SNPs having an estimated false discovery rate < 0.05. In further analyses, there were suggestions (nominal P < 0.05) that hormonal and dietary intervention hazard ratios varied with the number of minor alleles of rs7705343.
Genotype by environment interaction information may help to define genomic regions relevant to disease risk. Combined main effect and intervention interaction analyses raise novel hypotheses concerning the MRPS30 genomic region and the effects of hormonal and dietary exposures on postmenopausal breast cancer risk.
Genome Medicine 06/2011; 3(6):42. DOI:10.1186/gm258 · 5.34 Impact Factor
ABSTRACT: Common single-nucleotide polymorphisms (SNPs) at nicotinic acetylcholine receptor (nAChR) subunit genes have previously been associated with measures of nicotine dependence. We investigated the contribution of common SNPs and rare single-nucleotide variants (SNVs) in nAChR genes to Fagerström test for nicotine dependence (FTND) scores in treatment-seeking smokers. Exons of 10 genes were resequenced with next-generation sequencing technology in 448 European-American participants of a smoking cessation trial, and CHRNB2 and CHRNA4 were resequenced by Sanger technology to improve sequence coverage. A total of 214 SNP/SNVs were identified, of which 19.2% were excluded from analyses because of reduced completion rate, 73.9% had minor allele frequencies <5%, and 48.1% were novel relative to dbSNP build 129. We tested associations of 173 SNP/SNVs with the FTND score in data from 430 individuals (18 were excluded because of reduced completion rate), using linear regression for common variants, the cohort allelic sum test and the weighted sum statistic for rare variants, and the multivariate distance matrix regression method for both common and rare SNP/SNVs. Association testing with common SNPs, with adjustment for correlated tests within each gene, identified a significant association with two CHRNB2 SNPs; e.g., the minor allele of rs2072660 increased the mean FTND score by 0.6 units (P=0.01). We observed significant evidence for association with the FTND score of common and rare SNP/SNVs at CHRNA5 and CHRNB2, and of rare SNVs at CHRNA4. Common and/or rare SNP/SNVs from multiple nAChR subunit genes are associated with the FTND score in this sample of treatment-seeking smokers.
Neuropsychopharmacology: official publication of the American College of Neuropsychopharmacology 11/2010; 35(12):2392-402. DOI:10.1038/npp.2010.120 · 7.05 Impact Factor
ABSTRACT: Lung cancer is the leading cause of cancer-related mortality worldwide, with non-small-cell lung carcinomas in smokers being the predominant form of the disease. Although previous studies have identified important common somatic mutations in lung cancers, they have primarily focused on a limited set of genes and have thus provided a constrained view of the mutational spectrum. Recent cancer sequencing efforts have used next-generation sequencing technologies to provide a genome-wide view of mutations in leukaemia, breast cancer and cancer cell lines. Here we present the complete sequences of a primary lung tumour (60x coverage) and adjacent normal tissue (46x). Comparing the two genomes, we identify a wide variety of somatic variations, including >50,000 high-confidence single nucleotide variants. We validated 530 somatic single nucleotide variants in this tumour, including one in the KRAS proto-oncogene and 391 others in coding regions, as well as 43 large-scale structural variations. These constitute a large set of new somatic mutations and yield an estimated 17.7 per megabase genome-wide somatic mutation rate. Notably, we observe a distinct pattern of selection against mutations within expressed genes compared to non-expressed genes and in promoter regions up to 5 kilobases upstream of all protein-coding genes. Furthermore, we observe a higher rate of amino acid-changing mutations in kinase genes. We present a comprehensive view of somatic alterations in a single lung tumour, and provide the first evidence, to our knowledge, of distinct selective pressures present within the tumour environment.
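As a rough plausibility check on the figures in the lung tumour abstract above, the stated mutation rate can be scaled to a whole genome. The genome size used here (3,000 megabases) is an assumption for illustration and is not given in the abstract; the authors' estimate may rest on a different callable genome size and on validation rates.

```python
rate_per_mb = 17.7        # somatic mutations per megabase (from the abstract)
genome_size_mb = 3_000    # assumed genome size in megabases (not from the source)

expected_variants = rate_per_mb * genome_size_mb
print(f"about {expected_variants:,.0f} somatic variants genome-wide")
# prints "about 53,100 somatic variants genome-wide"
```

Under that assumption the scaled count is in line with the ">50,000 high-confidence single nucleotide variants" reported.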
ABSTRACT: The Women's Health Initiative dietary modification (DM) trial provided suggestive evidence of a benefit of a low-fat dietary pattern on breast cancer risk, with stronger evidence among women whose baseline diet was high in fat. Single nucleotide polymorphisms (SNP) in the FGFR2 gene relate strongly to breast cancer risk and could influence intervention effects.
All 48,835 trial participants were postmenopausal and ages 50 to 79 years at enrollment (1993-1998). We interrogated eight SNPs in intron 2 of the FGFR2 gene for 1,676 women who developed breast cancer during trial follow-up (1993-2005). Case-only analyses were used to estimate odds ratios for the DM intervention in relation to SNP genotype.
Odds ratios for the DM intervention did not vary significantly with the genotype for any of the eight FGFR2 SNPs (P ≥ 0.18). However, odds ratios varied (P < 0.05) with the genotype of six of these SNPs, among women having baseline percent of energy from fat in the upper quartile (≥36.8%). This variation is most evident for SNP rs3750817, with odds ratios for the DM intervention at 0, 1, and 2 minor SNP alleles of 1.06 [95% confidence intervals (95% CI), 0.80-1.41], 0.53 (95% CI, 0.38-0.74), and 0.62 (95% CI, 0.33-1.15). The nominal significance level for this interaction is P = 0.005, and P = 0.03 following multiple testing adjustment, with most evidence deriving from hormone receptor-positive tumors.
Invasive breast cancer odds ratios for a low-fat dietary pattern, among women whose usual diets are high in fat, seem to vary with SNP rs3750817 in the FGFR2 gene.
ABSTRACT: Two recent genome-wide association studies have described associations of SNP variants in _PNPLA3_ with nonalcoholic fatty liver and plasma liver enzyme levels in population based cohorts. We investigated the contributions of these variants to clinical outcomes in Mestizo subjects with a history of excessive alcohol consumption. We show that non-synonymous variant rs738409[G] (I148M) in _PNPLA3_ is strongly associated with alcoholic liver disease and progression to alcoholic cirrhosis (unadjusted OR = 2.25, P = 1.7x10^-10^; ancestry-adjusted OR = 1.79, P = 1.9x10^-5^).
ABSTRACT: Breast cancer concern is a major reason for the recent marked reduction in use of postmenopausal hormone therapy, although equally effective means of controlling menopausal symptoms are lacking. Single nucleotide polymorphisms (SNP) in the fibroblast growth factor receptor 2 (FGFR2) gene are substantially associated with postmenopausal breast cancer risk and could influence hormone therapy effects.
We interrogated eight SNPs in intron 2 of the FGFR2 gene for 2,166 invasive breast cancer cases from the Women's Health Initiative clinical trial and one-to-one matched controls to confirm an association with breast cancer risk. We used case-only analyses to examine the dependence of estrogen plus progestin and estrogen-alone odds ratios on SNP genotype.
Seven FGFR2 SNPs, including six in a single linkage disequilibrium region, were found to associate strongly (P < 10^-7^) with breast cancer risk. SNP rs3750817 (minor allele T with frequency 0.39) had an estimated per-minor-allele odds ratio of 0.78, and was not in such strong linkage disequilibrium with the other SNPs. The genotype of this SNP related significantly (P < 0.05) to hormone therapy odds ratios. For estrogen plus progestin, the odds ratios (95% confidence intervals) at 0, 1, and 2 minor SNP alleles were 1.52 (1.14-2.02), 1.33 (1.01-1.75), and 0.69 (0.41-1.17), whereas the corresponding values for estrogen alone were 0.74 (0.51-1.09), 0.99 (0.68-1.44), and 0.34 (0.15-0.76).
Postmenopausal women having TT genotype for SNP rs3750817 have a reduced breast cancer risk and seem to experience comparatively favorable effects of postmenopausal hormone therapy.
ABSTRACT: Two genome-wide association studies (GWAS) have described associations of variants in PNPLA3 with nonalcoholic fatty liver and plasma liver enzyme levels. We investigated the contributions of these variants to liver disease in Mestizo subjects with a history of alcohol dependence. We found that rs738409 in PNPLA3 is strongly associated with alcoholic liver disease and clinically evident alcoholic cirrhosis (unadjusted OR = 2.25, P = 1.7x10^-10^; ancestry-adjusted OR = 1.79, P = 1.9x10^-5^).
ABSTRACT: Genome sequencing of large numbers of individuals promises to advance the understanding, treatment, and prevention of human diseases, among other applications. We describe a genome sequencing platform that achieves efficient imaging and low reagent consumption with combinatorial probe anchor ligation chemistry to independently assay each base from patterned nanoarrays of self-assembling DNA nanoballs. We sequenced three human genomes with this platform, generating an average of 45- to 87-fold coverage per genome and identifying 3.2 to 4.5 million sequence variants per genome. Validation of one genome data set demonstrates a sequence accuracy of about 1 false variant per 100 kilobases. The high accuracy, affordable cost of $4400 for sequencing consumables, and scalability of this platform enable complete human genome sequencing for the detection of rare variants in large-scale genetic studies.
ABSTRACT: Rice, the primary source of dietary calories for half of humanity, is the first crop plant for which a high-quality reference genome sequence from a single variety was produced. We used resequencing microarrays to interrogate 100 Mb of the unique fraction of the reference genome for 20 diverse varieties and landraces that capture the impressive genotypic and phenotypic diversity of domesticated rice. Here, we report the distribution of 160,000 nonredundant SNPs. Introgression patterns of shared SNPs revealed the breeding history and relationships among the 20 varieties; some introgressed regions are associated with agronomic traits that mark major milestones in rice improvement. These comprehensive SNP data provide a foundation for deep exploration of rice diversity and gene-trait relationships and their use for future rice improvement.
Proceedings of the National Academy of Sciences 08/2009; 106(30):12273-8. DOI:10.1073/pnas.0900992106 · 9.67 Impact Factor
ABSTRACT: Markers for individual genotyping can be selected using quantitative genotyping of pooled DNA. This strategy saves time and money.
To determine the efficacy of this approach, we investigated the bivariate distribution of association test statistics from pooled and individual genotypes. We used a set of approximately 1,000 samples with individual and pooled genotyping on 40,000 SNPs.
We found that the distribution of the joint test statistics can be modelled as a mixture of two bivariate normal distributions. One distribution has a correlation of zero, and is probably due to SNPs whose pooled genotyping was unsuccessful. The other distribution has a correlation of approximately 0.65 in our data. This latter distribution is probably accounted for by SNPs whose pooled genotyping accurately predicts the underlying allele frequency. Approximately 87% of the data belongs to this distribution. We also derived a method to investigate the effect of both the correlation and selection cut-off on the relative power of pooling studies. We demonstrate that pooled genotyping has good power to detect SNPs that are truly associated with disease-causing variants for SNPs showing good correlation between pooled and individual genotyping. Therefore, this approach is a cost effective tool for association studies.
Human Heredity 02/2009; 67(4):219-25. DOI:10.1159/000194975 · 1.47 Impact Factor
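The two-component mixture described in the pooled-genotyping abstract above can be written out explicitly. The following is a sketch under stated assumptions: the symbols $z_p$ and $z_i$ (pooled and individual test statistics), the weight $w$, and the use of standardized components with zero mean and unit variance are illustrative choices, since the abstract reports only the correlations (about 0.65 and 0) and the approximate share of the correlated component (about 87%).

```latex
% Mixture of two bivariate normal densities for the joint test statistics.
% Assumed: zero means and unit variances for both components.
f(z_p, z_i) = w \,\phi_2(z_p, z_i; \rho) + (1 - w)\,\phi_2(z_p, z_i; 0),
\qquad w \approx 0.87,\quad \rho \approx 0.65,

% Standard bivariate normal density with correlation r:
\phi_2(x, y; r) = \frac{1}{2\pi\sqrt{1 - r^2}}
  \exp\!\left( -\frac{x^2 - 2 r x y + y^2}{2\,(1 - r^2)} \right).
```

Under this reading, the uncorrelated component corresponds to SNPs whose pooled genotyping was unsuccessful, and the correlated component to SNPs whose pooled genotyping tracks the underlying allele frequency.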
ABSTRACT: Circulating levels of sex hormone-binding globulin (SHBG) are inversely associated with breast cancer risk in postmenopausal women. Three polymorphisms within the SHBG gene have been reported to affect SHBG levels, but there has been no systematic attempt to identify other such variants.
We looked for associations between SHBG levels in 1,134 healthy, postmenopausal women and 11 tagging single nucleotide polymorphisms (SNP) in or around the SHBG gene. Associations between SHBG SNPs and breast cancer were tested in up to 6,622 postmenopausal breast cancer cases and 6,784 controls.
Ten SNPs within or close to the SHBG gene were significantly associated with SHBG levels as was the (TAAAA)n polymorphism. The best-fitting combination of rs6259, rs858521, and rs727428 and body mass index, waist, hip, age, and smoking status accounted for 24% of the variance in SHBG levels (natural logarithm transformed). Haplotype analysis suggested that rs858518, rs727428, or a variant in linkage disequilibrium with them acts to decrease SHBG levels but that this effect is neutralized by rs6259 (D356N). rs1799941 increases SHBG levels, but the previously reported association with (TAAAA)n repeat length appears to be a consequence of linkage disequilibrium with these SNPs. One further SHBG SNP was significantly associated with breast cancer (rs6257, per-allele odds ratio, 0.88; 95% confidence interval, 0.82-0.95; P = 0.002).
At least 3 SNPs showed associations with SHBG levels that were highly significant but relatively small in magnitude. rs6257 is a potential breast cancer susceptibility variant, but relationships between the genetic determinants of SHBG levels and breast cancer are complex.
ABSTRACT: The clinical overlap between monogenic Familial Hemiplegic Migraine (FHM) and common migraine subtypes, and the fact that all three FHM genes are involved in the transport of ions, suggest that ion transport genes may underlie susceptibility to common forms of migraine. To test this leading hypothesis, we examined common variation in 155 ion transport genes using 5257 single nucleotide polymorphisms (SNPs) in a Finnish sample of 841 unrelated migraine with aura cases and 884 unrelated non-migraine controls. The top signals were then tested for replication in four independent migraine case-control samples from the Netherlands, Germany and Australia, totalling 2835 unrelated migraine cases and 2740 unrelated controls. SNPs within 12 genes (KCNB2, KCNQ3, CLIC5, ATP2C2, CACNA1E, CACNB2, KCNE2, KCNK12, KCNK2, KCNS3, SCN5A and SCN9A) with promising nominal association (0.00041 < P < 0.005) in the Finnish sample were selected for replication. Although no variant remained significant after adjusting for multiple testing nor produced consistent evidence for association across all cohorts, a significant epistatic interaction between KCNB2 SNP rs1431656 (chromosome 8q13.3) and CACNB2 SNP rs7076100 (chromosome 10p12.33) (pointwise P = 0.00002; global P = 0.02) was observed in the Finnish case-control sample. We conclude that common variants of moderate effect size in ion transport genes do not play a major role in susceptibility to common migraine within these European populations, although there is some evidence for epistatic interaction between potassium and calcium channel genes, KCNB2 and CACNB2. Multiple rare variants or trans-regulatory elements of these genes are not ruled out.
Human Molecular Genetics 08/2008; 17(21):3318-31. DOI:10.1093/hmg/ddn227 · 6.39 Impact Factor