ABSTRACT: BACKGROUND: Genomewide association studies have identified several genomic regions that are associated with stroke risk, but these explain only a small fraction of familial stroke aggregation. Genotype by environment interactions may contribute further to such an explanation. The Women's Health Initiative (WHI) clinical trial found increased stroke risk with postmenopausal hormone therapy (HT) and provides an efficient setting for evaluating genotype-HT interaction on stroke risk. METHODS: We examined HT by genotype interactions for 392 single nucleotide polymorphisms (SNPs) selected from candidate gene studies, and 2371 SNPs associated with change in blood protein concentrations after hormone therapy, in analyses that included 2045 postmenopausal women who developed stroke during WHI clinical trial and observational study follow-up and 1:1 matched controls. A two-stage procedure was implemented in which SNPs passing the first-stage screen, based on marginal association with stroke risk, were tested in the second stage for interaction with HT using case-only analysis. RESULTS: The two-stage procedure identified two SNPs, rs2154299 and rs12194855, in the Coagulation Factor XIII Subunit A (F13A1) region and two SNPs, rs630431 and rs560892, in the Proprotein Convertase Subtilisin Kexin 9 (PCSK9) region with an estimated false discovery rate < 0.05 based on interaction tests. Further analyses showed significant stroke risk interaction between these F13A1 SNPs and Estrogen plus Progestin (E+P) treatment for ischemic stroke and for ischemic and hemorrhagic stroke combined, and suggested interactions between PCSK9 SNPs and either E+P or Estrogen-alone treatment. CONCLUSIONS: Genotype by environment interaction information may help to define genomic regions relevant to stroke risk.
Two-stage analysis among postmenopausal women generates novel hypotheses concerning the F13A1 and PCSK9 genomic regions and the effects of hormonal exposures on postmenopausal stroke risk for subsequent independent validation.
Genome Medicine 07/2012; 4(7):57. · 3.40 Impact Factor
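The two-stage screen described above can be sketched as follows. This is a minimal illustration, not the authors' code. The abstract does not give the stage-one threshold, so `screen_alpha` is an assumption. The Benjamini-Hochberg step-up procedure is used here as one standard way of controlling the false discovery rate. The SNP names and p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return sorted indices of hypotheses rejected by the BH step-up procedure."""
    m = len(p_values)
    # Rank hypotheses by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= k * alpha / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    return sorted(order[:cutoff_rank])


def two_stage_screen(marginal_p, interaction_p, screen_alpha=0.05, fdr=0.05):
    """Hypothetical two-stage screen.

    Stage 1 keeps SNPs whose marginal p-value is below screen_alpha;
    stage 2 applies BH only to the interaction p-values of the survivors.
    """
    passed = [snp for snp, p in marginal_p.items() if p < screen_alpha]
    stage2 = [interaction_p[snp] for snp in passed]
    return [passed[i] for i in benjamini_hochberg(stage2, alpha=fdr)]


marginal = {"snpA": 0.001, "snpB": 0.2, "snpC": 0.01, "snpD": 0.03}
interaction = {"snpA": 0.001, "snpB": 0.0001, "snpC": 0.04, "snpD": 0.5}
print(two_stage_screen(marginal, interaction))  # ['snpA']
```

`snpB` never reaches stage two, even though it has the smallest interaction p-value. This is the cost of screening on marginal association. The benefit is that the FDR correction has fewer interaction tests to account for.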
ABSTRACT: Hepatitis B virus (HBV) infection is a leading risk factor for hepatocellular carcinoma (HCC). HBV integration into the host genome has been reported, but its scale, impact and contribution to HCC development are not clear. Here, we sequenced the tumor and nontumor genomes (>80× coverage) and transcriptomes of four HCC patients and identified 255 HBV integration sites. Increased sequencing to 240× coverage revealed a proportionally higher number of integration sites. Clonal expansion of HBV-integrated hepatocytes was found specifically in tumor samples. We observe a diverse collection of genomic perturbations near viral integration sites, including direct gene disruption, viral promoter-driven human transcription, viral-human transcript fusion, and DNA copy number alteration. Thus, we report the most comprehensive characterization of HBV integration in hepatocellular carcinoma patients. Such widespread random viral integration will likely increase carcinogenic opportunities in HBV-infected individuals.
Genome Research 02/2012; 22(4):593-601. · 14.40 Impact Factor
ABSTRACT: Unchained base reads on self-assembling DNA nanoarrays have recently emerged as a promising approach to low-cost, high-quality resequencing of human genomes. Because of the unique characteristics of these mated-pair reads, existing computational methods for resequencing assembly, such as those based on map-consensus calling, are not adequate for accurate variant calling. We describe novel computational methods developed for accurate calling of SNPs and short substitutions and indels (<100 bp); the same methods apply to evaluation of hypothesized larger, structural variations. We use an optimization process that iteratively adjusts the genome sequence to maximize its a posteriori probability given the observed reads. For each candidate sequence, this probability is computed using Bayesian statistics with a simple read generation model and simplifying assumptions that make the problem computationally tractable. The optimization process iteratively applies one-base substitutions, insertions, and deletions until it converges to an optimal diploid sequence. A local de novo assembly procedure that generalizes approaches based on De Bruijn graphs is used to seed the optimization process in order to reduce the chance of converging to local optima. Finally, a correlation-based filter is applied to reduce the false positive rate caused by the presence of repetitive regions in the reference genome.
Journal of computational biology: a journal of computational molecular cell biology 12/2011; 19(3):279-92. · 1.69 Impact Factor
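The abstract's local assembly step generalizes De Bruijn graph approaches. As a point of reference, the sketch below builds only the basic graph, in which nodes are (k-1)-mers and edges are k-mers observed in reads. It then extends a contig along unbranched nodes. It assumes error-free, haploid reads. The function names are hypothetical, and this is not the published generalized procedure.

```python
def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the sorted list of (k-1)-mers that follow it in some read."""
    successors = {}
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # A k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix.
            successors.setdefault(kmer[:-1], set()).add(kmer[1:])
    return {node: sorted(nxt) for node, nxt in successors.items()}


def extend_unambiguous(graph, start):
    """Greedily extend a contig from `start` while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in seen:  # stop if the path loops back on itself
            break
        seen.add(node)
        contig += node[-1]
    return contig


reads = ["ACGTG", "CGTGC", "GTGCA"]
graph = de_bruijn_graph(reads, k=3)
print(extend_unambiguous(graph, "AC"))  # ACGTGCA
```

A branch (a node with two successors) stops the extension. That is the point where a diploid or error-aware method has to make a choice.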
ABSTRACT: Genome-wide association studies have identified several genomic regions that are associated with breast cancer risk, but these explain only a small fraction of familial breast cancer aggregation. Genotype by environment interactions may contribute further to such an explanation, and may help to refine the genomic regions of interest.
We examined genotypes for 4,988 SNPs, selected from recent genome-wide studies, in relation to four randomized hormonal and dietary interventions among 2,166 women who developed invasive breast cancer during the intervention phase of the Women's Health Initiative (WHI) clinical trial (1993 to 2005), and one-to-one matched controls. These SNPs derive from 3,224 genomic regions, with pairwise squared correlation (r²) between adjacent regions of less than 0.2. Breast cancer and SNP associations were identified using a test statistic that combined evidence of overall association with evidence for SNP by intervention interactions.
The combined 'main effect' and interaction test led to a focus on two genomic regions, the fibroblast growth factor receptor 2 (FGFR2) and the mitochondrial ribosomal protein S30 (MRPS30) regions. The ranking of SNPs by significance level, based on this combined test, was rather different from that based on the main effect alone, and drew attention to the vicinities of rs3750817 in FGFR2 and rs7705343 in MRPS30. Specifically, rs7705343 was included with several FGFR2 SNPs in a group of SNPs having an estimated false discovery rate < 0.05. In further analyses, there were suggestions (nominal P < 0.05) that hormonal and dietary intervention hazard ratios varied with the number of minor alleles of rs7705343.
Genotype by environment interaction information may help to define genomic regions relevant to disease risk. Combined main effect and intervention interaction analyses raise novel hypotheses concerning the MRPS30 genomic region and the effects of hormonal and dietary exposures on postmenopausal breast cancer risk.
Genome Medicine 06/2011; 3(6):42. · 3.40 Impact Factor
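The abstract does not specify the form of the combined 'main effect' and interaction statistic. One simple construction is assumed here for illustration only. It adds a 1-df main-effect chi-square to an interaction chi-square, taken as one df per intervention (4 df in total). The sum is then referred to a chi-square distribution with the summed degrees of freedom. This is valid only if the two components are approximately independent. The chi-square tail function uses the standard closed-form recurrence, so the sketch needs only the standard library.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square(df) variable, for integer df >= 1."""
    if x <= 0:
        return 1.0
    # Closed forms for df = 2 and df = 1, then the recurrence
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    if df % 2 == 0:
        total, k = math.exp(-x / 2), 2
    else:
        total, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        total += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return total


def combined_test(main_chisq, interaction_chisq, interaction_df):
    """Hypothetical combined test: 1-df main-effect chi-square plus an
    interaction chi-square, referred to chi-square(1 + interaction_df)."""
    stat = main_chisq + interaction_chisq
    df = 1 + interaction_df
    return stat, df, chi2_sf(stat, df)


# Hypothetical SNP: a modest main effect plus some interaction signal across 4 interventions.
print(combined_test(main_chisq=6.0, interaction_chisq=8.0, interaction_df=4))
```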
ABSTRACT: Common single-nucleotide polymorphisms (SNPs) at nicotinic acetylcholine receptor (nAChR) subunit genes have previously been associated with measures of nicotine dependence. We investigated the contribution of common SNPs and rare single-nucleotide variants (SNVs) in nAChR genes to Fagerström test for nicotine dependence (FTND) scores in treatment-seeking smokers. Exons of 10 genes were resequenced with next-generation sequencing technology in 448 European-American participants of a smoking cessation trial, and CHRNB2 and CHRNA4 were resequenced by Sanger technology to improve sequence coverage. A total of 214 SNP/SNVs were identified, of which 19.2% were excluded from analyses because of reduced completion rate, 73.9% had minor allele frequencies <5%, and 48.1% were novel relative to dbSNP build 129. We tested associations of 173 SNP/SNVs with the FTND score using data obtained from 430 individuals (18 were excluded because of reduced completion rate). We used linear regression for common SNPs, the cohort allelic sum test and the weighted sum statistic for rare SNVs, and the multivariate distance matrix regression method for both common and rare SNP/SNVs. Association testing of common SNPs, with adjustment for correlated tests within each gene, identified a significant association with two CHRNB2 SNPs; for example, the minor allele of rs2072660 increased the mean FTND score by 0.6 units (P=0.01). We observed significant evidence for association with the FTND score of common and rare SNP/SNVs at CHRNA5 and CHRNB2, and of rare SNVs at CHRNA4. Common and/or rare SNP/SNVs from multiple nAChR subunit genes are associated with the FTND score in this sample of treatment-seeking smokers.
Neuropsychopharmacology: official publication of the American College of Neuropsychopharmacology 11/2010; 35(12):2392-402. · 6.99 Impact Factor
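The cohort allelic sum test (CAST) named above collapses the rare variants in a gene into a single carrier indicator. The sketch below shows only that collapsing step and a raw carrier versus non-carrier difference in mean FTND score. A real analysis would attach a regression or permutation test. The weighted sum statistic and multivariate distance matrix regression are not shown. The genotypes, scores, and choice of rare sites are hypothetical.

```python
def cast_indicator(genotypes, rare_sites):
    """CAST-style collapsing: 1 if an individual carries at least one minor
    allele at any of the rare sites, else 0.

    genotypes: one list per individual of minor-allele counts (0, 1, 2) per site.
    """
    return [int(any(person[s] > 0 for s in rare_sites)) for person in genotypes]


def carrier_mean_difference(scores, indicator):
    """Mean score in carriers minus mean score in non-carriers."""
    carriers = [s for s, c in zip(scores, indicator) if c]
    others = [s for s, c in zip(scores, indicator) if not c]
    return sum(carriers) / len(carriers) - sum(others) / len(others)


# Hypothetical data: 5 smokers, 3 sites in one gene; sites 0 and 1 treated as rare.
genotypes = [[0, 0, 1], [0, 0, 0], [1, 0, 0], [0, 0, 0], [0, 2, 0]]
ftnd_scores = [5, 3, 7, 4, 6]
indicator = cast_indicator(genotypes, rare_sites=[0, 1])
print(indicator)  # [0, 0, 1, 0, 1]
print(carrier_mean_difference(ftnd_scores, indicator))  # 2.5
```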
ABSTRACT: Lung cancer is the leading cause of cancer-related mortality worldwide, with non-small-cell lung carcinomas in smokers being the predominant form of the disease. Although previous studies have identified important common somatic mutations in lung cancers, they have primarily focused on a limited set of genes and have thus provided a constrained view of the mutational spectrum. Recent cancer sequencing efforts have used next-generation sequencing technologies to provide a genome-wide view of mutations in leukaemia, breast cancer and cancer cell lines. Here we present the complete sequences of a primary lung tumour (60x coverage) and adjacent normal tissue (46x). Comparing the two genomes, we identify a wide variety of somatic variations, including >50,000 high-confidence single nucleotide variants. We validated 530 somatic single nucleotide variants in this tumour, including one in the KRAS proto-oncogene and 391 others in coding regions, as well as 43 large-scale structural variations. These constitute a large set of new somatic mutations and yield an estimated genome-wide somatic mutation rate of 17.7 per megabase. Notably, we observe a distinct pattern of selection against mutations within expressed genes compared to non-expressed genes and in promoter regions up to 5 kilobases upstream of all protein-coding genes. Furthermore, we observe a higher rate of amino acid-changing mutations in kinase genes. We present a comprehensive view of somatic alterations in a single lung tumour, and provide the first evidence, to our knowledge, of distinct selective pressures present within the tumour environment.
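A per-megabase mutation rate is a mutation count divided by the number of megabases assessed. The abstract does not state which denominator (for example, callable bases) was used. The inputs below are therefore hypothetical, and the function name is illustrative.

```python
def mutations_per_megabase(n_mutations, callable_bases):
    """Somatic mutation rate: mutation count divided by callable megabases."""
    return n_mutations / (callable_bases / 1_000_000)


# Hypothetical inputs, not the study's counts or denominator.
print(mutations_per_megabase(10_000, 500_000_000))  # 20.0
```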
ABSTRACT: The Women's Health Initiative dietary modification (DM) trial provided suggestive evidence of a benefit of a low-fat dietary pattern on breast cancer risk, with stronger evidence among women whose baseline diet was high in fat. Single nucleotide polymorphisms (SNPs) in the FGFR2 gene relate strongly to breast cancer risk and could influence intervention effects.
All 48,835 trial participants were postmenopausal and ages 50 to 79 years at enrollment (1993-1998). We interrogated eight SNPs in intron 2 of the FGFR2 gene for 1,676 women who developed breast cancer during trial follow-up (1993-2005). Case-only analyses were used to estimate odds ratios for the DM intervention in relation to SNP genotype.
Odds ratios for the DM intervention did not vary significantly with the genotype for any of the eight FGFR2 SNPs (P ≥ 0.18). However, odds ratios varied (P < 0.05) with the genotype of six of these SNPs among women whose baseline percentage of energy from fat was in the upper quartile (≥36.8%). This variation is most evident for SNP rs3750817, with odds ratios for the DM intervention at 0, 1, and 2 minor SNP alleles of 1.06 (95% confidence interval [CI], 0.80-1.41), 0.53 (95% CI, 0.38-0.74), and 0.62 (95% CI, 0.33-1.15). The nominal significance level for this interaction is P = 0.005, and P = 0.03 following multiple testing adjustment, with most evidence deriving from hormone receptor-positive tumors.
Invasive breast cancer odds ratios for a low-fat dietary pattern, among women whose usual diets are high in fat, seem to vary with SNP rs3750817 in the FGFR2 gene.
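The case-only design used here relies on the intervention being randomized. In the randomized population, assignment is independent of genotype. Among cases, the odds ratio relating assignment to genotype therefore approximately estimates the ratio of intervention odds ratios in carriers versus non-carriers. The approximation relies on the outcome being uncommon. A minimal sketch with a Woolf log-scale confidence interval follows. It collapses genotype to carrier versus non-carrier rather than the 0, 1, 2 minor-allele analysis reported above, and the counts are hypothetical.

```python
import math


def case_only_odds_ratio(a, b, c, d, z=1.959964):
    """Odds ratio and Woolf 95% CI from a 2x2 table of cases only.

    Rows: intervention arm (a, b) and comparison arm (c, d).
    Columns: genotype carriers (a, c) and non-carriers (b, d).
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical case counts, not WHI data.
print(case_only_odds_ratio(a=40, b=60, c=50, d=50))
```

An estimate below 1 would suggest the intervention odds ratio is lower in carriers than in non-carriers.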
ABSTRACT: Two recent genome-wide association studies have described associations of SNP variants in PNPLA3 with nonalcoholic fatty liver and plasma liver enzyme levels in population-based cohorts. We investigated the contributions of these variants to clinical outcomes in Mestizo subjects with a history of excessive alcohol consumption. We show that the non-synonymous variant rs738409[G] (I148M) in PNPLA3 is strongly associated with alcoholic liver disease and progression to alcoholic cirrhosis (unadjusted OR = 2.25, P = 1.7 x 10(-10); ancestry-adjusted OR = 1.79, P = 1.9 x 10(-5)).
ABSTRACT: Two genome-wide association studies (GWAS) have described associations of variants in PNPLA3 with nonalcoholic fatty liver and plasma liver enzyme levels. We investigated the contributions of these variants to liver disease in Mestizo subjects with a history of alcohol dependence. We found that rs738409 in PNPLA3 is strongly associated with alcoholic liver disease and clinically evident alcoholic cirrhosis (unadjusted OR = 2.25, P = 1.7 x 10(-10); ancestry-adjusted OR = 1.79, P = 1.9 x 10(-5)).
ABSTRACT: Genome sequencing of large numbers of individuals promises to advance the understanding, treatment, and prevention of human diseases, among other applications. We describe a genome sequencing platform that achieves efficient imaging and low reagent consumption with combinatorial probe anchor ligation chemistry to independently assay each base from patterned nanoarrays of self-assembling DNA nanoballs. We sequenced three human genomes with this platform, generating an average of 45- to 87-fold coverage per genome and identifying 3.2 to 4.5 million sequence variants per genome. Validation of one genome data set demonstrates a sequence accuracy of about 1 false variant per 100 kilobases. The high accuracy, affordable cost of $4400 for sequencing consumables, and scalability of this platform enable complete human genome sequencing for the detection of rare variants in large-scale genetic studies.
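The reported 45- to 87-fold average coverage can be put in context with an idealized Lander-Waterman model. That model treats per-base depth as Poisson with mean equal to the average coverage. The sketch below computes the expected fraction of bases reaching a minimum depth under that assumption. Real libraries show coverage biases (for example, related to GC content), so actual completeness is typically lower than this model suggests. The threshold of 10 reads is an arbitrary illustrative choice.

```python
import math


def fraction_covered(mean_depth, min_depth):
    """Expected fraction of bases with depth >= min_depth under a Poisson model."""
    p_below = sum(
        math.exp(-mean_depth) * mean_depth**i / math.factorial(i)
        for i in range(min_depth)
    )
    return 1.0 - p_below


for depth in (45, 87):  # the per-genome coverage range reported above
    print(depth, fraction_covered(depth, min_depth=10))
```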
ABSTRACT: Breast cancer concern is a major reason for the recent marked reduction in use of postmenopausal hormone therapy, although equally effective means of controlling menopausal symptoms are lacking. Single nucleotide polymorphisms (SNPs) in the fibroblast growth factor receptor 2 (FGFR2) gene are substantially associated with postmenopausal breast cancer risk and could influence hormone therapy effects.
We interrogated eight SNPs in intron 2 of the FGFR2 gene for 2,166 invasive breast cancer cases from the Women's Health Initiative clinical trial and one-to-one matched controls to confirm an association with breast cancer risk. We used case-only analyses to examine the dependence of estrogen plus progestin and estrogen-alone odds ratios on SNP genotype.
Seven FGFR2 SNPs, including six in a single linkage disequilibrium region, were found to associate strongly (P < 10(-7)) with breast cancer risk. SNP rs3750817 (minor allele T with frequency 0.39) had an estimated per-minor-allele odds ratio of 0.78, and was not in such strong linkage disequilibrium with the other SNPs. The genotype of this SNP related significantly (P < 0.05) to hormone therapy odds ratios. For estrogen plus progestin, the odds ratios (95% confidence intervals) at 0, 1, and 2 minor SNP alleles were 1.52 (1.14-2.02), 1.33 (1.01-1.75), and 0.69 (0.41-1.17), whereas the corresponding values for estrogen alone were 0.74 (0.51-1.09), 0.99 (0.68-1.44), and 0.34 (0.15-0.76).
Postmenopausal women having TT genotype for SNP rs3750817 have a reduced breast cancer risk and seem to experience comparatively favorable effects of postmenopausal hormone therapy.
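Linkage disequilibrium between two SNPs, as discussed above, is commonly summarized by r². This is the squared correlation between alleles on the same haplotype. A minimal sketch follows, assuming haplotype and allele frequencies are already known. In practice they are estimated from genotype data.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Squared correlation r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2.
    p_a, p_b: frequencies of alleles A and B.
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


print(ld_r_squared(0.25, 0.5, 0.5))  # 0.0 (linkage equilibrium)
print(ld_r_squared(0.5, 0.5, 0.5))   # 1.0 (complete LD)
```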
ABSTRACT: Rice, the primary source of dietary calories for half of humanity, is the first crop plant for which a high-quality reference genome sequence from a single variety was produced. We used resequencing microarrays to interrogate 100 Mb of the unique fraction of the reference genome for 20 diverse varieties and landraces that capture the impressive genotypic and phenotypic diversity of domesticated rice. Here, we report the distribution of 160,000 nonredundant SNPs. Introgression patterns of shared SNPs revealed the breeding history and relationships among the 20 varieties; some introgressed regions are associated with agronomic traits that mark major milestones in rice improvement. These comprehensive SNP data provide a foundation for deep exploration of rice diversity and gene-trait relationships and their use for future rice improvement.
Proceedings of the National Academy of Sciences 08/2009; 106(30):12273-8. · 9.74 Impact Factor
ABSTRACT: Markers for individual genotyping can be selected using quantitative genotyping of pooled DNA. This strategy saves time and money.
To determine the efficacy of this approach, we investigated the bivariate distribution of association test statistics from pooled and individual genotypes. We used approximately 1,000 samples with both individual and pooled genotyping on 40,000 SNPs.
We found that the distribution of the joint test statistics can be modelled as a mixture of two bivariate normal distributions. One distribution has a correlation of zero, and is probably due to SNPs whose pooled genotyping was unsuccessful. The other distribution has a correlation of approximately 0.65 in our data. This latter distribution is probably accounted for by SNPs whose pooled genotyping accurately predicts the underlying allele frequency. Approximately 87% of the data belongs to this distribution. We also derived a method to investigate the effect of both the correlation and the selection cut-off on the relative power of pooling studies. We demonstrate that, for SNPs showing good correlation between pooled and individual genotyping, pooled genotyping has good power to detect SNPs that are truly associated with disease-causing variants. Therefore, this approach is a cost-effective tool for association studies.
Human Heredity 02/2009; 67(4):219-25. · 1.57 Impact Factor
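The two-component mixture described above can be written down directly. This sketch assumes both components are standard bivariate normals (zero means, unit variances) over signed pooled and individual test statistics. It uses the reported 87% weight and 0.65 correlation. The fitted means and variances are not given in the abstract, and the function names are hypothetical. The posterior membership probability indicates how likely a SNP's pair of statistics is to come from the component in which pooling worked.

```python
import math


def bivariate_normal_pdf(x, y, rho):
    """Standard bivariate normal density (means 0, variances 1, correlation rho)."""
    q = (x * x - 2 * rho * x * y + y * y) / (1 - rho * rho)
    return math.exp(-q / 2) / (2 * math.pi * math.sqrt(1 - rho * rho))


def mixture_pdf(x, y, weight=0.87, rho=0.65):
    """Correlated component (pooling worked) plus uncorrelated component (pooling failed)."""
    return weight * bivariate_normal_pdf(x, y, rho) + (1 - weight) * bivariate_normal_pdf(x, y, 0.0)


def prob_pooling_worked(x, y, weight=0.87, rho=0.65):
    """Posterior probability that statistics (x, y) come from the correlated component."""
    return weight * bivariate_normal_pdf(x, y, rho) / mixture_pdf(x, y, weight, rho)


print(prob_pooling_worked(2.5, 2.5))   # concordant statistics: high
print(prob_pooling_worked(2.5, -2.5))  # discordant statistics: low
```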
ABSTRACT: Circulating levels of sex hormone-binding globulin (SHBG) are inversely associated with breast cancer risk in postmenopausal women. Three polymorphisms within the SHBG gene have been reported to affect SHBG levels, but there has been no systematic attempt to identify other such variants.
We looked for associations between SHBG levels in 1,134 healthy, postmenopausal women and 11 tagging single nucleotide polymorphisms (SNP) in or around the SHBG gene. Associations between SHBG SNPs and breast cancer were tested in up to 6,622 postmenopausal breast cancer cases and 6,784 controls.
Ten SNPs within or close to the SHBG gene were significantly associated with SHBG levels, as was the (TAAAA)n polymorphism. The best-fitting combination of rs6259, rs858521, and rs727428 and body mass index, waist, hip, age, and smoking status accounted for 24% of the variance in natural-log-transformed SHBG levels. Haplotype analysis suggested that rs858518, rs727428, or a variant in linkage disequilibrium with them acts to decrease SHBG levels but that this effect is neutralized by rs6259 (D356N). rs1799941 increases SHBG levels, but the previously reported association with (TAAAA)n repeat length appears to be a consequence of linkage disequilibrium with these SNPs. One further SHBG SNP was significantly associated with breast cancer (rs6257, per-allele odds ratio, 0.88; 95% confidence interval, 0.82-0.95; P = 0.002).
At least three SNPs showed associations with SHBG levels that were highly significant but relatively small in magnitude. rs6257 is a potential breast cancer susceptibility variant, but relationships between the genetic determinants of SHBG levels and breast cancer are complex.
ABSTRACT: The clinical overlap between monogenic Familial Hemiplegic Migraine (FHM) and common migraine subtypes, and the fact that all three FHM genes are involved in the transport of ions, suggest that ion transport genes may underlie susceptibility to common forms of migraine. To test this leading hypothesis, we examined common variation in 155 ion transport genes using 5257 single nucleotide polymorphisms (SNPs) in a Finnish sample of 841 unrelated migraine with aura cases and 884 unrelated non-migraine controls. The top signals were then tested for replication in four independent migraine case-control samples from the Netherlands, Germany and Australia, totalling 2835 unrelated migraine cases and 2740 unrelated controls. SNPs within 12 genes (KCNB2, KCNQ3, CLIC5, ATP2C2, CACNA1E, CACNB2, KCNE2, KCNK12, KCNK2, KCNS3, SCN5A and SCN9A) with promising nominal association (0.00041 < P < 0.005) in the Finnish sample were selected for replication. Although no variant remained significant after adjusting for multiple testing or produced consistent evidence for association across all cohorts, a significant epistatic interaction between KCNB2 SNP rs1431656 (chromosome 8q13.3) and CACNB2 SNP rs7076100 (chromosome 10p12.33) (pointwise P = 0.00002; global P = 0.02) was observed in the Finnish case-control sample. We conclude that common variants of moderate effect size in ion transport genes do not play a major role in susceptibility to common migraine within these European populations, although there is some evidence for epistatic interaction between potassium and calcium channel genes, KCNB2 and CACNB2. Multiple rare variants or trans-regulatory elements of these genes are not ruled out.
Human Molecular Genetics 08/2008; 17(21):3318-31. · 7.69 Impact Factor
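The abstract reports a pointwise P of 0.00002 and a global P of 0.02 for the KCNB2-CACNB2 interaction, but it does not say how the global value was obtained. One common approach when many SNP pairs are tested is a permutation min-P procedure, sketched here as an assumption. For each permuted data set, the smallest pointwise P is recorded. The global P is then the proportion of permutations at least as extreme as the observed minimum. The permutation results are hypothetical.

```python
def empirical_global_p(observed_min_p, permuted_min_ps):
    """Permutation (min-P) global P-value with the usual +1 correction.

    permuted_min_ps: smallest pointwise P from each permuted data set.
    """
    as_extreme = sum(1 for p in permuted_min_ps if p <= observed_min_p)
    return (as_extreme + 1) / (len(permuted_min_ps) + 1)


# Hypothetical permutation results.
print(empirical_global_p(0.01, [0.5, 0.005, 0.2, 0.02]))  # 0.4
```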
ABSTRACT: In a genome-wide association study to identify loci associated with colorectal cancer (CRC) risk, we genotyped 555,510 SNPs in 1,012 early-onset Scottish CRC cases and 1,012 controls (phase 1). In phase 2, we genotyped the 15,008 highest-ranked SNPs in 2,057 Scottish cases and 2,111 controls. We then genotyped the five highest-ranked SNPs from the joint phase 1 and 2 analysis in 14,500 cases and 13,294 controls from seven populations, and identified a previously unreported association, rs3802842 on 11q23 (OR = 1.1; P = 5.8 x 10(-10)), showing population differences in risk. We also replicated and fine-mapped associations at 8q24 (rs7014346; OR = 1.19; P = 8.6 x 10(-26)) and 18q21 (rs4939827; OR = 1.2; P = 7.8 x 10(-28)). Risk was greater for rectal than for colon cancer for rs3802842 (P < 0.008) and rs4939827 (P < 0.009). Carrying all six possible risk alleles yielded OR = 2.6 (95% CI = 1.75-3.89) for CRC. These findings extend our understanding of the role of common genetic variation in CRC etiology.
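The reported OR of 2.6 for carrying all six risk alleles can be compared with an idealized prediction. The sketch below computes what a multiplicative (log-additive) model would predict from the three reported ORs. It assumes they are per-allele ORs and that the SNPs act independently. Both are assumptions, and the reported figure is an empirical estimate, not this calculation.

```python
import math


def multiplicative_combined_or(per_allele_ors, alleles_per_snp=2):
    """Expected OR for carrying every risk allele, assuming a multiplicative
    (log-additive) model with independent SNP effects."""
    return math.prod(or_ ** alleles_per_snp for or_ in per_allele_ors)


# ORs reported above for rs3802842, rs7014346 and rs4939827, read as per-allele ORs.
print(round(multiplicative_combined_or([1.1, 1.19, 1.2]), 2))  # 2.47
```

The prediction (about 2.47) is of the same order as the empirical estimate, whose 95% CI (1.75-3.89) includes it.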
ABSTRACT: Systemic lupus erythematosus (SLE) is a clinically heterogeneous disease in which the risk of disease is influenced by complex genetic and environmental contributions. Alleles of HLA-DRB1, IRF5, and STAT4 are established susceptibility genes; there is strong evidence for the existence of additional risk loci.
We genotyped more than 500,000 single-nucleotide polymorphisms (SNPs) in DNA samples from 1311 case subjects with SLE and 1783 control subjects; all subjects were North Americans of European descent. Genotypes from 1557 additional control subjects were obtained from public data repositories. We measured the association between the SNPs and SLE after applying strict quality-control filters to reduce technical artifacts and to correct for the presence of population stratification. Replication of the top loci was performed in 793 case subjects and 857 control subjects from Sweden.
Genetic variation in the region upstream from the transcription initiation site of the gene encoding B lymphoid tyrosine kinase (BLK) and C8orf13 (chromosome 8p23.1) was associated with disease risk in both the U.S. and Swedish case-control series (rs13277113; odds ratio, 1.39; P=1x10(-10)) and also with altered levels of messenger RNA in B-cell lines. In addition, variants on chromosome 16p11.22, near the genes encoding integrin alpha M (ITGAM, or CD11b) and integrin alpha X (ITGAX), were associated with SLE in the combined sample (rs11574637; odds ratio, 1.33; P=3x10(-11)).
We identified and then confirmed through replication two new genetic loci for SLE: a promoter-region allele associated with reduced expression of BLK and increased expression of C8orf13 and variants in the ITGAM-ITGAX region.
New England Journal of Medicine 03/2008; 358(9):900-9. · 51.66 Impact Factor
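The methods mention correction for population stratification without naming the technique. Genomic control is one widely used adjustment, shown here only as an assumption. The inflation factor lambda is the median observed 1-df association chi-square divided by its null median (about 0.455). Each statistic is divided by lambda when lambda exceeds 1. The function names are hypothetical.

```python
import statistics

CHISQ1_MEDIAN = 0.4549364231195724  # median of the chi-square distribution with 1 df


def genomic_control_lambda(chisq_stats):
    """Inflation factor: observed median 1-df association chi-square over its null median."""
    return statistics.median(chisq_stats) / CHISQ1_MEDIAN


def gc_adjust(chisq_stats):
    """Divide every statistic by lambda; deflation (lambda < 1) is not applied."""
    lam = max(1.0, genomic_control_lambda(chisq_stats))
    return [x / lam for x in chisq_stats]


# Hypothetical association statistics for five SNPs.
stats = [0.2, 0.5, 1.1, 3.9, 0.45]
print(genomic_control_lambda(stats), gc_adjust(stats))
```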
ABSTRACT: Human cancer is caused by the accumulation of mutations in oncogenes and tumor suppressor genes. To catalog the genetic changes that occur during tumorigenesis, we isolated DNA from 11 breast and 11 colorectal tumors and determined the sequences of the genes in the Reference Sequence database in these samples. Based on analysis of exons representing 20,857 transcripts from 18,191 genes, we conclude that the genomic landscapes of breast and colorectal cancers are composed of a handful of commonly mutated gene "mountains" and a much larger number of gene "hills" that are mutated at low frequency. We describe statistical and bioinformatic tools that may help identify mutations with a role in tumorigenesis. These results have implications for understanding the nature and heterogeneity of human cancers and for using personal genomics for tumor diagnosis and therapy.