ABSTRACT: Background:
Racial/ethnic differences for commonly measured clinical variables are well documented, and it has been postulated that population-specific genetic factors may play a role. The genetic heterogeneity of admixed populations, such as African Americans, provides a unique opportunity to identify genomic regions and variants associated with the clinical variability observed for diseases and traits across populations.
To begin a systematic search for these population-specific genomic regions at the phenome-wide scale, we determined the relationship between global genetic ancestry, specifically European and African ancestry, and clinical variables measured in a population of African Americans from BioVU, Vanderbilt University's biorepository linked to de-identified electronic medical records (EMRs) as part of the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) study. Through billing (ICD-9) codes, procedure codes, labs, and clinical notes, 36 common clinical and laboratory variables were mined from the EMR, including body mass index (BMI), kidney traits, lipid levels, blood pressure, and electrocardiographic measurements. A total of 15,863 DNA samples from non-European Americans were genotyped on the Illumina Metabochip containing ~200,000 variants, of which 11,166 were from African Americans. Tests of association were performed to examine associations between global ancestry and the phenotype of interest.
Increased European ancestry, and conversely decreased African ancestry, was most strongly correlated with an increase in QRS duration, consistent with previous observations that African Americans tend to have a shorter QRS duration compared with European Americans. Despite known racial/ethnic disparities in blood pressure, European and African ancestry was associated with neither diastolic nor systolic blood pressure measurements.
Collectively, these results suggest that this clinical population can be used to identify traits in which population differences may be due, in part, to population-specific genetics.
ABSTRACT: Epidemiologic collections have been a major resource for genotype–phenotype studies of complex disease given their large sample size, racial/ethnic diversity, and breadth and depth of phenotypes, traits, and exposures. A major disadvantage of these collections is that they often survey households and communities without collecting extensive pedigree data. Failure to account for substantial relatedness can lead to inflated estimates and spurious associations. To examine the extent of cryptic relatedness in an epidemiologic collection, we, as the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) study, accessed the National Health and Nutrition Examination Surveys (NHANES) linked to DNA samples (“Genetic NHANES”) from NHANES III and NHANES 1999–2002. NHANES are population-based cross-sectional surveys conducted by the National Center for Health Statistics at the Centers for Disease Control and Prevention. Genome-wide genetic data are not yet available in NHANES, and current data use agreements prohibit the generation of GWAS-level data in NHANES samples due to issues in maintaining confidentiality, among other ethical concerns. To date, only hundreds of single nucleotide polymorphisms (SNPs) genotyped in a variety of candidate genes are available for analysis in NHANES. We performed identity-by-descent (IBD) estimates in three self-identified subpopulations of Genetic NHANES (non-Hispanic white, non-Hispanic black, and Mexican American) using PLINK software to identify potential familial relationships from presumed unrelated subjects. We then compared the PLINK-identified relationships to those identified by an alternative method implemented in Kinship-based INference for Genome-wide association studies (KING).
Overall, both methods identified familial relationships in NHANES III and NHANES 1999–2002 for all three subpopulations, but little concordance was observed between the two methods due in major part to the limited SNP data available in Genetic NHANES. Despite the lack of genome-wide data, our results suggest the presence of cryptic relatedness in this epidemiologic collection and highlight the limitations of restricted datasets such as NHANES in the context of modern day genetic epidemiology studies.
Frontiers in Genetics 10/2015; 6. DOI:10.3389/fgene.2015.00317
ABSTRACT: The most common side effect of angiotensin-converting enzyme inhibitor (ACEi) drugs is cough. We conducted a genome-wide association study (GWAS) of ACEi-induced cough among 7080 subjects of diverse ancestries in the Electronic Medical Records and Genomics (eMERGE) network. Cases were subjects diagnosed with ACEi-induced cough. Controls were subjects with at least 6 months of ACEi use and no cough. A GWAS (1595 cases and 5485 controls) identified associations on chromosome 4 in an intron of KCNIP4. The strongest association was at rs145489027 (minor allele frequency=0.33, odds ratio (OR)=1.3 (95% confidence interval (CI): 1.2-1.4), P=1.0 × 10⁻⁸). Replication for six single-nucleotide polymorphisms (SNPs) in KCNIP4 was tested in a second eMERGE population (n=926) and in the Genetics of Diabetes Audit and Research in Tayside, Scotland (GoDARTS) cohort (n=4309). Replication was observed at rs7675300 (OR=1.32 (1.01-1.70), P=0.04) in eMERGE and at rs16870989 and rs1495509 (OR=1.15 (1.01-1.30), P=0.03 for both) in GoDARTS. The combined association at rs1495509 was significant (OR=1.23 (1.15-1.32), P=1.9 × 10⁻⁹). These results indicate that SNPs in KCNIP4 may modulate ACEi-induced cough risk. The Pharmacogenomics Journal advance online publication, 14 July 2015; doi:10.1038/tpj.2015.51.
The Pharmacogenomics Journal 07/2015; DOI:10.1038/tpj.2015.51 · 4.23 Impact Factor
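The odds ratios and 95% confidence intervals quoted in the abstract above can be illustrated with a minimal sketch. It assumes a plain 2x2 carrier-by-status table and the Woolf (log odds ratio) interval; the study itself most likely used regression models, and the counts below are invented toy values, not data from the paper. The function name `odds_ratio_ci` is hypothetical.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Return (OR, lower, upper) for a 2x2 table using the Woolf interval.

    Assumption: a simple carrier/non-carrier table, not the covariate-adjusted
    model a GWAS would normally fit.
    """
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                          + 1 / exposed_controls + 1 / unexposed_controls)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Toy counts (invented for illustration only).
or_value, ci_lower, ci_upper = odds_ratio_ci(40, 60, 30, 70)
print(f"OR={or_value:.2f} (95% CI: {ci_lower:.2f}-{ci_upper:.2f})")
```

An interval that excludes 1.0 corresponds to nominal significance at roughly the 5% level, which is how ranges such as 1.01-1.30 in the replication results are usually read.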
ABSTRACT: Electronic medical records (EMRs) are being widely implemented for use in genetic and genomic studies. As a phenotypically rich resource, EMRs provide researchers with the opportunity to identify disease cohorts and perform genotype-phenotype association studies. The Epidemiologic Architecture for Genes Linked to Environment (EAGLE) study, as part of the Population Architecture using Genomics and Epidemiology (PAGE) I study, has genotyped more than 15,000 individuals of diverse genetic ancestry in BioVU, the Vanderbilt University Medical Center's biorepository linked to a de-identified version of the EMR (EAGLE BioVU). Here we develop and deploy an algorithm utilizing data mining techniques to identify primary open-angle glaucoma (POAG) in African Americans from EAGLE BioVU for genetic association studies. The algorithm described here was designed using a combination of diagnostic codes, current procedural terminology billing codes, and free text searches to identify POAG status in situations where gold-standard digital photography cannot be accessed. The case algorithm identified 267 potential POAG subjects but underperformed after manual review with a positive predictive value of 51.6% and an accuracy of 76.3%. The control algorithm identified controls with a negative predictive value of 98.3%. Although the case algorithm requires more downstream manual review for use in large-scale studies, it provides a basis by which to extract a specific clinical subtype of glaucoma from EMRs in the absence of digital photographs.
PLoS ONE 06/2015; 10(6):e0127817. DOI:10.1371/journal.pone.0127817 · 3.23 Impact Factor
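The performance measures reported for the POAG algorithm (positive predictive value, negative predictive value, accuracy) follow from a manual-review confusion matrix. The sketch below shows the standard definitions; the function name `review_metrics` is hypothetical and the counts are toy values, since the abstract does not give the underlying review tallies.

```python
def review_metrics(true_pos, false_pos, true_neg, false_neg):
    """Compute PPV, NPV, and accuracy from manual-review counts."""
    total = true_pos + false_pos + true_neg + false_neg
    return {
        # Fraction of algorithm-flagged cases confirmed on review.
        "ppv": true_pos / (true_pos + false_pos),
        # Fraction of algorithm-flagged controls confirmed on review.
        "npv": true_neg / (true_neg + false_neg),
        # Fraction of all reviewed records classified correctly.
        "accuracy": (true_pos + true_neg) / total,
    }


# Toy counts (invented for illustration only).
metrics = review_metrics(true_pos=50, false_pos=50, true_neg=95, false_neg=5)
print(metrics)
```

A PPV near 50% means about half of the flagged cases fail review, which is why the abstract notes that the case algorithm needs downstream manual review.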
ABSTRACT: Vancomycin, a commonly used antibiotic, can be nephrotoxic. Known risk factors such as age, creatinine clearance, vancomycin dose/dosing interval, and concurrent nephrotoxic medications fail to accurately predict nephrotoxicity. To identify potential genomic risk factors, we performed a genome-wide association study (GWAS) of serum creatinine levels while on vancomycin in 489 European American individuals and validated findings in three independent cohorts totaling 439 European American individuals. In primary analyses, the chromosome 6q22.31 locus was associated with increased serum creatinine levels while on vancomycin therapy (most significant variant rs2789047, risk allele A, β = -0.06, p = 1.1 × 10⁻⁷). SNPs in this region had consistent directions of effect in the validation cohorts, with a meta-p of 1.1 × 10⁻⁷. Variation in this region on chromosome 6, which includes the genes TBC1D32/C6orf170 and GJA1 (encoding connexin43), may modulate risk of vancomycin-induced kidney injury.
PLoS ONE 06/2015; 10(6):e0127791. DOI:10.1371/journal.pone.0127791 · 3.23 Impact Factor
ABSTRACT: Background
Biorepositories linked to de-identified electronic medical records (EMRs) have the potential to complement traditional epidemiologic studies in genotype-phenotype studies of complex human diseases and traits. A major challenge in meeting this potential is the use of EMR-derived data to extract phenotypes and covariates for genetic association studies. Unlike traditional epidemiologic data, EMR-derived data are collected for clinical care and are therefore highly variable across patients. The variability of clinical data coupled with the challenges associated with searching unstructured clinical notes requires the development of algorithms to extract phenotypes for analysis. Given the number of possible algorithms that could be developed for any one EMR-derived phenotype, we explored here the impact algorithm decision logic has on genetic association study results for a single quantitative trait, high density lipoprotein cholesterol (HDL-C).
We used five different algorithms to extract HDL-C from African American subjects genotyped on the Illumina Metabochip (n = 11,519) as part of Epidemiologic Architecture for Genes Linked to Environment (EAGLE). Tests of association between HDL-C and genetic risk scores for HDL-C associated variants suggest that the genetic effect size does not vary substantially across the five HDL-C definitions.
These data collectively suggest that, at least for this quantitative trait, algorithm decision logic and phenotyping details do not appreciably impact genetic association study test statistics.
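As a rough illustration of the genetic risk scores mentioned in the abstract above, the sketch below sums risk-allele dosages across variants, optionally weighted by per-variant effect sizes. Whether the study used weighted or unweighted scores is not stated, so both are shown as assumptions; the function name `genetic_risk_score` and all values are hypothetical.

```python
def genetic_risk_score(dosages, weights=None):
    """Sum allele dosages (0, 1, or 2 per variant), optionally weighted.

    With weights=None every variant counts equally (an unweighted score);
    otherwise each dosage is multiplied by its assumed effect size.
    """
    if weights is None:
        weights = [1.0] * len(dosages)
    if len(weights) != len(dosages):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# One hypothetical subject genotyped at three variants.
dosages = [0, 1, 2]
print(genetic_risk_score(dosages))                   # unweighted allele count
print(genetic_risk_score(dosages, [0.5, 0.2, 0.1]))  # weighted by toy effect sizes
```

Regressing each of the five HDL-C definitions on such a score and comparing the fitted slopes is one way to check whether the phenotype algorithm changes the estimated genetic effect.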
ABSTRACT: Background: To investigate potential cardiovascular and other effects of long-term pharmacological interleukin 1 (IL-1) inhibition, we studied genetic variants that produce inhibition of IL-1, a master regulator of inflammation.
ABSTRACT: Several regions of the genome show pleiotropic associations with multiple cancers. We sought to evaluate whether 181 single-nucleotide polymorphisms previously associated with various cancers in genome-wide association studies were also associated with melanoma risk.
We evaluated 2,131 melanoma cases and 20,353 controls from three studies in the Population Architecture using Genomics and Epidemiology (PAGE) study (EAGLE-BioVU, MEC, WHI) and two collaborating studies (HPFS, NHS). Overall and sex-stratified analyses were performed across studies.
We observed statistically significant associations with melanoma for two lung cancer SNPs in the TERT-CLPTM1L locus (Bonferroni-corrected p<2.8×10⁻⁴), replicating known pleiotropic effects at this locus. In sex-stratified analyses, we also observed a potential male-specific association between prostate cancer risk variant rs12418451 and melanoma risk (OR=1.22, p=8.0×10⁻⁴). No other variants in our study were associated with melanoma after multiple comparisons adjustment (p>2.8×10⁻⁴).
We provide confirmatory evidence of pleiotropic associations with melanoma for two SNPs previously associated with lung cancer, and provide suggestive evidence for a male-specific association with melanoma for prostate cancer variant rs12418451. This SNP is located near TPCN2, an ion transport gene containing SNPs which have been previously associated with hair pigmentation but not melanoma risk. Previous evidence provides biological plausibility for this association, and suggests a complex interplay between ion transport, pigmentation, and melanoma risk that may vary by sex. If confirmed, these pleiotropic relationships may help elucidate shared molecular pathways between cancers and related phenotypes.
PLoS ONE 03/2015; 10(3):e0120491. DOI:10.1371/journal.pone.0120491 · 3.23 Impact Factor
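The Bonferroni-corrected threshold quoted in the melanoma abstract is consistent with dividing a family-wise error rate by the number of SNPs tested. The sketch below assumes an alpha of 0.05, which the abstract does not state explicitly; the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# 181 SNPs were evaluated; alpha = 0.05 is an assumption.
threshold = bonferroni_threshold(0.05, 181)
print(f"{threshold:.1e}")
```

Under that assumption the per-test cutoff rounds to the 2.8×10⁻⁴ reported in the abstract.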
ABSTRACT: Body fat distribution is a heritable trait and a well-established predictor of adverse metabolic outcomes, independent of overall adiposity. To increase our understanding of the genetic basis of body fat distribution and its molecular links to cardiometabolic traits, here we conduct genome-wide association meta-analyses of traits related to waist and hip circumferences in up to 224,459 individuals. We identify 49 loci (33 new) associated with waist-to-hip ratio adjusted for body mass index (BMI), and an additional 19 loci newly associated with related waist and hip circumference measures (P < 5 × 10⁻⁸). In total, 20 of the 49 waist-to-hip ratio adjusted for BMI loci show significant sexual dimorphism, 19 of which display a stronger effect in women. The identified loci were enriched for genes expressed in adipose tissue and for putative regulatory elements in adipocytes. Pathway analyses implicated adipogenesis, angiogenesis, transcriptional regulation and insulin resistance as processes affecting fat distribution, providing insight into potential pathophysiological mechanisms.
ABSTRACT: Obesity is heritable and predisposes to many diseases. To understand the genetic basis of obesity better, here we conduct a genome-wide association study and Metabochip meta-analysis of body mass index (BMI), a measure commonly used to define obesity and assess adiposity, in up to 339,224 individuals. This analysis identifies 97 BMI-associated loci (P < 5 × 10⁻⁸), 56 of which are novel. Five loci demonstrate clear evidence of several independent association signals, and many loci have significant effects on other metabolic phenotypes. The 97 loci account for ∼2.7% of BMI variation, and genome-wide estimates suggest that common variation accounts for >20% of BMI variation. Pathway analyses provide strong support for a role of the central nervous system in obesity susceptibility and implicate new genes and pathways, including those related to synaptic function, glutamate signalling, insulin secretion/action, energy metabolism, lipid biology and adipogenesis.
ABSTRACT: Elevated levels of plasma fibrinogen are associated with clot formation in the absence of inflammation or injury and are a biomarker for arterial clotting, the leading cause of cardiovascular disease. Fibrinogen levels are heritable, with >50% attributed to genetic factors; however, little is known about possible genetic modifiers that might explain the missing heritability. The fibrinogen gene cluster comprises three genes (FGA, FGB, and FGG) that make up the fibrinogen polypeptide essential for fibrinogen production in the blood. Given the known interaction with these genes, we tested 25 variants in the fibrinogen gene cluster for gene x gene and gene x environment interactions in 620 non-Hispanic blacks, 1,385 non-Hispanic whites, and 664 Mexican Americans from a cross-sectional dataset enriched with environmental data, the Third National Health and Nutrition Examination Survey (NHANES III). Using a multiplicative approach, we added cross product terms (gene x gene or gene x environment) to a linear regression model and declared significance at p < 0.05. We identified 19 unique gene x gene and 13 unique gene x environment interactions that impact fibrinogen levels in at least one population at p < 0.05. Over 90% of the gene x gene interactions identified include a variant in the rate-limiting gene, FGB, which is essential for the formation of the fibrinogen polypeptide. We also detected gene x environment interactions with fibrinogen variants and sex, smoking, and body mass index. These findings highlight the potential for the discovery of genetic modifiers for complex phenotypes in multiple populations and give a better understanding of the interaction between genes and/or the environment for fibrinogen levels. The need for more powerful and robust methods to identify genetic modifiers is still warranted.
Pacific Symposium on Biocomputing 01/2015; 20:219-30.
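The multiplicative approach described in the fibrinogen abstract, adding a cross product term to a linear regression, can be sketched as follows. This is a minimal pure-Python illustration on simulated data, not the authors' pipeline: the helper names, effect sizes, and the smoking exposure are all invented for the example, and it estimates coefficients only (no p-values or covariate adjustment).

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y_n for r, y_n in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Simulated data: columns are intercept, SNP dosage, exposure, SNP x exposure.
rng = random.Random(0)
true_beta = [3.0, 0.2, -0.1, 0.4]  # invented effect sizes
rows, y = [], []
for _ in range(500):
    snp = rng.choice([0, 1, 2])   # additive allele dosage
    smoker = rng.choice([0, 1])   # hypothetical binary exposure
    row = [1.0, snp, smoker, snp * smoker]  # last entry is the cross product
    rows.append(row)
    y.append(sum(b * x for b, x in zip(true_beta, row)) + rng.gauss(0, 0.1))

beta_hat = fit_ols(rows, y)
print("estimated interaction coefficient:", round(beta_hat[3], 3))
```

The coefficient on the cross product column measures how the SNP effect changes with the exposure. A real analysis would test that coefficient against zero, which this sketch leaves out.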
ABSTRACT: Substantial progress has been made in identifying susceptibility variants for age-related macular degeneration (AMD). The majority of research to identify genetic variants associated with AMD has focused on nuclear genetic variation. While there is some evidence that mitochondrial genetic variation contributes to AMD susceptibility, to date, these studies have been limited to populations of European descent resulting in a lack of data in diverse populations. A major goal of the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) study is to describe the underlying genetic architecture of common, complex diseases across diverse populations. This present study sought to determine if mitochondrial genetic variation influences risk of AMD across diverse populations. We performed a genetic association study to investigate the contribution of mitochondrial DNA variation to AMD risk. We accessed samples from the National Health and Nutrition Examination Surveys, a U.S. population-based, cross-sectional survey collected without regard to health status. AMD cases and controls were selected from the Third NHANES and NHANES 2007-2008 datasets which include non-Hispanic whites, non-Hispanic blacks, and Mexican Americans. AMD cases were defined as those > 60 years of age with early/late AMD, as determined by fundus photography. Targeted genotyping was performed for 63 mitochondrial SNPs and participants were then classified into mitochondrial haplogroups. We used logistic regression assuming a dominant genetic model adjusting for age, sex, body mass index, and smoking status (ever vs. never). Regressions and meta-analyses were performed for individual SNPs and mitochondrial haplogroups J, T, and U. We identified five SNPs associated with AMD in Mexican Americans at p < 0.05, including three located in the control region (mt16111, mt16362, and mt16319), one in MT-RNR2 (mt1736), and one in MT-ND4 (mt12007).
No mitochondrial variant or haplogroup was significantly associated in non-Hispanic blacks or non-Hispanic whites in the final meta-analysis. This study provides further evidence that mitochondrial variation plays a role in susceptibility to AMD and contributes to the knowledge of the genetic architecture of AMD in Mexican Americans.
Pacific Symposium on Biocomputing 01/2015; 20:243-54. DOI:10.1142/9789814644730_0024
ABSTRACT: Studies assessing the impact of gene-environment interactions on common human diseases and traits have been relatively few for many reasons. One often acknowledged reason is that it is difficult to accurately measure the environment or exposure. Indeed, most large-scale epidemiologic studies use questionnaires to assess and measure past and current exposure levels. While questionnaires may be cost-effective, the data may or may not accurately represent the exposure compared with more direct measurements (e.g., self-reported current smoking status versus direct measurement for cotinine levels). Much like phenotyping, the choice in how an exposure is measured may impact downstream tests of genetic association and gene-environment interaction studies. As a case study, we performed tests of association between five common VKORC1 SNPs and two different measurements of vitamin K levels, dietary (n=5,725) and serum (n=348), in the Third National Health and Nutrition Examination Survey (NHANES III). We did not replicate previously reported associations between VKORC1 and vitamin K levels using either measure. Furthermore, the suggestive associations and estimated genetic effect sizes identified in this study differed depending on the vitamin K measurement. This case study of VKORC1 and vitamin K levels serves as a cautionary example of the downstream consequences that the type of exposure measurement choices will have on genetic association and possibly gene-environment studies.
Pacific Symposium on Biocomputing 01/2015; 20:161-70.
ABSTRACT: The Nav1.5 sodium channel α subunit is the predominant α-subunit expressed in the heart and is associated with cardiac arrhythmias. We tested five previously identified SCN5A variants (rs7374138, rs7637849, rs7637849, rs7629265, and rs11129796) for an association with PR interval and QRS duration in two unique study populations: the Third National Health and Nutrition Examination Survey (NHANES III, n = 552) accessed by the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) and a combined dataset (n = 455) from two biobanks linked to electronic medical records from Vanderbilt University (BioVU) and Northwestern University (NUgene) as part of the electronic Medical Records & Genomics (eMERGE) network. A meta-analysis including all three study populations (n~4,000) suggests that eight SCN5A associations were significant for both QRS duration and PR interval (p<5.0E-3) with little evidence for heterogeneity across the study populations. These results suggest that published SCN5A associations replicate across different study designs in a meta-analysis and represent an important first step in the utility of multiple study designs for genetic studies and the identification/characterization of genetic variants associated with ECG traits in African-descent populations.
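The meta-analysis and heterogeneity assessment mentioned in the SCN5A abstract can be sketched with a fixed-effect, inverse-variance model and Cochran's Q. The abstract does not name the method used, so this choice is an assumption; the function name and the per-study estimates below are invented for illustration.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance fixed-effect meta-analysis.

    Returns (pooled_beta, pooled_se, cochran_q). Assumes each study reports
    an effect estimate and its standard error on the same scale.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    cochran_q = sum(w * (b - pooled_beta) ** 2 for w, b in zip(weights, betas))
    return pooled_beta, pooled_se, cochran_q


# Toy per-study estimates (invented): e.g. ms of QRS duration per allele.
pooled, se, q = fixed_effect_meta([1.0, 2.0], [1.0, 1.0])
print(pooled, round(se, 4), q)
```

A Q statistic that is small relative to its degrees of freedom (number of studies minus one) is what "little evidence for heterogeneity" typically refers to.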
ABSTRACT: We performed a Phenome-wide association study (PheWAS) utilizing diverse genotypic and phenotypic data existing across multiple populations in the National Health and Nutrition Examination Surveys (NHANES), conducted by the Centers for Disease Control and Prevention (CDC), and accessed by the Epidemiological Architecture for Genes Linked to Environment (EAGLE) study. We calculated comprehensive tests of association in Genetic NHANES using 80 SNPs and 1,008 phenotypes (grouped into 184 phenotype classes), stratified by race-ethnicity. Genetic NHANES includes three surveys (NHANES III, 1999-2000, and 2001-2002) and three race-ethnicities: non-Hispanic whites (n = 6,634), non-Hispanic blacks (n = 3,458), and Mexican Americans (n = 3,950). We identified 69 PheWAS associations replicating across surveys for the same SNP, phenotype-class, direction of effect, and race-ethnicity at p<0.01, allele frequency >0.01, and sample size >200. Of these 69 PheWAS associations, 39 replicated previously reported SNP-phenotype associations, 9 were related to previously reported associations, and 21 were novel associations. Fourteen results had the same direction of effect across more than one race-ethnicity: one result was novel, 11 replicated previously reported associations, and two were related to previously reported results. Thirteen SNPs showed evidence of pleiotropy. We further explored results with gene-based biological networks, contrasting the direction of effect for pleiotropic associations across phenotypes. One PheWAS result was ABCG2 missense SNP rs2231142, associated with uric acid levels in both non-Hispanic whites and Mexican Americans, protoporphyrin levels in non-Hispanic whites and Mexican Americans, and blood pressure levels in Mexican Americans.
Another example was SNP rs1800588 near LIPC, significantly associated with the novel phenotypes of folate levels (Mexican Americans), vitamin E levels (non-Hispanic whites) and triglyceride levels (non-Hispanic whites), and replication for cholesterol levels. The results of this PheWAS show the utility of this approach for exposing more of the complex genetic architecture underlying multiple traits, through generating novel hypotheses for future research.
ABSTRACT: Thyroid stimulating hormone (TSH) levels are normally tightly regulated within an individual; thus, relatively small variations may indicate thyroid disease. Genome-wide association studies (GWAS) have identified variants in PDE8B and FOXE1 that are associated with TSH levels. However, prior studies lacked racial/ethnic diversity, limiting the generalization of these findings to individuals of non-European ethnicities. The Electronic Medical Records and Genomics (eMERGE) Network is a collaboration across institutions with biobanks linked to electronic medical records (EMRs). The eMERGE Network uses EMR-derived phenotypes to perform GWAS in diverse populations for a variety of phenotypes. In this report, we identified serum TSH levels from 4,501 European American and 351 African American euthyroid individuals in the eMERGE Network with existing GWAS data. Tests of association were performed using linear regression and adjusted for age, sex, body mass index (BMI), and principal components, assuming an additive genetic model. Our results replicate the known association of PDE8B with serum TSH levels in European Americans (rs2046045 p = 1.85×10⁻¹⁷, β = 0.09). FOXE1 variants, associated with hypothyroidism, were not genome-wide significant (rs10759944: p = 1.08×10⁻⁶, β = -0.05). No SNPs reached genome-wide significance in African Americans. However, multiple known associations with TSH levels in European ancestry were nominally significant in African Americans, including PDE8B (rs2046045 p = 0.03, β = -0.09), VEGFA (rs11755845 p = 0.01, β = -0.13), and NFIA (rs334699 p = 1.50×10⁻³, β = -0.17). We found little evidence that SNPs previously associated with other thyroid-related disorders were associated with serum TSH levels in this study.
These results support the previously reported association between PDE8B and serum TSH levels in European Americans and emphasize the need for additional genetic studies in more diverse populations.
PLoS ONE 12/2014; 9(12):e111301. DOI:10.1371/journal.pone.0111301 · 3.23 Impact Factor