
Genome-wide association study of alcohol consumption and use disorder in 274,424 individuals from multiple populations



Alcohol consumption level and alcohol use disorder (AUD) diagnosis are moderately heritable traits. We conduct genome-wide association studies of these traits using longitudinal Alcohol Use Disorder Identification Test-Consumption (AUDIT-C) scores and AUD diagnoses in a multi-ancestry Million Veteran Program sample (N = 274,424). We identify 18 genome-wide significant loci: 5 associated with both traits, 8 associated with AUDIT-C only, and 5 associated with AUD diagnosis only. Polygenic Risk Scores (PRS) for both traits are associated with alcohol-related disorders in two independent samples. Although a significant genetic correlation reflects the overlap between the traits, genetic correlations for 188 non-alcohol-related traits differ significantly for the two traits, as do the phenotypes associated with the traits’ PRS. Cell type group partitioning heritability enrichment analyses also differentiate the two traits. We conclude that, although heavy drinking is a key risk factor for AUD, it is not a sufficient cause of the disorder.
Henry R. Kranzler 1,2, Hang Zhou 3,4, Rachel L. Kember1,2, Rachel Vickers Smith 2,5, Amy C. Justice 3,4,6,
Scott Damrauer1,2, Philip S. Tsao7,8, Derek Klarin 9, Aris Baras10, Jeffrey Reid 10, John Overton10,
Daniel J. Rader1, Zhongshan Cheng3,4, Janet P. Tate3,4, William C. Becker3,4, John Concato3,4, Ke Xu,
Renato Polimanti3,4, Hongyu Zhao 3,6 & Joel Gelernter 3,4
1University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA. 2Crescenz Veterans Affairs Medical Center, Philadelphia, PA 19104,
USA. 3Yale School of Medicine, New Haven, CT 06511, USA. 4Veterans Affairs Connecticut Healthcare System, West Haven, CT 06516, USA. 5University of
Louisville School of Nursing, Louisville, KY 40202, USA. 6Yale School of Public Health, New Haven, CT 06511, USA. 7VA Palo Alto Health Care System, Palo
Alto, CA 94304, USA. 8Stanford University School of Medicine, Stanford, CA 94305, USA. 9Massachusetts General Hospital, Harvard Medical School,
Boston, MA 02114, USA. 10 Regeneron Genetics Center, Tarrytown, NY 10591, USA. These authors contributed equally: Henry R. Kranzler, Hang Zhou,
Rachel L. Kember. Correspondence and requests for materials should be addressed to H.R.K. (email:
NATURE COMMUNICATIONS | (2019) 10:1499 | /10.1038 /s41467-019-09480-8 | /naturecommunications 1
Content courtesy of Springer Nature, terms of use apply. Rights reserved
Excessive alcohol consumption is associated with a host of
adverse medical, psychiatric, and social consequences.
Globally, in 2012, about 3.3 million or 5.9% of all deaths,
139 million disability-adjusted life years, and 5.1% of the burden
of disease and injury were attributable to alcohol consumption,
with the magnitude of harm determined by the volume of alcohol
consumed and the drinking pattern1. Regular heavy drinking is
the major risk factor for the development of an alcohol use dis-
order (AUD), a chronic, relapsing condition characterized by
impaired control over drinking2. Independent of AUD, heavy
drinking has a multitude of adverse medical consequences.
Identifying factors that contribute to drinking level and AUD risk
could advance efforts to prevent, identify, and treat both medical
and psychiatric problems related to alcohol.
Many different alcohol-related phenotypes have been used to
investigate genetic risk, including formal diagnoses, such as
alcohol dependence [e.g., based on the Diagnostic and Statistical
Manual of Mental Disorders, 4th edition (DSM-IV)3] and
screening tests that measure alcohol consumption and alcohol-related
problems [e.g., the Alcohol Use Disorders Identification
Test (AUDIT)]. The AUDIT, a 10-item, self-reported test
developed by the World Health Organization as a screen for
hazardous and harmful drinking4,5, has been used for genome-wide
association studies (GWASs) both as a total score6–8 and as
the AUDIT-Consumption (AUDIT-C) and AUDIT-Problems
(AUDIT-P) sub-scores8. The three-item AUDIT-C measures the
frequency and quantity of usual drinking and the frequency of
binge drinking, while the 7-item AUDIT-P measures alcohol-
related problems.
Twin and adoption studies have shown that half of the risk of
alcohol dependence, a subtype of AUD, is heritable9. The single-
nucleotide polymorphism (SNP) heritability of alcohol depen-
dence in a family-based, European-American (EA) sample was
16%10 and 22% in an unrelated African-American (AA) sam-
ple11. In the meta-analysis of data from the UK Biobank (UKBB)
and 23andMe, the SNP heritability of the total AUDIT was
estimated to be 12%, while for the AUDIT-C and AUDIT-P it
was 11% and 9%, respectively.
In 12 GWASs of alcohol dependence (most of which used a
binary DSM-IV diagnosis3) published between 2009 and 2014
(ref. 12), the only consistent genome-wide significant (GWS)
findings were for SNPs in genes encoding the alcohol-metabolizing
enzymes. Similarly, in a recent meta-analysis of 14,904
individuals with alcohol dependence and 37,944 controls, which
was stratified by genetic ancestry (European, N = 46,568; African,
N = 6,280), the only GWS findings were two independent ADH1B
variants. In addition, there were significant genetic correlations
seen with 17 phenotypes, including psychiatric (e.g., schizophrenia,
depression), substance use (e.g., smoking and cannabis
use), social (e.g., socio-economic deprivation), and behavioral
(e.g., educational attainment) traits13.
Alcohol-metabolizing enzyme genes have also been associated
with mean or maximal alcohol consumption levels, potential
intermediate phenotypes for alcohol dependence14–19. In a meta-analysis
of GWASs (N > 105,000 European subjects), KLB was
associated with alcohol consumption20. A GWAS of alcohol
consumption in the UK Biobank sample21 identified GWS associations
at 14 loci (8 independent), including three alcohol-metabolizing
genes on chromosome 4 (ADH1B, ADH1C, and
ADH5), an intergenic SNP on chromosome 4, and KLB, replicating
the prior meta-analytic findings. Risk genes identified in
this study included GCKR, CADM2, and FAM69C.
A GWAS of the AUDIT in nearly 8000 individuals failed to
identify any GWS loci6. A GWAS of the AUDIT from 23andMe
in 20,328 European ancestry participants also failed to yield GWS
results7, although meta-analysis of the AUDIT in the UKBB and
23andMe samples identified 10 associated risk loci, including
associations with JCAD and SLC39A13 (ref. 8). In addition to the
total AUDIT score, the meta-analysis included GWASs for the
AUDIT-C and AUDIT-P, which showed significantly different
patterns of association across a number of traits, including psychiatric
disorders. Specifically, the genetic correlations with
schizophrenia, major depressive disorder, and
obesity (among others) were negative for AUDIT-C and positive
for AUDIT-P.
In the present study, we evaluate the independent and over-
lapping genetic contributions to AUDIT-C and AUD in a single
large multi-ancestry sample from the Million Veteran Program
(MVP)22. Large-scale biobanks such as the MVP offer the
potential to link genes to health-related traits documented in the
electronic health record (EHR) with greater statistical power than
can ordinarily be achieved in prospective studies23. Such dis-
coveries improve our understanding of the etiology and patho-
physiology of complex diseases and their prevention and
treatment. To that end, we use a common data source, longitudinal
repeated measures of alcohol-related traits from the
national Veterans Health Administration (VHA) EHR, to obtain
the mean, age-adjusted AUDIT-C score and International Classification
of Diseases (ICD) alcohol-related diagnosis codes over
more than 11 years of care24. We then conduct a GWAS of each
trait followed by downstream analysis of the findings in which we
construct Polygenic Risk Scores (PRS) for both traits and show
that they are associated with alcohol-related disorders in two
independent samples. The availability of data on alcohol con-
sumption from the AUDIT-C and a formal diagnosis of AUD
from the EHR enables us to examine the relationship between
these key alcohol-related traits in a single, well-phenotyped
sample and to compare the findings for these traits more systematically
than has previously been possible.
Principal components analysis. We differentiated participants
genetically into five populations (see Methods, Supplementary
Fig. 1) and removed outliers. There was a high degree of concordance
(Supplementary Fig. 2) between the genetically defined
populations and the self-reported groups for European Americans
(EAs, 95.6% were self-reported Non-Hispanic white) and
African Americans (AAs, 94.5% were self-reported Non-Hispanic
black). Concordance ranged from 53.1% to 81.6% in the other
three population groups.
GWAS analyses. The GWAS for AUDIT-C (Fig. 1a, Table 1,
Supplementary Table 1, and Supplementary Data 1) identified 13
independent loci in EAs, 2 in AAs, 1 in LAs (Hispanic and Latino
Americans), and 1 in EAAs (East Asian Americans) (Supplementary
Figs. 4, 5). Meta-analysis across the five populations (see
Methods) also yielded 13 independent loci, 5 of which were
previously associated with a self-reported measure of alcohol
consumption: GCKR21, KLB20,21, ADH1B18,21, ADH1C21, and
SLC39A8 (ref. 8). The eight trans-population signals for AUDIT-C
identified here include VRK2 (Vaccinia related kinase 2),
DCLK2 (Doublecortin like kinase 2), ISL1 (ISL LIM Homeobox
1), FTO (Alpha-Ketoglutarate Dependent Dioxygenase), IGF2BP1
(Insulin like growth factor 2 mRNA binding protein 1), PPP1R3B
(Protein phosphatase 1 regulatory subunit 3B), BRAP (BRCA1
associated protein), BAHCC1 (BAH domain and coiled-coil
containing 1), and RBX1 (Ring-box 1). BAHCC1 and RBX1 were
GWS only in the trans-population meta-analysis, the results of
which were driven largely by the findings in EAs, who comprised
73.5% of the total MVP sample.
The GWAS for AUD (Fig. 1b, Table 2, Supplementary
Table 1, and Supplementary Data 2) identified 10 independent
loci in EAs, 2 in AAs, and 2 in LAs (Supplementary Figs. 6, 7).
Meta-analysis across the five populations yielded 10 independent
loci, including 3 previously associated with alcohol dependence25
(ADH1B, ADH1C, and ADH4) and 7 loci not previously
associated with an AUD diagnosis: GCKR, SIX3 (SIX Homeobox
3), SLC39A8, DRD2 (Dopamine Receptor D2: rs4936277 and
rs61902812, which were independent), chr10q25.1 (rs7906104),
and FTO. Five loci were significant in both the AUDIT-C and
AUD GWASs (Supplementary Fig. 8): ADH1B, ADH1C, FTO,
GCKR, and SLC39A8. The trans-population GWS findings for
AUD are also driven largely by the findings in EAs.
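The Methods are not reproduced in this section, so the exact meta-analysis scheme is not stated here. As a minimal sketch, one common approach that fits the quantities reported in Tables 1 and 2 (a Z-score, a total N, and per-population directions of effect) is a sample-size-weighted combination of per-population Z-scores. The function names and the input numbers below are hypothetical and serve only as an illustration.

```python
import math


def meta_z(z_scores, sample_sizes):
    """Combine per-population Z-scores using sqrt(N) weights."""
    if len(z_scores) != len(sample_sizes) or not z_scores:
        raise ValueError("need one sample size per Z-score")
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    return numerator / math.sqrt(sum(w * w for w in weights))


def two_sided_p(z):
    """Two-sided p value of a Z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical per-population Z-scores and sample sizes (not study values).
z = meta_z([6.0, 2.5, 1.0], [200_000, 50_000, 20_000])
p = two_sided_p(z)
print(f"meta Z = {z:.2f}, p = {p:.1e}, genome-wide significant: {p < 5e-8}")
```

As a consistency check, `two_sided_p(8.22)` is on the order of 2 × 10^-16, which agrees with the meta-analysis p value reported for the GCKR locus in Table 1 (Z-score 8.22).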
The GWAS findings largely reflect male-specific signals due to
the predominantly male sample (Supplementary Table 1). However,
sex-stratified GWAS also identified two female-specific
signals for AUDIT-C (Supplementary Data 3, Supplementary
Figs. 9, 10) and one for AUD (Supplementary Data 4,
Supplementary Figs. 11, 12).
For AUDIT-C, when associations for the seven LD-pruned
GWS SNPs on chromosome 4q23–q24 in EAs are conditioned
on rs1229984, the most significant functional SNP in the
region in that population, the only independent signal (using a
Bonferroni-corrected p value < 0.05) is for rs1229978, near
ADH1C (Supplementary Data 5). For AUD, when associations
for the four LD-pruned GWS SNPs in EAs are conditioned on
[Figure 1, panels a and b: Manhattan plots. x-axis: chromosome; y-axis: –log10 p value. Legend: GWS in EA and meta-analysis; GWS in AA and meta-analysis; GWS in LA and meta-analysis; GWS in EA only; GWS in AA only; GWS in LA only; GWS in ESA only.]
Fig. 1 Manhattan plots for age-adjusted mean AUDIT-C score and AUD diagnosis. a Manhattan plot of the genome-wide association meta-analysis of AUDIT-C
across all five populations (N = 272,842). b Manhattan plot of the genome-wide association meta-analysis of AUD across five populations (55,584 cases and
218,807 controls). Red lines show the genome-wide significance level (5.0 × 10^-8). EA: European American, AA: African American, LA: Hispanic or Latino, EAA:
East Asian American, SAA: South Asian American. Labeled genes at the top of the peaks indicate completely independent signals after conditional analysis in
meta-analysis. Population-specific loci are labeled at the bottom of the circles in the lower part of each figure. [ ]: no genes within 500 kb of the lead SNP
rs1229984, the only independent signal in that region is for
rs1154433 near ADH1C. In the trans-population meta-analysis,
rs5860563 is independent when conditioned on rs1229984 in
EAs, and on rs2066702 in AAs, the most significant functional
SNP in the region in AAs (Supplementary Data 6).
To elucidate further the genetic differences between AUDIT-C
and AUD, we conducted a GWAS of each phenotype with
the other phenotype as a covariate. A GWAS of AUDIT-C
with AUD as a covariate identified 10 GWS loci in EAs and 2
GWS loci in AAs (Supplementary Data 7). In both EAs and AAs,
all loci overlapped with the GWS findings for AUDIT-C alone. A
GWAS of AUD that included AUDIT-C as a covariate identified
five GWS loci in EAs and one in AAs (Supplementary Data 8).
Among EAs, four of the loci were the same as for AUD, the only
non-overlapping finding being DIO1 (Iodothyronine Deiodinase
1). In AAs, ADH1B remained significant for AUD when
accounting for AUDIT-C, but TSPAN5 did not.
In a sign test, most SNPs have the same direction of effect
for AUDIT-C and AUD, consistent with the high genetic
correlation between the traits. For SNPs with p value < 1 × 10^-6,
the sign concordance between the two traits is 98.7% in EAs and
100% in the other four, smaller population groups.
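The sign concordance reported above can be illustrated with a short sketch. It assumes that effect estimates for both traits are expressed for the same effect allele; the function names and the five example effect sizes are hypothetical, and the exact binomial test against a 50% null is one standard form of the sign test rather than a procedure confirmed by the text.

```python
import math


def sign_concordance(effects_a, effects_b):
    """Fraction of SNPs whose effect estimates share the same sign in both traits."""
    pairs = [(a, b) for a, b in zip(effects_a, effects_b) if a != 0 and b != 0]
    if not pairs:
        raise ValueError("no SNPs with non-zero effects in both traits")
    same = sum(1 for a, b in pairs if (a > 0) == (b > 0))
    return same / len(pairs)


def sign_test_p(n_same, n_total):
    """Two-sided exact binomial p value against a 50% concordance null."""
    k = max(n_same, n_total - n_same)
    tail = sum(math.comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    return min(1.0, 2 * tail)


# Hypothetical effect estimates for the same effect allele at five SNPs.
audit_c_betas = [0.12, -0.05, 0.08, 0.03, -0.02]
aud_betas = [0.20, -0.01, 0.04, -0.02, -0.06]
concordance = sign_concordance(audit_c_betas, aud_betas)
print(f"sign concordance: {concordance:.1%}, sign test p = {sign_test_p(4, 5)}")
```

With only five SNPs the test has little power; with the hundreds of SNPs passing p < 1 × 10^-6 in a large GWAS, a concordance near 100% departs strongly from the 50% expected by chance.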
Body mass index-adjusted GWAS. Because FTO was GWS for
both AUDIT-C and AUD, we repeated the two GWASs correcting
for BMI. For AUDIT-C and AUD, most loci remain GWS after correction for BMI,
though the significance level of some changes (Supplementary Data 9,
10). FTO SNPs become only nominally significant for both alcohol-related
traits: the p value for the lead SNP for AUDIT-C, rs9937709,
decreases in significance from 5.53 × 10^-14 to 1.42 × 10^-5 and for
the lead SNP for AUD, rs11075992, it decreases in significance from
3.22 × 10^-10 to 3.02 × 10^-5. In contrast, with correction for BMI,
some signals increase, e.g., for rs1260326 in GCKR the p value
increases in significance from p = 2.04 × 10^-16 to p = 2.91 × 10^-19
for AUDIT-C and from p = 2.27 × 10^-13 to p = 1.71 × 10^-14 for
AUD. Similarly, rs1229984 in ADH1B increases in significance from
p = 3.62 × 10^-133 to p = 9.81 × 10^-145 for AUDIT-C and from p =
4.68 × 10^-85 to p = 3.85 × 10^-89 for AUD.
Gene-based analyses. For AUDIT-C score, gene-based association
analyses identify 31 genes in EAs that are GWS (p < 2.69 × 10^-6),
3 in AAs, 1 in LAs, and 2 in EAAs (Supplementary Fig. 13),
including many of the loci in the SNP-based analyses for that trait.
The unique genes in EAs include C4orf17, ZNF512, MTTP, TBCK,
and MCC. For AUDIT-C, the loci that were not GWS in the SNP-based
analyses included EIF4E in AAs, MAP2 in LAs, and LOX
and MYL2 in EAAs.
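The gene-based threshold quoted above (p < 2.69 × 10^-6) is what a Bonferroni correction at α = 0.05 would give for roughly 18,600 tested genes. The text does not state the number of genes tested, so the count below is back-calculated and should be read as an assumption, not a reported figure; the function names are illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def implied_n_tests(alpha, threshold):
    """Number of tests implied by a Bonferroni-style threshold (back-calculation)."""
    return alpha / threshold


# Back-calculated from the quoted threshold; an assumption, not a reported count.
n_genes = round(implied_n_tests(0.05, 2.69e-6))
print(n_genes, f"{bonferroni_threshold(0.05, n_genes):.2e}")
```

The same arithmetic underlies the conventional genome-wide threshold for single SNPs, 5 × 10^-8, which corresponds to α = 0.05 divided by one million tests.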
For AUD, we identify 23 GWS genes in EAs, 5 in AAs, and 1 in
LAs (Supplementary Fig. 14), many of which are GWS loci in the
SNP-based analyses for that trait. For AUD, the loci in EAs that
are not GWS in the SNP-based analyses are KRTCAP3,
ADH4, and METAP1 are GWS for AUD, while ADGRB2 is the
only GWS locus in LAs.
Table 1 Genome-wide significant associations for AUDIT-C in the trans-population meta-analysis
rsID         Chr:pos^a      A1/A2  Gene^b      EAF    N        Z-score  P_EA            P_AA           P_LA           P_EAA         P_SAA  Effect  P_meta
rs1260326    2:27730940     C/T    GCKR^c      0.652  270,226  8.22     1.74 × 10^-16   0.110          0.067          0.987         0.739  +++++   2.04 × 10^-16
rs2683616    2:58035555     A/G    VRK2^d      0.624  211,399  6.22     1.80 × 10^-9    NA             0.060          0.487         NA     +?+?    4.95 × 10^-10
rs12639940   4:39420981     A/G    KLB^c       0.613  194,761  5.93     3.45 × 10^-9    NA             NA             0.626         NA     +??+?   3.06 × 10^-9
rs1229984    4:100239319    C/T    ADH1B^c     0.970  272,358  24.56    4.83 × 10^-102  1.31 × 10^-19  4.40 × 10^-16  9.05 × 10^-3  NA     ++++?   3.62 × 10^-133
rs142783062  4:100270960    D/I    ADH1C^c     0.345  271,444  9.82     2.04 × 10^-14   4.75 × 10^-7   4.90 × 10^-4   0.019         0.779  +++++   9.50 × 10^-23
rs13107325   4:103188709    C/T    SLC39A8^c   0.937  270,248  11.45    1.43 × 10^-25   1.07 × 10^-4   2.23 × 10^-3   NA            NA     +++??   2.24 × 10^-30
rs4423856    4:150984857    T/C    DCLK2^d     0.796  212,444  5.66     3.60 × 10^-8    NA             0.289          0.574         0.144  +?+++   1.48 × 10^-8
rs2961816    5:50443691     A/C    ISL1^d      0.683  260,828  5.74     1.24 × 10^-7    0.021          0.641          0.932         0.137  +++++   9.75 × 10^-9
rs4841132    8:9183596      A/G    PPP1R3B^d   0.101  271,192  5.51     2.75 × 10^-6    6.59 × 10^-3   0.226          NA            0.148  −−−?+   3.62 × 10^-8
rs62033408   16:53827962    A/G    FTO^c       0.678  270,067  9.08     2.20 × 10^-15   4.78 × 10^-5   0.229          0.177         0.027  +++++   1.11 × 10^-19
rs9902512    17:47094274    C/G    IGF2BP1^c   0.664  207,229  5.81     3.81 × 10^-8    NA             0.055          0.782         NA     ?−−?    6.24 × 10^-9
rs142997686  17:79419159    D/I    BAHCC1^c    0.384  211,314  5.84     1.77 × 10^-9    NA             0.944          0.434         0.840  +?+     5.39 × 10^-9
rs75723348   22:41420679    T/G    RBX1^d      0.736  269,785  5.71     2.97 × 10^-7    0.063          0.072          0.536         0.579  +++++   1.11 × 10^-8
The loci shown represent completely independent signals after conditioning analyses
A1 effect allele, A2 other allele, EAF effective allele frequency, EA European American, AA African American, LA Latino American, EAA East Asian American, SAA South Asian American
^a Human Genome hg19 assembly
^b Gene nearest to the lead SNP
^c Protein-coding gene contains the lead SNP
^d Protein-coding gene nearest to the lead SNP
Table 2 Genome-wide significant associations for AUD in the trans-population meta-analysis
rsID         Chr:pos^a      A1/A2  Gene^b      EAF    N        Z-score  P_EA            P_AA           P_LA           P_EAA         P_SAA  Effect  P_meta
rs1260326    2:27730940     C/T    GCKR^c      0.651  271,763  7.33     1.44 × 10^-16   0.679          0.778          0.830         0.820  ++++    2.27 × 10^-13
rs540606     2:45138507     A/G    SIX3^d      0.409  213,336  6.49     2.84 × 10^-10   NA             0.175          0.411         NA     ?−−?    8.58 × 10^-11
rs5860563    4:100047157    D/I    ADH4^c      0.723  271,487  6.09     7.63 × 10^-5    9.85 × 10^-7   0.035          NA            0.412  −−−?+   1.12 × 10^-9
rs1229984    4:100239319    C/T    ADH1B^c     0.969  273,904  19.54    4.51 × 10^-74   4.18 × 10^-5   5.81 × 10^-17  0.032         NA     ++++?   4.68 × 10^-85
rs1612735    4:100258007    T/C    ADH1C^c     0.656  271,471  8.86     1.75 × 10^-14   6.42 × 10^-5   0.022          0.938         0.054  −−−++   7.90 × 10^-19
rs13107325   4:103188709    C/T    SLC39A8^c   0.937  271,784  7.60     2.73 × 10^-14   0.064          0.363          NA            NA     +++??   2.97 × 10^-14
rs7906104    10:110497101   T/C    -           0.272  270,278  5.92     3.15 × 10^-7    8.72 × 10^-3   0.357          0.195         0.106  −−−−−   3.17 × 10^-9
rs61902812   11:113374420   A/C    DRD2^d      0.304  271,218  5.58     4.99 × 10^-6    0.025          0.015          0.220         0.931  −−−−−   2.44 × 10^-8
rs4936277^e  11:113431960   A/G    DRD2^d      0.599  274,128  7.44     2.85 × 10^-11   0.073          4.36 × 10^-4   0.200         0.357  +++++   1.01 × 10^-13
rs1421085    16:53800954    T/C    FTO^c       0.670  274,340  6.69     3.26 × 10^-10   0.024          0.525          0.332         0.018  +++++   2.17 × 10^-11
The loci shown represent completely independent signals after conditioning analyses
A1 effect allele, A2 other allele, EAF effective allele frequency, EA European American, AA African American, LA Latino American, EAA East Asian American, SAA South Asian American
^a Human Genome hg19 assembly
^b Gene nearest to the lead SNP
^c Protein-coding gene contains the lead SNP
^d Protein-coding gene nearest to the lead SNP
^e Different signal than rs61902812
Pathway and biological enrichment analyses. Using Functional
Mapping and Annotation (FUMA)26 software to investigate the
pathway or biological process enrichment with summary statistics
as input and false discovery rate (FDR) correction for multiple
testing, we find multiple Reactome and Kyoto Encyclopedia of
Genes and Genomes (KEGG) pathways that are significantly
enriched for AUDIT-C (Supplementary Data 11, Supplementary
Fig. 15) and AUD (Supplementary Data 12, Supplementary
Fig. 16) in each population. The most significant pathway is
Reactome ethanol oxidation for both traits in both EAs and AAs.
Multiple GO biological processes are enriched for AUDIT-C
(Supplementary Data 13, Supplementary Fig. 17) and AUD
(Supplementary Data 14, Supplementary Fig. 18), including
ethanol and alcohol metabolism. Enrichments for chemical and
genetic perturbation gene sets and for the GWAS catalog for both
traits are shown in Supplementary Data 15–18 and Supplementary
Figs. 19–22.
Heritability estimates. We use linkage disequilibrium score
regression (LDSC)27 (see Methods) to estimate SNP-based heritability
(h^2) in EAs and AAs, where sample sizes are large
enough to provide robust estimates for each trait (Fig. 2a). For
AUDIT-C, the h^2 is 0.068 (s.e. = 0.005) in EAs: 0.068 (s.e. =
0.005) in males and 0.099 (s.e. = 0.037) in females. In AAs, the h^2
is 0.062 (s.e. = 0.016): 0.058 (s.e. = 0.018) in males. For AUD,
the h^2 is 0.056 (s.e. = 0.004) in EAs: 0.054 (s.e. = 0.004) in
males and 0.110 (s.e. = 0.038) in females. The h^2 for AUD is
[Figure 2, panels a–c; axes: –log10 p value. Recoverable sample labels: EA all, EA male, EA female, AA all, AA male, LA all, LA male. Recoverable cell type group labels: adrenal or pancreas, connective or bone, skeletal muscle, immune or hematopoietic. Recoverable trait labels: alcohol dependence, PGC cross-disorder analysis, depressive symptoms, bipolar disorder, ever vs. never smoked, former vs. current smoker, years of schooling 2016, age of first birth, age at menarche, HDL cholesterol, fasting insulin main effect, type 2 diabetes, coronary artery disease, obesity classes 1 to 3, body fat, body mass index, waist circumference, waist-to-hip ratio, hip circumference, childhood obesity, mother's age at death, extreme BMI.]
Fig. 2 Heritability estimate, partitioning enrichments of heritability, and genetic correlation analyses using LD score regression. a SNP-based heritability for
AUDIT-C and AUD in the three populations and sex-stratified samples adequate in size for the analysis. b Partitioned heritability enrichment of cell type
groups for AUDIT-C and AUD. Ten cell types tested were corrected for multiple testing. The black dashed line is the cutoff for Bonferroni-corrected
significance. The gray dashed line is the cutoff for FDR < 0.05. c Genetic correlations with other traits. Data from 714 publicly available datasets (221
published and 493 unpublished from UK Biobank) were tested and corrected for multiple comparisons. The significantly correlated traits presented are for
published data. Black lines are the cutoff for Bonferroni-corrected significance, with asterisks showing traits significant after correction. The traits are
grouped into different categories and sorted by the genetic correlations with AUDIT-C (upper panel) or AUD (lower panel). CNS central nervous system,
ADHD attention deficit hyperactivity disorder, MDD major depressive disorder
0.100 (s.e. = 0.022) in AAs: 0.104 (s.e. = 0.023) in males. Robust
estimates of h^2 are unavailable in AA and LA females due to
the small sample size.
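For orientation, the regression that LDSC fits can be written as follows. This is the standard univariate LD score regression model from the general literature, not a formula taken from this paper's Methods; the symbols N (sample size), M (number of SNPs), \ell_j (LD score of SNP j), and a (a term capturing confounding) are introduced here for illustration only.

```latex
\mathbb{E}\left[\chi^2_j\right] = \frac{N h^2}{M}\,\ell_j + N a + 1
```

Under this model, the slope of the regression of GWAS \chi^2 statistics on LD scores yields the h^2 estimate, while the intercept absorbs inflation from sources such as population stratification.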
In the analysis of stratified heritability enrichment using
LDSC28 (see Methods), several cell line functional enrichments
were significant (FDR < 0.05) for AUDIT-C (Supplementary
Data 19) and AUD (Supplementary Data 20). Cell type group
partitioning heritability enrichment analyses indicated that
central nervous system (CNS) was the most significant cell type
for AUDIT-C (Fig. 2b, upper panel; Supplementary Data 21) and
the only significant cell type for AUD (Fig. 2b, bottom panel;
Supplementary Data 22). Enrichments for AUDIT-C were also
detected for cardiovascular, adrenal or pancreatic, skeletal muscle,
other, and liver cell types in descending order of significance. We
also tested the heritability enrichments using data from gene
expression and chromatin to identify disease-related tissues or
cell types29 (Supplementary Data 23–32). We found a few
epigenetic features in brain tissues (e.g., H3K4me1, H3K4me3,
and DNase) that were significantly enriched for each trait.
Genetic correlations. We estimated the genetic correlation (r_g)
between different datasets or populations using LDSC30. The r_g
between AUDIT-C and AUD was 0.522 (s.e. = 0.038, p = 2.40 ×
10^-42) in EAs and 0.930 (s.e. = 0.122, p = 1.85 × 10^-14) in AAs
(Supplementary Data 33). The r_g between EA males and EA
females was 0.815 (s.e. = 0.156, p = 1.69 × 10^-7) for AUDIT-C
and 0.833 (s.e. = 0.142, p = 4.16 × 10^-9) for AUD.
After Bonferroni correction, 179 traits or diseases were
genetically correlated with AUDIT-C (Fig. 2c, upper panel;
Supplementary Data 34). AUDIT-C was positively genetically
correlated with lipids (e.g., HDL cholesterol concentration: r_g =
0.361, p = 3.39 × 10^-8), reproductive traits (e.g., age at menarche:
r_g = 0.190, p = 4.20 × 10^-8), and years of education (r_g = 0.248, p =
1.40 × 10^-16) and negatively correlated with anthropometric (e.g.,
BMI: r_g = -0.350, p = 3.25 × 10^-19), cardiometabolic (e.g., coronary
artery disease: r_g = -0.212, p = 8.28 × 10^-8), glycemic (e.g.,
Type 2 diabetes: r_g = -0.273, p = 2.34 × 10^-7), lipid (e.g.,
triglyceride concentration: r_g = -0.325, p = 1.29 × 10^-10), and
psychiatric (e.g., major depressive disorder (MDD):
p = 7.72 × 10^-8) traits. After correction, 111 traits or diseases
were genetically associated with AUD (Fig. 2c, bottom panel;
Supplementary Data 35), including positive genetic correlations
with sleep disturbance (e.g., insomnia: r_g = 0.280, p = 7.43 ×
10^-6), ever having smoked (r_g = 0.581, p = 9.19 × 10^-20), and
multiple psychiatric disorders (e.g., alcohol dependence: r_g =
0.965, p = 1.21 × 10^-10; MDD: r_g = 0.406, p = 2.19 × 10^-20), and
negative genetic correlations with aging-related factors (e.g.,
mother's age at death: r_g = -0.390, p = 8.09 × 10^-8), intelligence
(r_g = -0.226, p = 6.79 × 10^-8), years of education
(p = 2.88 × 10^-15), and quitting smoking (r_g = -0.517, p = 1.12 ×
We tested the difference between genetic correlations for
AUDIT-C and AUD using a two-tailed z-test. After correction for
714 tested traits, the genetic correlations for 188 traits showed
significant differences between the two alcohol-related traits
(Supplementary Data 36). We explored trait and disease
associations for AUDIT-C adjusted for AUD and AUD
adjusted for AUDIT-C, and found that the genetic correlations
between the alcohol-related traits and other phenotypes did not
differ substantially from the unadjusted ones (Supplementary
Data 37, 38). Additionally, we explored genetic correlations
for AUDIT-C adjusted for BMI (Supplementary Data 39) and
AUD adjusted for BMI (Supplementary Data 40). Most of the
genetic correlations for AUDIT-C adjusted for BMI did not differ
substantially from the unadjusted ones, except for
anthropometric traits, where the negative correlation was
attenuated (although still significant). Significant genetic correlations
for AUD adjusted for BMI did not differ substantially from
those for AUD alone. We also explored prior GWAS associations
for the GWS SNPs from AUDIT-C and AUD analyses and found
associations with other phenotypes for five of them (Supplementary
Data 41).
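The two-tailed z-test used to compare genetic correlations between AUDIT-C and AUD can be sketched as below. The formula assumes the two estimates are independent, which is a simplification, and applies a Bonferroni correction across the 714 tested traits, which is an assumption about how the correction was done. The function name and the example estimates are hypothetical.

```python
import math


def rg_difference_test(rg_a, se_a, rg_b, se_b):
    """Two-tailed z-test for the difference between two genetic correlations.

    Assumes the two estimates are independent, which is a simplification.
    """
    z = (rg_a - rg_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical estimates and standard errors (not values from the study).
z, p = rg_difference_test(-0.20, 0.04, 0.40, 0.05)
alpha = 0.05 / 714  # assumed Bonferroni correction for the 714 tested traits
print(f"z = {z:.2f}, p = {p:.1e}, significant: {p < alpha}")
```

A trait such as MDD, whose genetic correlation is described as negative for AUDIT-C and positive for AUD, is the kind of case this test is meant to flag.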
Polygenic Risk Scores. We examined PRS generated from the
AUDIT-C and AUD GWASs in three samples (Supplementary
Figs. 23–26). First, in a hold-out MVP sample of EAs and AAs
(described in Methods), AUDIT-C and AUD PRS were significantly
associated with both AUDIT-C and AUD phenotypes
(Supplementary Data 42, 43). Lower p value thresholds of
AUDIT-C PRS were associated with AUDIT-C score and AUD
diagnosis codes, with the most significant being 1 × 10^-7 (EA
AUDIT-C: β = 0.088, p = 1.43 × 10^-44; EA AUD: β = 0.137, p =
3.03 × 10^-30; AA AUDIT-C: β = 0.094, p = 2.82 × 10^-17; AA
AUD: β = 0.110, p = 1.3 × 10^-10). All p value thresholds for AUD
PRS were associated with both AUDIT-C score and AUD diagnosis
codes, with the most significant being 1 × 10^-7 for EAs
(AUDIT-C: β = 0.095, p = 8.98 × 10^-51; AUD: β = 0.147, p =
6.02 × 10^-34) and 1 × 10^-7 for AAs (AUDIT-C: β = 0.066, p =
3.69 × 10^-9; AUD: β = 0.098, p = 1.09 × 10^-9).
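The role of the p value thresholds mentioned above can be illustrated with a minimal sketch of a thresholded polygenic score: the score is the sum of effect-allele dosages weighted by discovery GWAS effect sizes, restricted to SNPs that pass the threshold. The paper's actual PRS pipeline is described in its Methods, which are not reproduced here, so the function name, the data, and the assumption that SNPs have already been pruned for linkage disequilibrium are all hypothetical.

```python
def polygenic_score(dosages, betas, p_values, p_threshold):
    """Weighted sum of effect-allele dosages over SNPs with p <= p_threshold."""
    if not (len(dosages) == len(betas) == len(p_values)):
        raise ValueError("inputs must have the same length")
    return sum(d * b for d, b, p in zip(dosages, betas, p_values) if p <= p_threshold)


# Hypothetical data for one individual at four (assumed LD-independent) SNPs.
dosages = [2, 1, 0, 1]               # copies of the effect allele
betas = [0.10, -0.05, 0.20, 0.02]    # discovery GWAS effect sizes
p_values = [1e-9, 3e-6, 1e-12, 0.2]  # discovery GWAS p values

for threshold in (1e-7, 1e-4, 1.0):
    score = polygenic_score(dosages, betas, p_values, threshold)
    print(f"threshold {threshold:g}: score = {score:.3f}")
```

A stricter threshold keeps only the strongest signals, while a threshold of 1 includes all SNPs; testing several thresholds, as in the analyses above, shows which trade-off predicts the target phenotype best.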
Second, in an independent sample from the Penn Medicine
BioBank, AUDIT-C and AUD PRS were significantly associated
with alcohol-related disorders and alcoholism phecodes (see
Methods and Supplementary Data 44 and 45). In EAs, higher
AUDIT-C risk scores significantly increased the likelihood of
alcohol-related disorders and alcoholism at multiple p value
thresholds, with the most significant being 1 × 10^-7 (β = 0.278,
p = 0.0013) and 1 × 10^-6 (β = 0.245, p = 0.0074), respectively. In
AAs, at a p value threshold of 1 × 10^-7, AUDIT-C risk scores
were non-significantly associated with risk of alcohol-related
disorders (β = 0.210, p = 0.064) but significantly associated with
alcoholism (β = 0.400, p = 0.0051). In both populations, AUD
risk scores were significantly associated with both alcohol-related
disorders and alcoholism. In EAs, the most significant p value
threshold was 1 × 10^-4 (alcohol-related disorders: β = 0.254, p =
0.0006; alcoholism: β = 0.229, p = 0.0062), while in AAs, the
most significant p value threshold was 1 × 10^-6 (alcohol-related
disorders: β = 0.306, p = 0.006; alcoholism: β = 0.440,
Third, in the Yale-Penn study sample25, an independent
sample ascertained for substance use disorders, the PRS of
AUDIT-C and AUD were significantly associated with DSM-IV
alcohol dependence criterion counts (see Methods and Supplementary
Data 46, 47). In EAs, all AUDIT-C risk scores were
significantly associated with the criterion count, with the most
significant p value threshold being 1 × 10^-7 (β = 1.029, p =
6.67 × 10^-13). Similarly, all AUD risk scores were significantly
associated with the criterion count, the most significant p value
threshold being 1 × 10^-6 (β = 1.144, p = 1.86 × 10^-16). In AAs,
all but one AUDIT-C risk score and all AUD risk scores were
significantly associated with the alcohol dependence criterion
count, the most significant p value threshold being 1 × 10^-7
(AUDIT-C: β = 0.829, p = 1.18 × 10^-11; AUD: β = 0.502, p =
4.62 × 10^-8).
Secondary phenotypic associations. To identify secondary phenotypes
associated with AUDIT-C or AUD, we performed a
phenome-wide association analysis (PheWAS) of the AUDIT-C
and AUD PRS (p value threshold = 1 × 10^-7 and all SNPs) in the
MVP hold-out sample (Supplementary Data 48, 49). In EAs, the
AUDIT-C PRS was significantly associated with an increased risk
of alcoholic liver damage, and nominally associated with a
decreased risk of hyperglyceridemia. No significant associations
were found for AAs. The AUD PRS was significantly associated
with an increased risk of tobacco use disorder in both EAs and
AAs, and in EAs with multiple psychiatric disorders, including
major depression, bipolar disorder, anxiety, and schizophrenia.
Discussion
We report here a GWAS of two alcohol-related traits in a sample
of 274,424 MVP participants from five population groups (EA,
AA, LA, EAA, and SAA) using two EHR-derived phenotypes:
age-adjusted AUDIT-C score and AUD diagnostic codes. In
addition to the large number of EAs, the study included large
numbers of African-American and Latino-American participants.
Trans-population meta-analyses identified 13 independent GWS
loci for AUDIT-C and 10 independent GWS loci for AUD. For
AUDIT-C, in addition to the loci identified in the SNP-based
analyses, there were 31 GWS genes in EAs, 3 in AAs, 1 in LAs,
and 2 in EAAs. For AUD, in addition to the loci identified in the
SNP analyses, there were 23 GWS genes identified in EAs, 5 in
AAs, and 1 in LAs.
Using both AUDIT-C scores and AUD diagnoses enabled us to
examine the relations between these key alcohol-related traits.
The findings underscore the utility of using an intermediate trait,
such as alcohol consumption, for genetic discovery. Five of the 13
loci associated with AUDIT-C score, a measure of alcohol
consumption, including the two most commonly identified alcohol
metabolism genes (ADH1B and ADH1C) and three highly
pleiotropic genes (GCKR, SLC39A8, and FTO), contributed to
AUD risk. Of the 10 loci that were GWS for AUD, half also were
associated with AUDIT-C score, while half were uniquely
associated with the AUD diagnosis: ADH4, SIX3, a variant on
chr10q25.1, and 2 variants in DRD2.
In addition to multiple overlapping variants for AUDIT-C and
AUD, we found a moderate-to-high genetic correlation between
the traits: 0.522 in EAs and 0.930 in AAs. There are two potential
explanations for the population difference in genetic correlation.
First, it may reflect a bias in the assignment of AUD diagnoses by
clinicians (e.g., in the context of a high AUDIT-C score, clinicians
could be less likely to assign an AUD diagnosis to EAs than AAs,
reducing the genetic correlation). Second, because LD structure in
admixed populations is complex, LD score regression could have
inflated the genetic correlation among AAs, an admixed
population. Another factor relevant to this difference is the smaller
number of AAs, which, despite a higher r_g, yielded a larger
standard error. The genetic similarity between these alcohol-related
traits is consistent with twin studies of alcohol dependence
and alcohol consumption31,32. These findings are also consistent
with the PRS analyses in the MVP sample, where both AUDIT-C
and AUD PRS were associated with AUDIT-C and AUD phe-
notypes. Both traits also predicted multiple alcohol-related phe-
notypes in independent datasets, including alcohol dependence
criteria in the Yale-Penn sample. However, there was a smaller
effect of AUDIT-C PRS scores than AUD PRS scores on alcohol-related
disorders and alcohol dependence. This is in line with
findings from the meta-analysis of UKBB and 23andMe data,
where the genetic correlation with alcohol dependence was
nominally greater for AUDIT-P scores (r_g=0.63) than AUDIT-C
scores (r_g
Despite the significant genetic overlap between the AUDIT-C
and AUD diagnosis, downstream analyses revealed biologically
meaningful points of divergence. The AUDIT-C yielded
some GWS findings that did not overlap with those for AUD,
which reflects genetic independence of the traits. This broadens
our previous observations using SNPs in ADH1B, in which we
validated the AUDIT-C score as an alcohol-related phenotype33.
In that study, after accounting for the effects of AUDIT-C score,
AUD diagnoses accounted for unique variance in the frequency
of ADH1B minor alleles.
Evidence of genetic independence between the two traits was
most striking in the differences between the genetic correlation
analyses. After correction, genetic correlations for 188 traits
differed significantly (some in opposite directions) between
AUDIT-C and AUD. Notably, these included a negative association of
AUDIT-C with anthropometric traits, including BMI; coronary
artery disease; and glycemic traits, including Type 2 diabetes. The
negative genetic correlation with coronary artery disease is
consistent with some epidemiological findings that alcohol
consumption protects against some forms of cardiovascular disease34.
AUDIT-C was positively genetically correlated with overall health
rating, HDL cholesterol concentration, and years of education,
findings that are consistent with prior literature showing genetic
correlation of these traits with alcohol consumption7,8,21. AUD
was significantly genetically correlated with 111 traits or diseases,
including negative genetic correlations with intelligence, years of
education and quitting smoking, and positive genetic correlations
with insomnia, ever having smoked and most psychiatric
disorders, findings that are consistent with phenotypic associations
in the epidemiological literature35–37 and genetic correlations
reported from the UKBB and 23andMe GWASs and their
meta-analysis7,8,21. The opposite genetic correlations seen for some
traits may be driven by low-effect variants, as we find close to
100% consistency in the direction of effect for the most
significantly associated SNPs for both AUDIT-C and AUD. Further,
in the MVP sample, the AUD PRS was significantly positively
associated with tobacco use and multiple psychiatric disorders,
whereas the AUDIT-C PRS was not. Taken together, these
findings suggest that AUD and alcohol consumption, measured
by AUDIT-C, are related but distinct phenotypes, with AUD
being more closely related to other psychiatric disorders, and
AUDIT-C to some positive health outcomes.
Although the protective effects of moderate drinking are con-
troversial, we found that alcohol consumption in the absence of
genetic risk for AUD may protect from cardiovascular disease,
diabetes mellitus, and major depressive disorder. In contrast,
individuals with genetic risk for AUD are at elevated risk for
some adverse secondary phenotypes, including insomnia, smok-
ing, and other psychiatric disorders. However, individuals who
have had health problems resulting from drinking are more likely
to reduce or stop drinking by middle age or under-report their
alcohol consumption. This offers an alternative explanation for
the opposite genetic associations38, particularly in an older
clinical sample in which a large proportion report current abstinence
(reflected in an AUDIT-C score of 0). For this complex set of
genetic associations to be useful in informing clinical
recommendations on safe levels of alcohol consumption, it will be
necessary to elucidate the mechanisms underlying these findings.
Both phenotypes showed cell type-specific enrichments for
CNS. Other relevant cell types for AUDIT-C, but not for AUD,
included cardiovascular, adrenal or pancreas, liver, and muscu-
loskeletal. Thus, although heavy drinking is prerequisite to the
development of AUD, the latter is a polygenic disorder and
variation in genes expressed in the CNS (e.g., DRD2) may be
necessary for individuals who drink heavily to develop AUD. As a
binary trait, AUD provided less statistical power to identify
genetic variation than the ordinal AUDIT-C score, but the
multiple GWS findings unique to AUD argue against that as an
explanation for the non-overlapping GWS findings for the two traits.
The VHA EHR provided a rich source of phenotypic data.
These included mean age-adjusted AUDIT-C scores, which are
more stable than measures at a single point in time (more likely
reflecting traits rather than states) and contrast with
meta-analytic studies that may use phenotypes reflecting the lowest
common denominator among the studies comprising the sample.
However, our analyses were limited by our reliance on the
AUDIT-C, which includes only the first 3 of the 10 AUDIT items.
We also obtained cumulative AUD diagnoses, which are also
more informative than assessments obtained at a single time
point. Because the diagnosis of AUD is based on features other
than alcohol consumption per se2,5, use of the AUD diagnosis
from the EHR augmented the information provided by the
AUDIT-C phenotype. Although EHR diagnostic data are het-
erogeneous, large-scale biobanks such as the MVP yield greater
statistical power to link genes to health-related traits repeatedly
documented over time in the EHR than can ordinarily be
achieved in prospective studies23, justifying the lower resolution
of EHR data. However, because the MVP sample is pre-
dominantly comprised of EA males, statistical power was limited
in both the GWAS and the post-GWAS analyses of other
populations and some female samples. Future studies with larger
sample sizes are needed to identify additional variation con-
tributing to these alcohol-related traits and to elucidate their
The SNP heritability of our GWASs was lower than that seen
in the meta-analysis of the UKBB and 23andMe data8. For the
AUDIT-C, the estimated SNP heritability was 0.068 in EAs (0.068
in males and 0.099 in females) and 0.062 in AAs. For AUD, the
estimated SNP heritability was 0.056 in EAs (0.054 in males and
0.110 in females) and 0.100 in AAs. These estimates may reflect
the lower number of SNPs tested in our sample compared with
the meta-analysis of UKBB and 23andMe data. The nominally
higher SNP heritability in females than males could be due to the
substantially smaller size of the female subsample. Alternatively,
women could have a higher liability-threshold and therefore a
higher burden of risk variants. Because our study sample was
predominantly male, we do not have adequate statistical power to
evaluate these hypotheses. Although we found no significant
difference in PRS between males and females, because of
the substantially smaller number of women in MVP, there is
much less power for the PRS in this subgroup and for comparing
the PRS by sex.
Despite these limitations, the large, diverse, and similarly
ascertained sample enabled us to identify multiple GWS findings
for both AUDIT-C score and AUD diagnosis, and thereby to help
elucidate the relationship between drinking level and AUD risk.
The large sample provided high power for PRS analyses in other
samples, as demonstrated here in the Penn Medicine Biobank and
Yale-Penn samples. The genetic differences between the two
alcohol-related traits and the observed opposite genetic correla-
tions between them point to potentially important differences in
comorbidity and prognosis. Our findings underscore the need to
identify the functional effects of the risk variants, especially where
they diverge by trait, to elucidate the nature of the trait-related
differences. Focusing on variants linked to AUD, but not AUDIT-
C, could identify targets for the development of medications to
treat the disorder, while variation in AUDIT-C could help in
developing interventions to reduce drinking and thereby prevent
the morbidity associated with it. The findings reported here could
also help to identify individuals at high risk of AUD through the
use of PRS. This effort could be augmented using knowledge of
the full set of phenotypes that associate with AUD through the
use of genetic correlations and PheWASs.
Methods
Data collection. The MVP is an observational cohort study and biobank supported
by the U.S. Department of Veterans Affairs (VA). Phenotypic data were collected
from MVP participants using questionnaires and the VA EHR and a blood sample
was obtained for genetic analysis.
Ethics statement: The Central Veterans Affairs Institutional Review Board (IRB)
and site-specific IRBs approved the MVP study. All relevant ethical regulations for
work with human subjects were followed in the conduct of the study and informed
consent was obtained from all participants.
Phenotypes. AUDIT-C scores and AUD diagnostic codes were obtained from the
VA EHR. The AUDIT-C comprises the first three items of the AUDIT and
measures typical quantity (item 1) and frequency (item 2) of drinking and fre-
quency of heavy or binge drinking (item 3). The AUDIT-C is a mandatory annual
assessment for all veterans seen in primary care. Our analyses used AUDIT-C data
collected from 1 October 2007 to 23 February 2017. We validated the phenotype in
a sample of 1851 participants from the Veterans Aging Cohort Study33, in which
we found a highly significant association of AUDIT-C scores with the plasma
concentration of phosphatidylethanol, a direct, quantitative biomarker that is
correlated with the level of alcohol consumption. In the AA part of this sample
(n=1503), the AUDIT-C score was highly significantly associated with rs2066702,
a missense (Arg369Cys) polymorphism of ADH1B, the minor allele of which is
common in that population and has been associated with alcohol dependence25.
We also examined AUDIT-C scores in 167,721 MVP participants (57,677 AAs and
110,044 EAs)24, comparing the association of AUDIT-C scores and AUD diagnoses
with the frequency of the minor allele of rs2066702 in AAs and rs1229984
(Arg48His) in EAs. Both polymorphisms exert large effects on alcohol metabo-
lism39 and are among the genetic variants associated most consistently with
alcohol-related traits in both AAs and EAs8,12,18. In both populations, we found a
stronger association between age-adjusted mean AUDIT-C score and ADH1B
minor allele frequency than between AUD diagnostic codes and the frequency of
the minor alleles24. However, because AUD diagnoses accounted for unique var-
iance in the frequency of the minor alleles in both populations, we concluded that
the two phenotypes, although correlated, are distinct traits. Thus, in the present
study, we used GWAS to examine these traits separately and to adjust for the
effects of AUD in the AUDIT-C GWAS and the effects of AUDIT-C in the GWAS
of AUD.
We calculated the age-adjusted mean AUDIT-C value24 for each participant
using age 50 as the reference point and down-weighting scores for individuals
younger than 50 and up-weighting scores for individuals older than 50. The age-
adjusted mean AUDIT-C was computed using a sample of 495,178 participants
with data on age and AUDIT-C, of whom 272,842 had genetic data and were
included in the AUDIT-C genetic analyses.
The principal classes of alcohol-related disorders in the ICD are alcohol abuse
and alcohol dependence. We used ICD-9 codes 303.X (dependence) and
305–305.03 (abuse) and ICD-10 codes F10.1 (abuse) and F10.2 (dependence) to
identify subjects diagnosed with either of these disorders, as suggested previously40
(see Supplementary Table 2). Participants with at least one inpatient or two
outpatient alcohol-related ICD-9/10 codes (N=274,391) from 2000 to 2018 were
assigned a diagnosis of AUD, an approach that has been shown to yield greater
specificity of ICD codes than chart review41.
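The case rule described above (at least one inpatient or two outpatient alcohol-related codes) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the record layout, the function names, and the simplified prefix matching of the ICD ranges are all hypothetical.

```python
from collections import Counter

# Assumed prefixes, simplified from the code ranges quoted in the text.
ICD9_PREFIXES = ("303", "305.0")
ICD10_PREFIXES = ("F10.1", "F10.2")


def is_aud_code(code: str, version: int) -> bool:
    """Return True if an ICD code falls in the alcohol abuse/dependence ranges."""
    prefixes = ICD9_PREFIXES if version == 9 else ICD10_PREFIXES
    return (version == 9 and code == "305") or code.startswith(prefixes)


def assign_aud(records):
    """records: iterable of (person_id, code, icd_version, setting) tuples,
    where setting is "inpatient" or "outpatient" (hypothetical layout).

    Returns {person_id: bool} for everyone with at least one qualifying code;
    True means the one-inpatient-or-two-outpatient rule is met.
    """
    counts = Counter()
    for person, code, version, setting in records:
        if is_aud_code(code, version):
            counts[(person, setting)] += 1
    people = {person for person, _ in counts}
    return {
        p: counts[(p, "inpatient")] >= 1 or counts[(p, "outpatient")] >= 2
        for p in people
    }
```

Participants who never receive a qualifying code do not appear in the returned mapping, so the caller decides how to treat them (for example, as controls).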
Genotyping and imputation. MVP GWAS genotyping was performed using an
Affymetrix Axiom Biobank Array with 686,693 markers. Subjects or SNPs with
genotype call rate <0.9 or high heterozygosity were removed, leaving
353,948 subjects and 657,459 SNPs for imputation22.
Imputation was performed with EAGLE2 (ref. 42) to pre-phase each
chromosome and Minimac3 (ref. 43) to impute genotypes with 1000 Genomes
Project phase 3 data44 as the reference panel. Subjects with no demographic
information or whose genotypic and phenotypic sex did not match were removed.
We also removed one subject randomly from each pair of related individuals
(kinship coefficient threshold = 0.0884). A greedy algorithm was implemented for
network-like relationships among three or more individuals, leaving
331,736 subjects for subsequent analyses.
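The text does not specify the greedy algorithm, so the following is one plausible reading rather than the procedure actually used: treat related pairs as edges in a graph and repeatedly drop the individual with the most remaining relatives, choosing at random among ties (which reduces to a random choice within an isolated pair). The function name, the input layout, and the use of `>=` at the threshold are assumptions.

```python
import random

KINSHIP_THRESHOLD = 0.0884  # threshold quoted in the text


def prune_related(pairs, seed=0):
    """pairs: iterable of (id_a, id_b, kinship). Returns the set of IDs to drop."""
    rng = random.Random(seed)
    neighbors = {}
    for a, b, kinship in pairs:
        if kinship >= KINSHIP_THRESHOLD:  # assumed inclusive comparison
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
    removed = set()
    while neighbors:
        # Pick (at random among ties) the person with the most relatives left.
        top = max(len(n) for n in neighbors.values())
        candidates = sorted(k for k, n in neighbors.items() if len(n) == top)
        drop = rng.choice(candidates)
        for other in neighbors.pop(drop):
            neighbors[other].discard(drop)
            if not neighbors[other]:
                del neighbors[other]  # no relatives left, so this person is kept
        removed.add(drop)
    return removed
```

Dropping the most-connected individual first tends to keep more people in a family cluster than removing members at random, which is the usual motivation for a greedy rule here.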
Population differentiation. To differentiate population groups, we performed
principal components analysis (PCA) using common SNPs (MAF > 0.05) shared in
MVP [pruned using linkage disequilibrium (LD) of r² > 0.2] and the 1000 Genomes
phase 3 reference panels for European (EUR), African (AFR), admixed American
(AMR), East Asian (EAS), and South Asian (SAS) populations using FastPCA in
EIGENSOFT45. We analyzed 80,871 SNPs in MVP and 1000 Genomes for use in
the PCA analyses. The Euclidean distances between each participant and the
centers of the five reference populations (i.e., across all subjects) were calculated
using the first 10 PCs, with each participant assigned to the nearest reference
population. A total of 242,317 EA; 61,762 AA; 15,864 Hispanic and Latino
American (LA); 1565 East Asian American (EAA); and 228 South Asian American
(SAA) subjects were identified. A second PCA (within each group) yielded the first
10 PCs for each. Participants with PC scores >3 standard deviations from the mean
of any of the 10 PCs were removed as outliers, leaving 209,020 EA; 57,340 AA;
14,425 LA; 1410 EAA; and 196 SAA subjects. Within genetically defined
populations, we calculated population-specific imputation INFO scores using SNPTEST
v2 (ref. 46) and retained SNPs with INFO scores >0.7 for association analyses.
Imputed genotypes with posterior probability ≥0.9 were transferred to best guess.
We removed both genotyped and imputed SNPs with genotype call rates or best
guess rates ≤0.95 and HWE p value ≤1 × 10⁻⁶ in each population, using different
MAF thresholds to filter SNPs: EA (0.0005), AA (0.001), LA (0.01), EAA (0.05),
and SAA (0.05). The approximate number of SNPs remaining in each population
was EA: 6.8 million, AA: 12.5 million, LA: 5.6 million, EAA: 2.6 million, and SAA:
2.6 million.
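The two PCA-based steps in this section (assignment to the nearest reference population, then removal of within-group outliers) can be sketched as below. The data layout and function names are assumptions, the reference centers are taken as given, and the population standard deviation is used where the text does not say which estimator was applied.

```python
import math
from statistics import mean, pstdev


def assign_population(pcs, centroids):
    """pcs: PC scores for one participant; centroids: {label: PC scores of the
    reference-population center}. Returns the label of the nearest center."""
    return min(centroids, key=lambda label: math.dist(pcs, centroids[label]))


def drop_outliers(pc_table, n_sd=3.0):
    """pc_table: {person_id: within-group PC scores}. Returns the IDs kept after
    removing anyone more than n_sd standard deviations from the mean on any PC."""
    ids = list(pc_table)
    n_pcs = len(next(iter(pc_table.values())))
    keep = set(ids)
    for j in range(n_pcs):
        column = [pc_table[i][j] for i in ids]
        mu, sd = mean(column), pstdev(column)
        keep -= {i for i in ids if sd > 0 and abs(pc_table[i][j] - mu) > n_sd * sd}
    return keep
```

In the study the distance was computed over the first 10 PCs; the functions above work for any number of components as long as participants and centers use the same ones.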
Genome-wide association analyses. Individuals <22 or >90 years old and those
with missing AUDIT-C scores were removed from the analyses, leaving 200,680
EAs; 56,495 AAs; 14,112 LAs; 1366 EAAs; and 189 SAAs in the AUDIT-C GWAS
and 202,004 EAs (34,658 cases; 167,346 controls); 56,648 AAs (17,267 cases; 39,381
controls); 14,175 LAs (3449 cases; 10,726 controls); 1374 EAAs (164 cases; 1210
controls); and 190 SAAs (44 cases; 144 controls) in the AUD GWAS. We used linear
regression for the GWAS of age-adjusted mean AUDIT-C score and logistic
regression for AUD diagnosis; in both cases age, sex, and the first 10 PCs were
covariates. To evaluate the impact on AUD findings of controlling for AUDIT-C
and the impact on AUDIT-C findings of controlling for AUD, we repeated the
GWAS for AUD with AUDIT-C as a covariate and AUDIT-C with AUD as a
covariate. For both phenotypes, following GWAS in each of the five populations,
the summary statistics were combined within phenotype in trans-population
meta-analyses. SNPs in EAs or those present in at least two populations were
meta-analyzed. Sex-stratified GWAS for both phenotypes were then performed in groups
large enough to permit it (EA, AA, LA, and EAA men and EA, AA, and LA
women), and the data were meta-analyzed within sex and phenotype. All
meta-analyses were performed using a sample-size-weighted scheme that was
implemented in METAL47.
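A sample-size-weighted meta-analysis of the kind METAL offers combines per-study z-scores with weights proportional to the square root of each sample size. The sketch below shows that calculation for a single SNP; it is not METAL itself, the function names are illustrative, and the z-scores in the usage line are invented (only the two sample sizes come from the text).

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()


def z_from_p(p_value, beta):
    """Signed z-score from a two-sided p value and the sign of the effect."""
    z = -_NORMAL.inv_cdf(p_value / 2.0)
    return math.copysign(z, beta)


def meta_analyze(studies):
    """studies: list of (z, n) per population. Returns (combined z, two-sided p)."""
    weights = [math.sqrt(n) for _, n in studies]
    numerator = sum(w * z for w, (z, _) in zip(weights, studies))
    z_meta = numerator / math.sqrt(sum(w * w for w in weights))
    p_meta = math.erfc(abs(z_meta) / math.sqrt(2.0))
    return z_meta, p_meta


# Hypothetical z-scores for one SNP in the EA and AA AUDIT-C GWAS.
combined_z, combined_p = meta_analyze([(5.2, 200_680), (3.1, 56_495)])
```

Because only z-scores and sample sizes are needed, this scheme does not require effect sizes to be on the same scale across populations.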
To identify independent signals in each population, we performed LD clumping
using PLINK v1.90b4.4 (ref. 48). We identified an index SNP (p < 5 × 10⁻⁸) with
the smallest p value in a 500-kb genomic window and r² < 0.1 with other index
SNPs. Because in EAAs there is extended linkage disequilibrium at the ALDH2
locus, we used a 2500-kb window in that population. In the chr4q23–q24 region,
where we identified multiple apparently independent signals for both AUDIT-C
and AUD, we used conditional associations to differentiate independent signals
from partially overlapping ones.
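The clumping logic can be illustrated with a small greedy routine: take the most significant remaining genome-wide significant SNP as an index SNP, then discard nearby SNPs that are in LD with it. This is a simplified sketch, not PLINK's implementation; the data layout, the `r2` lookup function, and the reading of the window as a maximum distance from the index SNP are assumptions.

```python
def clump(snps, r2, p_index=5e-8, window_bp=500_000, r2_max=0.1):
    """snps: {snp_id: (chrom, pos, p)}; r2: function (id_a, id_b) -> r squared.

    Returns index SNPs in order of increasing p value.
    """
    remaining = sorted(snps, key=lambda s: snps[s][2])
    index_snps = []
    while remaining and snps[remaining[0]][2] < p_index:
        lead = remaining.pop(0)
        index_snps.append(lead)
        chrom, pos, _ = snps[lead]
        # Drop SNPs that are close to the lead SNP and correlated with it.
        remaining = [
            s for s in remaining
            if not (
                snps[s][0] == chrom
                and abs(snps[s][1] - pos) <= window_bp
                and r2(lead, s) >= r2_max
            )
        ]
    return index_snps
```

Passing `window_bp=2_500_000` mirrors the wider window the text describes for the ALDH2 locus in EAAs.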
Gene-based association analysis. Gene-based association analysis was performed
using Multi-marker Analysis of GenoMic Annotation (MAGMA)49, which uses a
multiple regression approach to detect multi-marker effects that account for SNP
p values and LD between markers. We used the default setting (no window around
genes) to consider 18,575 autosomal genes for the analysis, with p < 2.69 × 10⁻⁶
(0.05/18,575) considered GWS. For each population, we used the respective
population from the 1000 Genomes Project phase 3 as the LD reference.
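The gene-based significance threshold quoted above is a Bonferroni correction over the number of genes tested. A quick check of the arithmetic (the helper name is ours and is not part of MAGMA):

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# 18,575 autosomal genes at a family-wise alpha of 0.05, as stated in the text.
gene_threshold = bonferroni_threshold(0.05, 18_575)
print(f"{gene_threshold:.2e}")  # prints 2.69e-06
```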
Enrichment analyses. Pathway and biological enrichment analyses were
performed for each population using the FUMA platform26, with independent
significant SNPs identified using the default settings. Positional gene mapping
identified genes up to 10 kb from each independent significant SNP.
Hypergeometric tests were used to examine the enrichment of prioritized chemical and
genetic perturbation gene sets, canonical pathways, and GO biological processes
(obtained from MSigDB c2), and GWAS-catalog enrichment (obtained from
reported genes from the GWAS-catalog). We report all significantly enriched gene
sets based on an FDR-adjusted p value <0.05.
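A hypergeometric enrichment test asks how likely it is to see at least the observed overlap between the mapped genes and a gene set if the mapped genes were drawn at random from the background. The following is a minimal sketch of that upper-tail probability, not FUMA's code; the function and argument names are illustrative.

```python
from math import comb


def hypergeom_tail(overlap, n_mapped, n_set, n_background):
    """P(X >= overlap) when n_mapped genes are drawn without replacement from
    n_background genes, of which n_set belong to the gene set."""
    upper = min(n_mapped, n_set)
    total = comb(n_background, n_mapped)
    hits = sum(
        comb(n_set, k) * comb(n_background - n_set, n_mapped - k)
        for k in range(overlap, upper + 1)
    )
    return hits / total
```

The resulting p values across all tested gene sets would then be adjusted for multiple testing (the text reports an FDR-adjusted cutoff of 0.05).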
Heritability and partitioning of heritability. LDSC27 was used to calculate
population-specific LD scores based on 1000 Genomes phase 3 datasets according
to the LDSC tutorial, using SNPs selected from HapMap 3 (ref. 50) after excluding
the major histocompatibility complex (MHC) region (chr6: 26–34 Mb); only
ancestry groups with large sample size (N> 10,000) were analyzed using LDSC. Of
note, LDSC could be biased in admixed populations because reference panels are
not provided for AAs and LAs in that application27. We calculated LD scores for
1,215,001 SNPs in EAs; 1,322,841 SNPs in AAs; and 1,243,726 SNPs in LAs. The
LDSC analyses used SNPs with imputation INFO ≥0.9 in each population and that
were LD scored in 1000 Genomes. LD score regression intercepts for available
datasets were estimated to distinguish polygenic heritability from inflation.
SNP-based heritability (h²) was estimated from GWAS summary statistics for both
AUDIT-C and AUD. The sex-specific h² was also estimated in EA males and
females, AA males, and LA males.
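The idea behind this step can be illustrated with a deliberately simplified sketch. Under the LD score regression model, the expected association statistic of SNP j is E[χ²_j] = N·h²·ℓ_j/M + N·a + 1, so regressing χ² on the LD score ℓ_j gives a slope proportional to h² and an intercept that absorbs inflation. The function below uses an unweighted least-squares fit; the LDSC software uses weighted regression with jackknife standard errors, and the names and inputs here are illustrative assumptions.

```python
def ldsc_h2(chi2, ld_scores, n_samples, n_snps):
    """Unweighted least-squares fit of chi2_j = intercept + slope * l_j.

    Returns (h2, intercept), with h2 = slope * n_snps / n_samples.
    """
    m = len(chi2)
    mean_l = sum(ld_scores) / m
    mean_c = sum(chi2) / m
    sxy = sum((l - mean_l) * (c - mean_c) for l, c in zip(ld_scores, chi2))
    sxx = sum((l - mean_l) ** 2 for l in ld_scores)
    slope = sxy / sxx
    intercept = mean_c - slope * mean_l
    return slope * n_snps / n_samples, intercept
```

An intercept close to 1 suggests that inflation of the test statistics reflects polygenic signal rather than confounding.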
We estimated partitioned h² using genomic features or functional
categories28 for both AUDIT-C and AUD in the largest dataset, EAs, and then
tested for enrichment of the partitioned h² in different annotations. First, we
used a baseline model consisting of 53 functional categories, including UCSC gene
models [exons, introns, promoters, untranslated regions (UTRs)], ENCODE
functional annotations51, Roadmap epigenomic annotations52, and FANTOM5
enhancers53. We then analyzed cell type-specific annotations and identified
enrichments of h² in 10 cell types, including adrenal and pancreas, CNS,
cardiovascular, connective tissue and bone, gastrointestinal, immune and
hematopoietic, kidney, liver, skeletal muscle and other. Gene expression and
chromatin data were also analyzed to identify disease-relevant tissues, cell types,
and tissue-specific epigenetic annotations. We used LDSC to test for enriched
heritability in regions surrounding genes with the highest tissue-specific expression
or with epigenetic marks29. Sources of data that were analyzed included 53 human
tissue or cell type RNA-seq data from the Genotype-Tissue Expression Project
(GTEx)54; 152 human, mouse, or rat tissue or cell type array data from the Franke
lab55; 3 sets of mouse brain cell type array data from Cahoy et al.56; 292 mouse
immune cell type array data from ImmGen57; and 396 human epigenetic
annotations (6 features in 88 primary cell types or tissues) from the Roadmap
Epigenomics Consortium52. In the analysis of each trait in each dataset, we used
FDR < 0.05 to indicate significant enrichment for the h².
Genetic correlations. We estimated the genetic correlation (r_g) between AUDIT-C
and AUD (from MVP), and with other traits in LD Hub58 or from published
studies using LDSC, which is robust to sample overlap30. First, we estimated the r_g
between AUDIT-C and AUD using the summary data generated in this study. AAs,
EAs, and LAs were analyzed separately using the corresponding 1000 Genomes
phase 3 population as reference. Genetic correlations in EAA and SAA were not
analyzed. The r_g between AUDIT-C and AUD was out of bounds because the
h² did not differ significantly from zero. Then we tested the r_g between males
and females within each trait. We estimated the r_g for AUDIT-C and AUD with
216 published traits in LD Hub and 493 unpublished traits from the UK Biobank.
We resolved redundancy in phenotypes by manually selecting the published ver-
sion of the phenotype or using the largest sample size. We also calculated the
genetic correlations of both AUDIT-C and AUD with five traits for which GWAS
were recently published or posted, including anorexia nervosa59, alcohol
dependence13, attention deficit hyperactivity disorder60, autism spectrum disorder61, and
major depressive disorder (summary data without the 23andMe sample)62,
bringing the total number of tested traits to 714. A Bonferroni correction was
applied separately for AUDIT-C and AUD, and traits with a corrected p value
<0.05 were considered significantly correlated. Because the results were similar
whether the intercept was constrained or not, we present here the original results
without constraint.
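For reference, the quantities estimated in this section can be written out using the standard cross-trait LD score regression model. The notation is not spelled out in the text and is given here only as a reading aid: ρ_g is the genetic covariance, h₁² and h₂² are the SNP heritabilities of the two traits, ℓ_j is the LD score of SNP j, M is the number of SNPs, N₁ and N₂ are the sample sizes, N_s is the number of shared samples, and ρ is the phenotypic correlation among shared samples.

```latex
\begin{align}
r_g &= \frac{\rho_g}{\sqrt{h_1^2 \, h_2^2}} \\
\mathbb{E}\left[ z_{1j} z_{2j} \right] &=
  \frac{\sqrt{N_1 N_2}\, \rho_g}{M}\, \ell_j
  + \frac{\rho\, N_s}{\sqrt{N_1 N_2}}
\end{align}
```

Sample overlap enters only through the intercept term, which is why the slope-based estimate of ρ_g is robust to it. Because r_g divides by the square root of the two heritabilities, an h² estimate near zero makes the ratio unstable and can push the estimate outside the range of -1 to 1.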
Polygenic Risk Scores. To generate PRS from GWAS summary statistics in the
MVP sample, we first conducted a GWAS for AUDIT-C and AUD as above, but
restricted our analysis to two-thirds of the total sample by splitting the total sample
randomly, keeping the number of AUD cases/controls balanced in each part (EA:
N=139,346, AA: N=38,226). GWS loci identified from this analysis were the
same as those in the larger sample, but were slightly decreased in significance. PRS
were generated for the remaining EAs (N=69,674) and AAs (N=19,114) as the
sum of all variants carried, weighted by the effect size of the variant in the GWAS.
PRS were generated using PLINK2 (ref. 63). We performed p value-informed
clumping with a distance threshold of 250 kb and r²=0.1. Risk scores were
calculated for a range of p value thresholds (p ≤ 1 × 10⁻⁷, …, 1 × 10⁻³,
0.01, 0.05, 0.5, 1.0). PRS were standardized with mean=0 and SD=1.
Logistic regression was used to test for association with AUDIT-C and AUD
phenotypes, with PRS as the independent variable and AUDIT-C or AUD as the
dependent variable, with age, sex, and the first five PCs as covariates.
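The scoring step itself (a weighted sum of risk alleles, restricted to SNPs passing a p value threshold, then standardized) can be sketched as below. This is not the PLINK2 implementation: the data layout and function name are assumptions, clumping is assumed to have been done already, and missing genotypes are simply treated as zero dosage.

```python
from statistics import mean, pstdev


def polygenic_scores(dosages, weights, p_values, p_threshold):
    """dosages: {person: {snp: effect-allele dosage}}; weights: {snp: GWAS effect
    size}; p_values: {snp: GWAS p value}. Returns standardized scores."""
    selected = [s for s in weights if p_values[s] <= p_threshold]
    raw = {
        person: sum(weights[s] * geno.get(s, 0.0) for s in selected)
        for person, geno in dosages.items()
    }
    mu, sd = mean(raw.values()), pstdev(raw.values())
    # Assumes the raw scores are not all identical (sd > 0).
    return {person: (score - mu) / sd for person, score in raw.items()}
```

Repeating the call over a list of thresholds gives one standardized score per threshold, which is how the most predictive threshold reported in the Results would be found.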
Population-specific summary statistics from the AUDIT-C and AUD GWAS in
MVP were used to generate PRS in the PMBB, an independent sample. PRS were
generated for EAs (N=8524) and AAs (N=2031) as above using the PRSice2
package64 with imputed allele dosage as the target dataset. As recommended in the
software, we performed p value-informed clumping with a distance threshold of 250
kb and r²=0.1. We excluded the MHC region. Risk scores were calculated for a range
of p value thresholds (p ≤ 1 × 10⁻⁷, …, 1 × 10⁻³, 0.01, 0.05,
0.5, 1.0) and standardized with mean=0 and SD=1. To identify individuals with
alcohol-related disorders, we utilized phecodes, a method to aggregate ICD codes65.
First, we extracted ICD-9 and ICD-10 data for 48,610 individuals from the EHR.
To facilitate mapping to phecodes, ICD-10 codes were back converted to ICD-9
using 2017 general equivalency mapping (GEM). The ICD-10 conversions were
combined with the ICD-9 codes to create a dataset with 10,682 unique ICD-9
codes. ICD-9 codes were aggregated to phecodes using the PheWAS R package65 to
create 1812 phecodes. Individuals are considered cases for the phenotype if they
had at least two instances of the phecode, controls if they had no instance of the
phecode, and other/missing if they had one instance or a related phecode. Logistic
regression was used to test the association of the PRS with the alcohol-related
disorders phecode (phecode number 317) and its sub-phenotype, alcoholism
(phecode number 317.1). The analysis was performed in R with PRS as the
independent variable and diagnosis as the dependent variable and age, sex, and the
first 10 PCs as covariates.
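The case, control, and other/missing rule for phecodes described above amounts to a three-way classification, sketched here with a hypothetical helper (the function name and arguments are ours, not part of the PheWAS R package):

```python
def phecode_status(n_instances: int, has_related_phecode: bool = False) -> str:
    """Classify one person for one phecode.

    Two or more instances make a case; no instance and no related phecode
    make a control; anything in between is set aside as "other".
    """
    if n_instances >= 2:
        return "case"
    if n_instances == 1 or has_related_phecode:
        return "other"
    return "control"
```

Only cases and controls would enter the logistic regression; people classified as "other" are excluded for that phecode.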
We also tested the PRS of AUDIT-C and AUD for DSM-IV alcohol dependence
criterion counts in the Yale-Penn cohort25. There are three phases of the Yale-Penn
sample: phase 1 contains 3110 AAs and 1718 EAs exposed to alcohol; phase 2
contains 1667 AAs and 1689 EAs exposed to alcohol; phase 3 contains 556 AAs and
999 EAs exposed to alcohol. PRS were generated for EAs and AAs in each phase as
described above and risk scores were calculated for a range of p value thresholds
(p ≤ 1 × 10⁻⁷, …, 1 × 10⁻³, 0.01, 0.05, 0.5, 1.0). Different
from PRS in MVP and PMBB, to correct for the relatedness in the Yale-Penn
subjects, a linear mixed model implemented in GEMMA66 was used to test the
association between PRS score and DSM-IV alcohol dependence criterion counts,
with age, sex, and the first 10 PCs as covariates. Meta-analyses of data from the three
phases were performed in AAs (N=5333) and EAs (N=4406) separately.
Phenome-wide association analysis. To conduct PheWAS, we extracted ICD-9
data from the EHR for 353,323 genotyped veterans. Of these, 277,531 individuals
had two or more separate encounters in the VA Healthcare System in each of the 2
years prior to enrollment in MVP, consisting of 21,209,658 records. ICD-9 codes
were aggregated to phecodes using the PheWAS R package to create 1812
phecodes. To improve the specificity of these codes, individuals with at least two
instances of the phecode were considered cases, those with no instance of the
phecode controls, and those with one instance of a phecode or a related phecode as
other. A PheWAS using logistic regression models with either AUDIT-C or AUD
PRS as the independent variable, phecodes as the dependent variables, and age, sex
and the first five PCs as covariates was used to identify secondary phenotypic
associations. A phenome-wide significance threshold of 2.96 × 10⁻⁵ was applied to
account for multiple testing.
Secondary GWAS Adjusted for BMI: As described below, for both alcohol-related
traits, we identified a GWS SNP in FTO, variation in which has been
associated with BMI and risk of obesity67. To examine whether BMI confounded
the association with this and other loci and the genetic correlations with other
traits, we repeated the GWAS for AUDIT-C and AUD using BMI as an additional
covariate. Data on BMI were from the MVP baseline survey and the EHR. For
AUDIT-C, 200,092 EAs; 56,239 AAs; 14,029 LAs; 1352 EAAs; and 185 SAAs had
BMI data available. For AUD, 201,320 EAs; 56,347 AAs; 14,075 LAs; 1360 EAAs;
and 186 SAAs had BMI data available. After GWAS, we analyzed the genetic
correlations between BMI-adjusted traits and other publicly available traits (N=
714), with Bonferroni correction for multiple testing.
Reporting summary. Further information on experimental design is available in
the Nature Research Reporting Summary linked to this article.
Data availability
The full summary-level association data from the meta-analysis for each of the two
alcohol-related traits from this report are available through dbGaP: [https://www.ncbi. =phs001672.v1.p1] (accession
number phs001672.v1.p1). Further information on research design is available in the
Nature Research Reporting Summary linked to this article. All other data are contained
within the article and its supplementary information or are available upon reasonable
request from the corresponding author.
Received: 19 December 2018 Accepted: 6 March 2019
References
1. World Health Organization. Global Status Report on Alcohol and Health 2018
(WHO, Geneva, 2018).
2. American Psychiatric Association. Diagnostic and Statistical Manual of Mental
Disorders 5th edn (American Psychiatric Association, Arlington, VA, 2013).
3. American Psychiatric Association. Diagnostic and Statistical Manual of
Mental Disorders: DSM-IV-TR (American Psychiatric Association,
Washington, DC, 2000).
4. Saunders, J. B., Aasland, O. G., Babor, T. F., de la Fuente, J. R. & Grant, M.
Development of the Alcohol Use Disorders Identification Test (AUDIT):
WHO collaborative project on early detection of persons with harmful alcohol
consumption-II. Addiction 88, 791–804 (1993).
5. Babor, T. F., Higgins-Biddle, J. C., Saunders, J. B. & Monteiro, M. G., World
Health Organization, Dependence DoMHaS. AUDIT: The Alcohol Use
Disorders Identification Test: Guidelines for Use in Primary Health Care 2nd
edn (World Health Organization, Geneva, 2001).
6. Mbarek, H. et al. The genetics of alcohol dependence: twin and SNP-based
heritability, and genome-wide association study based on AUDIT scores. Am.
J. Med. Genet. B Neuropsychiatr. Genet. 168, 739–748 (2015).
7. Sanchez-Roige, S. et al. Genome-wide association study of alcohol use disorder
identification test (AUDIT) scores in 20 328 research participants of European
ancestry. Addict. Biol. 24, 121–131 (2019).
8. Sanchez-Roige, S. et al. Genome-wide association study meta-analysis of the
Alcohol Use Disorders Identification Test (AUDIT) in two population-based
cohorts. Am. J. Psychiatry 176, 107–118 (2019).
9. Verhulst, B., Neale, M. C. & Kendler, K. S. The heritability of alcohol use
disorders: a meta-analysis of twin and adoption studies. Psychol. Med. 45,
1061–1072 (2015).
10. Vrieze, S. I., McGue, M., Miller, M. B., Hicks, B. M. & Iacono, W. G. Three
mutually informative ways to understand the genetic relationships among
behavioral disinhibition, alcohol use, drug use, nicotine use/dependence, and
their co-occurrence: twin biometry, GCTA, and genome-wide scoring. Behav.
Genet. 43, 97–107 (2013).
11. Yang, C. et al. Exploring the genetic architecture of alcohol dependence in
African-Americans via analysis of a genomewide set of common variants.
Hum. Genet. 133, 617–624 (2014).
12. Hart, A. B. & Kranzler, H. R. Alcohol dependence genetics: lessons learned
from genome-wide association studies (GWAS) and post-GWAS analyses.
Alcohol Clin. Exp. Res. 39, 1312–1327 (2015).
13. Walters, R. K. et al. Transancestral GWAS of alcohol dependence reveals
common genetic underpinnings with psychiatric disorders. Nat. Neurosci. 21,
1656–1669 (2018).
14. Schumann, G. et al. Genome-wide association and genetic functional studies
identify autism susceptibility candidate 2 gene (AUTS2) in the regulation of
alcohol consumption. Proc. Natl Acad. Sci. USA 108, 7119–7124 (2011).
15. Takeuchi, F. et al. Confirmation of ALDH2 as a major locus of drinking
behavior and of its variants regulating multiple metabolic phenotypes in a
Japanese population. Circ. J. 75, 911918 (2011).
16. Kapoor, M. et al. A meta-analysis of two genome-wide association studies to
identify novel loci for maximum number of alcoholic drinks. Hum. Genet.
132, 11411151 (2013).
17. Quillen, E. E. et al. ALDH2 is associated to alcohol dependence and is the
major genetic determinant of daily maximum drinksin a GWAS study of an
isolated rural Chinese sample. Am. J. Med. Genet. B Neuropsychiatr. Genet.
165B, 103110 (2014).
18. Xu, K. et al. Genomewide association study for maximum number of alcoholic
drinks in European Americans and African Americans. Alcohol Clin. Exp. Res.
39, 11371147 (2015).
19. Gelernter, J. et al. Genomewide association study of alcohol dependence and
related traits in a Thai population. Alcohol Clin. Exp. Res. 42, 861868 (2018).
20. Schumann, G. et al. KLB is associated with alcohol drinking, and its gene
product beta-Klotho is necessary for FGF21 regulation of alcohol preference.
Proc. Natl Acad. Sci. USA 113, 1437214377 (2016).
21. Clarke, T. K. et al. Genome-wide association study of alcohol consumption
and genetic overlap with other health-related traits in UK Biobank (N =112
117). Mol. Psychiatry 22, 13761384 (2017).
22. Gaziano, J. M. et al. Million Veteran Program: a mega-biobank to study genetic
inuences on health and disease. J. Clin. Epidemiol. 70,214223 (2016).
23. Collins, R. What makes UK Biobank special? Lancet 379, 11731174 (2012).
24. Justice, A. C. et al. AUDIT-C and ICD codes as phenotypes for harmful
alcohol use: association with ADH1B polymorphisms in two US populations.
Addiction 113, 22142224 (2018).
25. Gelernter, J. et al. Genome-wide association study of alcohol dependence:
signicant ndings in African- and European-Americans including novel risk
loci. Mol. Psychiatry 19,41
49 (2014).
26. Watanabe, K., Taskesen, E., van Bochoven, A. & Posthuma, D. Functional
mapping and annotation of genetic associations with FUMA. Nat. Commun.
8, 1826 (2017).
27. Bulik-Sullivan, B. K. et al. LD score regression distinguishes confounding from
polygenicity in genome-wide association studies. Nat. Genet.47, 291295
28. Finucane, H. K. et al. Partitioning heritability by functional annotation using
genome-wide association summary statistics. Nat. Genet. 47, 12281235
29. Finucane, H. K. et al. Heritability enrichment of specically expressed genes
identies disease-relevant tissues and cell types. Nat. Genet. 50,621629 (2018).
30. Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases
and traits. Nat. Genet. 47, 12361241 (2015).
31. Grant, J. D. et al. Alcohol consumption indices of genetic risk for alcohol
dependence. Biol. Psychiatry 66, 795800 (2009).
32. Kendler, K. S., Myers, J., Dick, D. & Prescott, C. A. The relationship between
genetic inuences on alcohol dependence and on patterns of alcohol
consumption. Alcohol Clin. Exp. Res.34, 10581065 (2010).
33. Justice, A. C. et al. Validating harmful alcohol use as a phenotype for genetic
discovery using phosphatidylethanol and a polymorphism in ADH1B. Alcohol
Clin. Exp. Res. 41, 9981003 (2017).
34. Wood, A. M. et al. Risk thresholds for alcohol consumption: combined
analysis of individual-participant data for 599 912 current drinkers in 83
prospective studies. Lancet 391, 15131523 (2018).
35. Chou, S. P. et al. Alcohol use disorders, nicotine dependence, and co-
occurring mood and anxiety disorders in the United States and South Korea-a
cross-national comparison. Alcohol Clin. Exp. Res. 36, 654662 (2012).
36. Lai, H. M., Cleary, M., Sitharthan, T. & Hunt, G. E. Prevalence of comorbid
substance use, anxiety and mood disorders in epidemiological surveys, 1990-
2014: a systematic review and meta-analysis. Drug Alcohol Depend. 154,113
37. Kendler, K. S., Ohlsson, H., Sundquist, J. & Sundquist, K. School achievement,
IQ, and risk of alcohol use disorder: a prospective, co-relative analysis in a
Swedish national cohort. J. Stud. Alcohol Drugs 78, 186194 (2017).
10 NATURE COMMUNICATIONS | (2019) 10:1499 | 480-8 |
Content courtesy of Springer Nature, terms of use apply. Rights reserved
38. Eyawo, O. et al. Alcohol and mortality: combining self-reported (AUDIT-C)
and biomarker detected (PEth) alcohol measures among HIV infected and
uninfected. J. Acquir. Immune Dec. Syndr. 77, 135143 (2018).
39. Polimanti, R. & Gelernter, J. ADH1B: from alcoholism, natural selection, and
cancer to the human phenome. Am. J. Med Genet. B Neuropsychiatr. Genet.
177, 113125 (2018).
40. Piette, J. D., Barnett, P. G. & Moos, R. H. First-time admissions with alcohol-
related medical problems: a 10-year follow-up of a national sample of
alcoholic patients. J. Stud. Alcohol 59,8996 (1998).
41. Justice, A. C. et al. Medical disease and alcohol use among veterans with
human immunodeciency infection: a comparison of disease measurement
strategies. Med. Care 44, S52S60 (2006).
42. Loh, P. R. et al. Reference-based phasing using the Haplotype Reference
Consortium panel. Nat. Genet.48, 14431448 (2016).
43. Browning, B. L. & Browning, S. R. Genotype imputation with millions of
reference samples. Am. J. Hum. Genet.98, 116126 (2016).
44. 1000 Genomes Project Consortium, et al. A global reference for human
genetic variation. Nature 526,6874 (2015).
45. Galinsky, K. J. et al. Fast principal-component analysis reveals convergent
evolution of ADH1B in Europe and East Asia. Am. J. Hum. Genet.98,
456472 (2016).
46. Marchini, J. & Howie, B. Genotype imputation for genome-wide association
studies. Nat. Rev. Genet. 11, 499511 (2010).
47. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efcient meta-analysis
of genomewide association scans. Bioinformatics 26, 21902191 (2010).
48. Purcell, S. et al. PLINK: a tool set for whole-genome association and
population-based linkage analyses. Am. J. Hum. Genet.81, 559575 (2007).
49. de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized
gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).
50. Consortium, InternationalHapMap et al. Integrating common and rare
genetic variation in diverse human populations. Nature 467,5258 (2010).
51. Encode Project Consortium. An integrated encyclopedia of DNA elements in
the human genome. Nature 489,5774 (2012).
52. Roadmap Epigenomics Consortium. et al. Integrative analysis of 111 reference
human epigenomes. Nature 518, 317330 (2015).
53. Andersson, R. et al. An atlas of active enhancers across human cell types and
tissues. Nature 507, 455461 (2014).
54. GTEx Consortium. Human genomics. The Genotype-Tissue Expression
(GTEx) pilot analysis: multitissue gene regulation in humans. Science 348,
648660 (2015).
55. Fehrmann, R. S. et al. Gene expression analysis identies global gene dosage
sensitivity in cancer. Nat. Genet. 47, 115125 (2015).
56. Cahoy, J. D. et al. A transcriptome database for astrocytes, neurons, and
oligodendrocytes: a new resource for understanding brain development and
function. J. Neurosci. 28, 264278 (2008).
57. Heng, T. S. & Painter, M. W. Immunological Genome Project C. The
Immunological Genome Project: networks of gene expression in immune cells.
Nat. Immunol. 9, 10911094 (2008).
58. Zheng, J. et al. LD Hub: a centralized database and web interface to perform
LD score regression that maximizes the potential of summary level GWAS
data for SNP heritability and genetic correlation analysis. Bioinformatics 33,
272279 (2017).
59. Duncan, L. et al. Signicant locus and metabolic genetic correlations revealed
in genome-wide association study of anorexia nervosa. Am. J. Psychiatry 174,
850858 (2017).
60. Demontis, D. et al. Discovery of the rst genome-wide signicant risk loci for
attention decit/hyperactivity disorder. Nat. Genet. 51,6375 (2019).
61. Autism Spectrum Disorders Working Group of The Psychiatric Genomics
Consortium. Meta-analysis of GWAS of over 16,000 individuals with autism
spectrum disorder highlights a novel locus at 10q24.32 and a signicant
overlap with schizophrenia. Mol. Autism 8, 21 (2017).
62. Wray, N. R. et al. Genome-wide association analyses identify 44 risk variants
and rene the genetic architecture of major depression. Nat. Genet. 50,
668681 (2018).
63. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger
and richer datasets. Gigascience 4, 7 (2015).
64. Euesden, J., Lewis, C. M. & OReilly, P. F. PRSice: Polygenic Risk Score
software. Bioinformatics 31, 14661468 (2015).
65. Carroll, R. J., Bastarache, L. & Denny, J. C. R. PheWAS: data analysis and
plotting tools for phenome-wide association studies in the R environment.
Bioinformatics 30, 23752376 (2014).
66. Zhou, X. & Stephens, M. Efcient multivariate linear mixed model algorithms
for genome-wide association studies. Nat. Methods 11, 407409 (2014).
67. Loos, R. J. & Yeo, G. S. The bigger picture of FTO: the rst GWAS-identied
obesity gene. Nat. Rev. Endocrinol. 10,5161 (2014).
Acknowledgements
This research is based on data from the Million Veteran Program (MVP), Office of
Research and Development, Veterans Health Administration, and was supported by
award #I01BX003341. This publication does not represent the views of the Department
of Veterans Affairs or the United States Government. A full acknowledgment of the MVP
is included in Supplementary Note 1. We also appreciate access to summary data
provided by the Psychiatric Genomics Consortium (PGC) Substance Use Disorders (SUD)
working group. The PGC-SUD is supported by funds from NIDA and NIMH to
MH109532 and, previously, had analyst support from NIAAA to U01AA008401
(COGA). PGC-SUD gratefully acknowledges its contributing studies and the participants
in those studies without whom this effort would not be possible. Supported by the Mental
Illness Research, Education and Clinical Center of the Veterans Integrated Service
Network 4 of the Department of Veterans Affairs.
Author contributions
H.R.K., J.G., A.C.J., J.C., R.L.K. and H. Zhao designed the study. H.R.K., J.G., A.C.J. and
H. Zhao supervised the work. H. Zhou, R.L.K. and R.V.S. conducted the analyses. S.D.,
P.S.T., D.K. and D.J.R. provided phenotypic data and the Regeneron Genetics Center
provided genotypic data for the phenome-wide association analyses. The manuscript was
written by H.R.K., H. Zhou, R.L.K., R.V.S. and J.G., with comments provided by all other
authors. All authors approved the final version.
Additional information
Supplementary Information accompanies this paper at
Competing interests: H.R.K. is a member of the American Society of Clinical
Psychopharmacology's Alcohol Clinical Trials Initiative, which in the past three years
was supported by AbbVie, Alkermes, Ethypharm, Indivior, Lilly, Lundbeck, Otsuka,
Pfizer, Arbor, and Amygdala Neurosciences. H.R.K. and J.G. are named as inventors on
PCT patent application #15/878,640 entitled "Genotype-guided dosing of opioid
agonists," filed 24 January 2018. The remaining authors declare no competing interests.
Reprints and permission information is available online at
Peer review information: Nature Communications thanks Meike Bartels and the other
anonymous reviewers for their contribution to the peer review of this work. Peer reviewer
reports are available.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0
International License, which permits use, sharing, adaptation,
distribution and reproduction in any medium or format, as long as you give appropriate
credit to the original author(s) and the source, provide a link to the Creative Commons
license, and indicate if changes were made. The images or other third party material in this
article are included in the article's Creative Commons license, unless indicated otherwise in
a credit line to the material. If material is not included in the article's Creative Commons
license and your intended use is not permitted by statutory regulation or exceeds the
permitted use, you will need to obtain permission directly from the copyright holder. To
view a copy of this license, visit
This is a U.S. government work and not under copyright protection in the U.S.; foreign
copyright protection may apply 2019
Supplementary resources (53)

... We selected these two variables as heritable collider variables for three reasons. First, tobacco use and educational attainment are genetically correlated with AUD, suggesting that a PRS for AUD may be associated with tobacco use and educational attainment (Kranzler et al. 2019;Walters et al. 2018;Zhou et al. 2020). Second, preliminary results indicate differing strengths of correlation with polygenic liability for AUD, providing a useful range of circumstances for examining the method. ...
Full-text available
In this study, we test principal component analysis (PCA) of measured confounders as a method to reduce collider bias in polygenic association models. We present results from simulations and application of the method in the Collaborative Study of the Genetics of Alcoholism (COGA) sample with a polygenic score for alcohol problems, DSM-5 alcohol use disorder as the target phenotype, and two collider variables: tobacco use and educational attainment. Simulation results suggest that assumptions regarding the correlation structure and availability of measured confounders are complementary, such that meeting one assumption relaxes the other. Application of the method in COGA shows that PC covariates reduce collider bias when tobacco use is used as the collider variable. Application of this method may improve PRS effect size estimation in some cases by reducing the effect of collider bias, making efficient use of data resources that are available in many studies.
... This motivated an examination of the causal effects of ALCH on the disorders and factors using a form of multi-trait Mendelian randomization (MR) within the genomic SEM framework. We ran two types of MR models: one using the Q SNP variant in the ADH1B gene as a single instrumental variable for ALCH, and a second multivariant MR approach using eight loci identified from an independent ALCH discovery GWAS as instrumental variables 39 ...
Full-text available
We interrogate the joint genetic architecture of 11 major psychiatric disorders at biobehavioral, functional genomic and molecular genetic levels of analysis. We identify four broad factors (neurodevelopmental, compulsive, psychotic and internalizing) that underlie genetic correlations among the disorders and test whether these factors adequately explain their genetic correlations with biobehavioral traits. We introduce stratified genomic structural equation modeling, which we use to identify gene sets that disproportionately contribute to genetic risk sharing. This includes protein-truncating variant-intolerant genes expressed in excitatory and GABAergic brain cells that are enriched for genetic overlap across disorders with psychotic features. Multivariate association analyses detect 152 (20 new) independent loci that act on the individual factors and identify nine loci that act heterogeneously across disorders within a factor. Despite moderate-to-high genetic correlations across all 11 disorders, we find little utility of a single dimension of genetic risk across psychiatric disorders either at the level of biobehavioral correlates or at the level of individual variants. Joint analysis of 11 major psychiatric disorders identifies four broad factor underlying genetic correlations among the disorders. Association analyses detect 152 loci acting on these factors and identify 9 loci that act heterogeneously across disorders.
Full-text available
Nonalcoholic fatty liver disease (NAFLD) is a growing cause of chronic liver disease. Using a proxy NAFLD definition of chronic elevation of alanine aminotransferase (cALT) levels without other liver diseases, we performed a multiancestry genome-wide association study (GWAS) in the Million Veteran Program (MVP) including 90,408 cALT cases and 128,187 controls. Seventy-seven loci exceeded genome-wide significance, including 25 without prior NAFLD or alanine aminotransferase associations, with one additional locus identified in European American-only and two in African American-only analyses (P < 5 × 10⁻⁸). External replication in histology-defined NAFLD cohorts (7,397 cases and 56,785 controls) or radiologic imaging cohorts (n = 44,289) replicated 17 single-nucleotide polymorphisms (SNPs) (P < 6.5 × 10⁻⁴), of which 9 were new (TRIB1, PPARG, MTTP, SERPINA1, FTO, IL1RN, COBLL1, APOH and IFI30). Pleiotropy analysis showed that 61 of 77 multiancestry and all 17 replicated SNPs were jointly associated with metabolic and/or inflammatory traits, revealing a complex model of genetic architecture. Our approach integrating cALT, histology and imaging reveals new insights into genetic liability to NAFLD.
Sleep problems and substance use frequently co‐occur. While substance use can result in specific sleep deficits, genetic pleiotropy could explain part of the relationship between sleep and substance use and use disorders. Here we use the largest publicly available genome‐wide summary statistics of substance use behaviours (N = 79,729–632,802) and sleep/activity phenotypes to date (N = 85,502–449,734) to (1) assess the genetic overlap between substance use behaviours and both sleep and circadian‐related activity measures, (2) estimate clusters from genetic correlations and (3) test processes of causality versus genetic pleiotropy. We found 31 genetic correlations between substance use and sleep/activity after Bonferroni correction. These patterns of overlap were represented by two genetic clusters: (1) tobacco use severity (age of first regular tobacco use and smoking cessation) and sleep health (sleep duration, sleep efficiency and chronotype) and (2) substance consumption/problematic use (drinks per day and cigarettes per day, cannabis use disorder, opioid use disorder and problematic alcohol use) and sleep problems (insomnia, self‐reported short sleep duration, increased number of sleep episodes, increased sleep duration variability and diurnal inactivity) and measures of circadian‐related activity (L5, M10 and sleep midpoint). Latent causal variable analyses determined that horizontal pleiotropy (rather than genetic causality) underlies a majority of the associations between substance use and sleep/circadian related measures, except one plausible genetically causal relationship for opioid use disorder on self‐reported long sleep duration. Results show that shared genetics are likely a mechanism that is at least partly responsible for the overlap between sleep and substance use traits. We explored the genetic correlation and plausible correlation between several sleep and substance use phenotypes using genome‐wide association data and techniques. 
Sleep and substance use behaviors overlap due to genetic pleiotropy. There are two dimensions of sleep and substance use overlap: a sleep health and tobacco cluster (smoking cessation and initiation with sleep efficiency measures) and a sleep problem and heavy substance use cluster (insomnia, self‐reported sleep duration measures, and drinks per week).
Background and aims: Genomic and transcriptomic findings greatly broaden the biological knowledge regarding substance use. However, systematic convergence and comparison evidence of genome-wide findings is lacking for substance use. Here, we combined all the genome-wide findings from both substance use behavior and disorder (SUBD) and identified common and distinguishing genetic factors for different SUBDs. Methods: Systemic literature search for genome-wide association (GWAS) and RNA-seq studies of alcohol/nicotine/drug use behavior (partially meets or no-reported diagnostic criteria) and disorder (AUBD, NUBD, DUBD) was performed using PubMed and GWAS Catalog. Drug use was focused on cannabis, opioid, cocaine, methamphetamine use. GWAS studies required case-control or case/cohort samples. RNA-seq studies were based on brain tissues. The genes which contained significant single nucleotide polymorphism (p≤1×10-6 ) in GWAS and reported as significant in RNA-seq studies were extracted. Pathway enrichment was performed by using Metascape. Gene interaction networks were identified by using the Protein Interaction Network Analysis database. Results: Total SUBD-related 2910 genes were extracted from 75 GWAS studies (2,773,889 participants) and 17 RNA-seq studies. By overlapped the genes and pathways of AUBD, NUBD, and DUBD, 4 shared genes (CACNB2, GRIN2B, PLXDC2, and PKNOX2), 4 shared pathways (2 GO terms of 'modulation of chemical synaptic transmission', 'regulation of trans-synaptic signaling', 2 KEGG pathways of 'Dopaminergic synapse', 'Cocaine addiction') were identified (significantly higher than random, p<1×10-5 ). The top shared KEGG pathways (B-H corrected p-value<0.05) in the pairwise comparison of AUBD vs. DUBD, NUBD vs. DUBD, AUBD vs. NUBD were "Epstein-Barr virus infection", "protein processing in endoplasmic reticulum", and "neuroactive ligand-receptor interaction" respectively. 
We also identified substance-specific genetic factors: i.e., ADH1B and ALDH2 were unique for AUBD, while CHRNA3 and CHRNA4 for NUBD. Conclusions: This systematic review identifies the shared and unique genes and pathways for alcohol, nicotine, and drug use behaviors and disorders at the genome-wide level and highlights critical biological processes for the common and distinguishing vulnerability of substance use behaviors and disorders.
Experiences of racial discrimination have been shown to increase risk for alcohol problems. Some individuals may be particularly vulnerable to the negative effects of racial discrimination. However, little research has examined interaction effects between racial discrimination and individual characteristics, such as genetic predispositions and personality, in relation to alcohol outcomes. This study examined whether genetic risk and dimensions of impulsivity moderate the association between racial discrimination and alcohol problems among African American young adults (n = 383, Mage = 20.65, SD = 2.28; 81% female). Participants completed online surveys and provided a saliva sample for genotyping. Results from multiple regression analyses indicated that both blatant and subtle forms of racial discrimination (i.e., experience of racist events and racial microaggressions) were associated with more alcohol problems. Racial microaggressions interacted with dimensions of impulsivity in relation to alcohol problems, such that racial microaggressions were associated with more alcohol problems when negative urgency was high or when sensation seeking was low. There was no significant interaction between alcohol use disorder genome-wide polygenic score and experience of racist events or racial microaggression in relation to alcohol problems, which may partly reflect low power due in part to limited representation of African-Americans in genetic research. The findings highlight the need to increase the representation of African Americans in genetically-informed research in order to better characterize genetic risk and understand gene-environment interaction in this understudied population, as well as the importance of examining impulsivity as a multidimensional construct that interacts with racial discrimination in relation to alcohol outcomes.
Full-text available
Background The integration of multi-omics information (e.g., epigenetics and transcriptomics) can be useful for interpreting findings from genome-wide association studies (GWAS). It has additionally been suggested that multi-omics may aid in novel variant discovery, thus circumventing the need to increase GWAS sample sizes. We tested whether incorporating multi-omics information in earlier and smaller sized GWAS boosts true-positive discovery of genes that were later revealed by larger GWAS of the same/similar traits. Methods We applied ten different analytic approaches to integrating multi-omics data from twelve sources (e.g., Genotype-Tissue Expression project) to test whether earlier and smaller GWAS of 4 brain-related traits (i.e., alcohol use disorder/problematic alcohol use [AUD/PAU], major depression [MDD], schizophrenia [SCZ], and intracranial volume [ICV]) could detect genes that were revealed by a later and larger GWAS. Results Multi-omics data did not reliably identify novel genes in earlier less powered GWAS (PPV<0.2; 80% false-positive associations). Machine learning predictions marginally increased the number of identified novel genes, correctly identifying 1-8 additional genes, but only for well-powered early GWAS of highly heritable traits (i.e., ICV and SCZ). Multi-omics, particularly positional mapping (i.e., fastBAT, MAGMA, and H-MAGMA), was useful for prioritizing genes within genome-wide significant loci (PPVs = 0.5 – 1.0). Conclusions Although the integration of multi-omics information, particularly when multiple methods agree, helps prioritize GWAS findings and translate them into information about disease biology, it does not substantively increase novel gene discovery in brain-related GWAS. To increase power for discovery of novel genes and loci, increasing sample size is a requirement.
Heritability is a fundamental concept in genetic studies, measuring the genetic contribution to complex traits and bringing insights about disease mechanisms. The advance of high-throughput technologies has provided many resources for heritability estimation. Linkage disequilibrium (LD) score regression (LDSC) estimates both heritability and confounding biases, such as cryptic relatedness and population stratification, among single-nucleotide polymorphisms (SNPs) by using only summary statistics released from genome-wide association studies. However, only partial information in the LD matrix is utilized in LDSC, leading to loss in precision. In this study, we propose LD eigenvalue regression (LDER), an extension of LDSC, by making full use of the LD information. Compared to state-of-the-art heritability estimating methods, LDER provides more accurate estimates of SNP heritability and better distinguishes the inflation caused by polygenicity and confounding effects. We demonstrate the advantages of LDER both theoretically and with extensive simulations. We applied LDER to 814 complex traits from UK Biobank, and LDER identified 363 significantly heritable phenotypes, among which 97 were not identified by LDSC.
Liability to alcohol dependence (AD) is heritable, but little is known about its complex polygenic architecture or its genetic relationship with other disorders. To discover loci associated with AD and characterize the relationship between AD and other psychiatric and behavioral outcomes, we carried out the largest genome-wide association study to date of DSM-IV-diagnosed AD. Genome-wide data on 14,904 individuals with AD and 37,944 controls from 28 case-control and family-based studies were meta-analyzed, stratified by genetic ancestry (European, n = 46,568; African, n = 6,280). Independent, genome-wide significant effects of different ADH1B variants were identified in European (rs1229984; P = 9.8 × 10⁻¹³) and African ancestries (rs2066702; P = 2.2 × 10⁻⁹). Significant genetic correlations were observed with 17 phenotypes, including schizophrenia, attention deficit-hyperactivity disorder, depression, and use of cigarettes and cannabis. The genetic underpinnings of AD only partially overlap with those for alcohol consumption, underscoring the genetic distinction between pathological and nonpathological drinking behaviors.
Attention deficit/hyperactivity disorder (ADHD) is a highly heritable childhood behavioral disorder affecting 5% of children and 2.5% of adults. Common genetic variants contribute substantially to ADHD susceptibility, but no variants have been robustly associated with ADHD. We report a genome-wide association meta-analysis of 20,183 individuals diagnosed with ADHD and 35,191 controls that identifies variants surpassing genome-wide significance in 12 independent loci, finding important new information about the underlying biology of ADHD. Associations are enriched in evolutionarily constrained genomic regions and loss-of-function intolerant genes and around brain-expressed regulatory marks. Analyses of three replication studies: a cohort of individuals diagnosed with ADHD, a self-reported ADHD sample and a meta-analysis of quantitative measures of ADHD symptoms in the population, support these findings while highlighting study-specific differences on genetic overlap with educational attainment. Strong concordance with GWAS of quantitative population measures of ADHD symptoms supports that clinical diagnosis of ADHD is an extreme expression of continuous heritable traits.
Major depressive disorder (MDD) is a common illness accompanied by considerable morbidity, mortality, costs, and heightened risk of suicide. We conducted a genome-wide association meta-analysis based in 135,458 cases and 344,901 controls and identified 44 independent and significant loci. The genetic findings were associated with clinical features of major depression and implicated brain regions exhibiting anatomical differences in cases. Targets of antidepressant medications and genes involved in gene splicing were enriched for smaller association signal. We found important relationships of genetic risk for major depression with educational attainment, body mass, and schizophrenia: lower educational attainment and higher body mass were putatively causal, whereas major depression and schizophrenia reflected a partly shared biological etiology. All humans carry lesser or greater numbers of genetic risk factors for major depression. These findings help refine the basis of major depression and imply that a continuous measure of risk underlies the clinical phenotype.
Background Low-risk limits recommended for alcohol consumption vary substantially across different national guidelines. To define thresholds associated with lowest risk for all-cause mortality and cardiovascular disease, we studied individual-participant data from 599 912 current drinkers without previous cardiovascular disease. Methods We did a combined analysis of individual-participant data from three large-scale data sources in 19 high-income countries (the Emerging Risk Factors Collaboration, EPIC-CVD, and the UK Biobank). We characterised dose–response associations and calculated hazard ratios (HRs) per 100 g per week of alcohol (12·5 units per week) across 83 prospective studies, adjusting at least for study or centre, age, sex, smoking, and diabetes. To be eligible for the analysis, participants had to have information recorded about their alcohol consumption amount and status (ie, non-drinker vs current drinker), plus age, sex, history of diabetes and smoking status, at least 1 year of follow-up after baseline, and no baseline history of cardiovascular disease. The main analyses focused on current drinkers, whose baseline alcohol consumption was categorised into eight predefined groups according to the amount in grams consumed per week. We assessed alcohol consumption in relation to all-cause mortality, total cardiovascular disease, and several cardiovascular disease subtypes. We corrected HRs for estimated long-term variability in alcohol consumption using 152 640 serial alcohol assessments obtained some years apart (median interval 5·6 years [5th–95th percentile 1·04–13·5]) from 71 011 participants from 37 studies. Findings In the 599 912 current drinkers included in the analysis, we recorded 40 310 deaths and 39 018 incident cardiovascular disease events during 5·4 million person-years of follow-up. 
For all-cause mortality, we recorded a positive and curvilinear association with the level of alcohol consumption, with the minimum mortality risk around or below 100 g per week. Alcohol consumption was roughly linearly associated with a higher risk of stroke (HR per 100 g per week higher consumption 1·14, 95% CI, 1·10–1·17), coronary disease excluding myocardial infarction (1·06, 1·00–1·11), heart failure (1·09, 1·03–1·15), fatal hypertensive disease (1·24, 1·15–1·33); and fatal aortic aneurysm (1·15, 1·03–1·28). By contrast, increased alcohol consumption was log-linearly associated with a lower risk of myocardial infarction (HR 0·94, 0·91–0·97). In comparison to those who reported drinking >0–≤100 g per week, those who reported drinking >100–≤200 g per week, >200–≤350 g per week, or >350 g per week had lower life expectancy at age 40 years of approximately 6 months, 1–2 years, or 4–5 years, respectively. Interpretation In current drinkers of alcohol in high-income countries, the threshold for lowest risk of all-cause mortality was about 100 g/week. For cardiovascular disease subtypes other than myocardial infarction, there were no clear risk thresholds below which lower alcohol consumption stopped being associated with lower disease risk. These data support limits for alcohol consumption that are lower than those recommended in most current guidelines. Funding UK Medical Research Council, British Heart Foundation, National Institute for Health Research, European Union Framework 7, and European Research Council.
We introduce an approach to identify disease-relevant tissues and cell types by analyzing gene expression data together with genome-wide association study (GWAS) summary statistics. Our approach uses stratified linkage disequilibrium (LD) score regression to test whether disease heritability is enriched in regions surrounding genes with the highest specific expression in a given tissue. We applied our approach to gene expression data from several sources together with GWAS summary statistics for 48 diseases and traits (average N = 169,331) and found significant tissue-specific enrichments (false discovery rate (FDR) < 5%) for 34 traits. In our analysis of multiple tissues, we detected a broad range of enrichments that recapitulated known biology. In our brain-specific analysis, significant enrichments included an enrichment of inhibitory over excitatory neurons for bipolar disorder, and excitatory over inhibitory neurons for schizophrenia and body mass index. Our results demonstrate that our polygenic approach is a powerful way to leverage gene expression data for interpreting GWAS signals.
A main challenge in genome-wide association studies (GWAS) is to pinpoint possible causal variants. Results from GWAS typically do not directly translate into causal variants because the majority of hits are in non-coding or intergenic regions, and the presence of linkage disequilibrium leads to effects being statistically spread out across multiple variants. Post-GWAS annotation facilitates the selection of most likely causal variant(s). Multiple resources are available for post-GWAS annotation, yet these can be time consuming and do not provide integrated visual aids for data interpretation. We, therefore, develop FUMA: an integrative web-based platform using information from multiple biological resources to facilitate functional annotation of GWAS results, gene prioritization and interactive visualization. FUMA accommodates positional, expression quantitative trait loci (eQTL) and chromatin interaction mappings, and provides gene-based, pathway and tissue enrichment results. FUMA results directly aid in generating hypotheses that are testable in functional experiments aimed at proving causal relations.
Objective: Alcohol use disorders are common conditions that have enormous social and economic consequences. Genome-wide association analyses were performed to identify genetic variants associated with a proxy measure of alcohol consumption and alcohol misuse and to explore the shared genetic basis between these measures and other substance use, psychiatric, and behavioral traits. Method: This study used quantitative measures from the Alcohol Use Disorders Identification Test (AUDIT) from two population-based cohorts of European ancestry (UK Biobank [N=121,604] and 23andMe [N=20,328]) and performed a genome-wide association study (GWAS) meta-analysis. Two additional GWAS analyses were performed, a GWAS for AUDIT scores on items 1-3, which focus on consumption (AUDIT-C), and for scores on items 4-10, which focus on the problematic consequences of drinking (AUDIT-P). Results: The GWAS meta-analysis of AUDIT total score identified 10 associated risk loci. Novel associations localized to genes including JCAD and SLC39A13; this study also replicated previously identified signals in the genes ADH1B, ADH1C, KLB, and GCKR. The dimensions of AUDIT showed positive genetic correlations with alcohol consumption (rg=0.76-0.92) and DSM-IV alcohol dependence (rg=0.33-0.63). AUDIT-P and AUDIT-C scores showed significantly different patterns of association across a number of traits, including psychiatric disorders. AUDIT-P score was significantly positively genetically correlated with schizophrenia (rg=0.22), major depressive disorder (rg=0.26), and attention deficit hyperactivity disorder (rg=0.23), whereas AUDIT-C score was significantly negatively genetically correlated with major depressive disorder (rg=-0.24) and ADHD (rg=-0.10). This study also used the AUDIT data in the UK Biobank to identify thresholds for dichotomizing AUDIT total score that optimize genetic correlations with DSM-IV alcohol dependence. 
Coding individuals with AUDIT total scores ≤4 as control subjects and those with scores ≥12 as case subjects produced a significant high genetic correlation with DSM-IV alcohol dependence (rg=0.82) while retaining most subjects. Conclusions: AUDIT scores ascertained in population-based cohorts can be used to explore the genetic basis of both alcohol consumption and alcohol use disorders.
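The dichotomization rule described in this abstract can be written as a short sketch. It assumes that subjects with intermediate scores (5 to 11) are left out of the case-control comparison, which the abstract implies but does not state outright; the function name, the return labels, and the example scores are hypothetical.

```python
def code_audit_status(total_score, control_max=4, case_min=12):
    """Map an AUDIT total score (0-40) to 'control', 'case', or None.

    None marks intermediate scores, assumed here to be excluded
    from the case-control analysis.
    """
    if not 0 <= total_score <= 40:
        raise ValueError("AUDIT total score must lie between 0 and 40")
    if total_score <= control_max:
        return "control"
    if total_score >= case_min:
        return "case"
    return None


# Hypothetical scores, for illustration only.
scores = [0, 4, 5, 11, 12, 27]
statuses = [code_audit_status(s) for s in scores]
retained = [s for s, status in zip(scores, statuses) if status is not None]
```

Keeping the thresholds as parameters makes it easy to try other cut-offs, in the spirit of the threshold search the abstract describes.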
Background and Aims Longitudinal electronic health record (EHR) data offer a large‐scale, untapped source of phenotypic information on harmful alcohol use. Using established, alcohol‐associated variants in the gene that encodes the enzyme alcohol dehydrogenase 1B (ADH1B) as criterion standards, we compared the individual and combined validity of three longitudinal EHR‐based phenotypes of harmful alcohol use: Alcohol Use Disorders Identification Test‐Consumption (AUDIT‐C) trajectories; mean age‐adjusted AUDIT‐C; and diagnoses of Alcohol Use Disorder (AUD). Design With longitudinal EHR data from the Million Veteran Program (MVP) linked to genetic data, we used two population‐specific polymorphisms in ADH1B that are strongly associated with AUD in African Americans (AAs) and European Americans (EAs): rs2066702 (Arg369Cys, AAs) and rs1229984 (Arg48His, EAs) as criterion measures. Setting United States Department of Veterans Affairs Healthcare System
Background: Alcohol use (both quantity and dependence) is moderately heritable, and genomewide association studies (GWAS) have identified risk genes in European, African, and Asian populations. The most reproducibly identified risk genes affect alcohol metabolism. Well-known functional variants at the gene encoding alcohol dehydrogenase B and other alcohol dehydrogenases affect risk in European and African ancestry populations. Similarly, variants mapped to these same genes and a well-known null variant that maps to the gene that encodes aldehyde dehydrogenase 2 (ALDH2) also affect risk in various Asian populations. In this study, we completed the first GWAS for 3 traits related to alcohol use in a Thai population recruited initially for studies of methamphetamine dependence. Methods: All subjects were evaluated with the Thai version of the Semi-Structured Assessment for Drug Dependence and Alcoholism (SSADDA). A total of 1,045 subjects were available for analysis. Three traits were analyzed: flushing, maximum number of alcoholic beverages consumed in any lifetime 24-hour period ("MAXDRINKS"), and DSM-IV alcohol dependence criterion count. We also conducted a pleiotropy analysis with major depression, the only other psychiatric trait where summary statistics from a large-scale Asian-population GWAS are available. Results: All 3 traits showed genomewide significant association with variants near ALDH2, with significance ranging from 2.01 × 10⁻¹⁴ (for flushing; lead single nucleotide polymorphism (SNP) PTPN11* rs143894582) to p_meta = 5.80 × 10⁻¹⁰ (for alcohol dependence criterion count; lead SNP rs149212747). These lead SNPs flank rs671 and span a region of over a megabase, illustrating the need for prior biological information in identifying the actual effect SNP, rs671. We also identified significant pleiotropy between major depression and flushing. 
Conclusions: These results are consistent with prior findings in Asian populations and add new information regarding alcohol use-depression pleiotropy.