
Multi-ancestry genetic study of type 2 diabetes highlights the power of diverse populations for discovery and translation


Abstract and Figures

We assembled an ancestrally diverse collection of genome-wide association studies (GWAS) of type 2 diabetes (T2D) in 180,834 affected individuals and 1,159,055 controls (48.9% non-European descent) through the Diabetes Meta-Analysis of Trans-Ethnic association studies (DIAMANTE) Consortium. Multi-ancestry GWAS meta-analysis identified 237 loci attaining stringent genome-wide significance (P < 5 × 10⁻⁹), which were delineated to 338 distinct association signals. Fine-mapping of these signals was enhanced by the increased sample size and expanded population diversity of the multi-ancestry meta-analysis, which localized 54.4% of T2D associations to a single variant with >50% posterior probability. This improved fine-mapping enabled systematic assessment of candidate causal genes and molecular mechanisms through which T2D associations are mediated, laying the foundations for functional investigations. Multi-ancestry genetic risk scores enhanced transferability of T2D prediction across diverse populations. Our study provides a step toward more effective clinical translation of T2D GWAS to improve global health for all, irrespective of genetic background.
Comparison of fine-mapping resolution for distinct association signals for T2D obtained from ancestry-specific meta-analysis and multi-ancestry meta-regression a, Each point corresponds to a distinct association signal, plotted according to the log10 credible set size in the multi-ancestry meta-regression on the x axis and the log10 credible set size in the European ancestry meta-analysis on the y axis. The 266 (78.7%) signals above the dashed y = x line were more precisely fine-mapped in the multi-ancestry meta-regression. b, We ‘downsampled’ the multi-ancestry meta-regression to the effective sample size of the European ancestry-specific meta-analysis. Each point corresponds to one of the 266 signals that were more precisely fine-mapped in the multi-ancestry meta-regression. The 137 (51.5%) signals above the dashed y = x line were more precisely fine-mapped in the ‘downsampled’ multi-ancestry meta-regression than in the equivalently sized European ancestry-specific meta-analysis. c, Properties of 99% credible sets of variants driving each distinct association signal in European (EUR) ancestry-specific meta-analysis, combined East Asian (EAS) and European ancestry meta-analysis and multi-ancestry meta-regression. The inclusion of the most under-represented ancestry groups (African, Hispanic and South Asian) in the multi-ancestry meta-regression reduced the median size of 99% credible sets and increased the median posterior probability (PP) ascribed to index SNVs.
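The credible-set statistics in this caption (99% credible set size and the posterior probability of the index SNV) follow from a simple rule once each variant in a signal has a Bayes factor for association. The sketch below assumes a single causal variant per signal and equal prior probabilities across variants. The Bayes factors are treated as given inputs, because the caption does not say how they were derived, and the variant names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

def posterior_probabilities(log_bf: Dict[str, float]) -> Dict[str, float]:
    """Convert per-variant natural-log Bayes factors into posterior probabilities,
    assuming exactly one causal variant per signal and equal priors."""
    top = max(log_bf.values())  # subtract the maximum for numerical stability
    weights = {v: math.exp(b - top) for v, b in log_bf.items()}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}

def credible_set(pp: Dict[str, float], level: float = 0.99) -> List[Tuple[str, float]]:
    """Smallest set of variants, taken in decreasing PP order, whose cumulative PP reaches `level`."""
    ranked = sorted(pp.items(), key=lambda kv: kv[1], reverse=True)
    chosen: List[Tuple[str, float]] = []
    cumulative = 0.0
    for variant, p in ranked:
        chosen.append((variant, p))
        cumulative += p
        if cumulative >= level:
            break
    return chosen

# Hypothetical signal with four candidate variants.
example = posterior_probabilities({"rsA": 10.0, "rsB": 6.0, "rsC": 2.0, "rsD": 0.0})
print(credible_set(example))
```

Here rsA carries roughly 98% of the posterior probability, so the 99% credible set needs only rsA and rsB. A signal counts as localized to a single variant with >50% posterior probability, in the abstract's sense, when the top PP exceeds 0.5.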
… 
T2D-association signal at the BCAR1 locus colocalizes with multiple circulating plasma pQTL a, Signal plot for T2D association from multi-ancestry meta-regression of 180,834 affected individuals and 1,159,055 controls of diverse ancestry. Each point represents an SNV, plotted with its P value (on a −log10 scale) as a function of genomic position (National Center for Biotechnology Information (NCBI) build 37). Gene annotations were taken from the University of California Santa Cruz genome browser. Recombination rates were estimated from the Phase II HapMap. Chr, chromosome. b, Fine-mapping of T2D-association signals from multi-ancestry meta-regression. Each point represents an SNV plotted with its posterior probability of driving T2D association as a function of genomic position (NCBI build 37). Chromatin states are presented for four diabetes-relevant tissues: active transcription start sites (TSS) (red), flanking active TSS (orange–red), strong transcription (green), weak transcription (dark green), genic enhancers (green–yellow), active enhancers (orange), weak enhancers (yellow), bivalent or poised TSS (Indian red), flanking bivalent TSS or enhancer (dark salmon), repressed polycomb (silver), weak repressed polycomb (gainsboro) and quiescent or low (white). c, Schematic presentation of the single cis and multiple trans effects mediated by the BCAR1 locus on plasma proteins and the islet chromatin loop between islet enhancer and promoter elements near CTRB2. d, Signal plots for four circulating plasma proteins that colocalize with the T2D association in 3,301 European ancestry participants from the INTERVAL study. Each point represents an SNV, plotted with its P value (on a −log10 scale) as a function of genomic position (NCBI build 37). e, Expression of genes (TPM, transcripts per million) encoding colocalized proteins in islets, the pancreas and whole blood.
… 
Defining causal molecular mechanisms at the PROX1 locus a, Signal plot for two distinct T2D associations from multi-ancestry meta-regression of 180,834 affected individuals and 1,159,055 controls of diverse ancestry. Each point represents an SNV, plotted with its P value (on a −log10 scale) as a function of genomic position (NCBI build 37). Index SNVs are represented by blue and purple diamonds. All other SNVs are colored according to the LD with the index SNVs in European and East Asian ancestry populations. Gene annotations were taken from the University of California Santa Cruz genome browser. b, Fine-mapping of T2D-association signals from multi-ancestry meta-regression. Each point represents an SNV plotted with its posterior probability of driving each distinct T2D association as a function of genomic position (NCBI build 37). The 99% credible sets for the two signals are highlighted by purple and blue diamonds. Chromatin states are presented for four diabetes-relevant tissues: active TSS (red), flanking active TSS (orange–red), strong transcription (green), weak transcription (dark green), genic enhancers (green–yellow), active enhancers (orange), weak enhancers (yellow), bivalent or poised TSS (Indian red), flanking bivalent TSS or enhancer (dark salmon), repressed polycomb (silver), weak repressed polycomb (gainsboro), quiescent or low (white). c, Transcriptional activity of the 99% credible set variants at the two T2D-association signals in human HepG2 hepatocytes and EndoC-βH1 beta cell models obtained from in vitro reporter assays. Biological replicates, n = 3; technical replicates, n = 3. WT, wild type (non-risk allele or haplotype); GFP, green fluorescent protein (negative control); EV, empty vector (baseline). Heights of bars represent means. Error bars represent s.e.m. Differences in luciferase activity between groups were tested using two-tailed two-sample t-tests, for which P < 0.05 was considered statistically significant.
d, Expression of PROX1 across a range of diabetes-relevant tissues.
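Panel c compares luciferase activity between constructs using two-tailed two-sample t-tests. Below is a minimal sketch of the pooled-variance (Student) version. The caption does not state whether variances were pooled, and treating the three biological-replicate values per construct as the observations is our assumption. The activity values are hypothetical.

```python
import math
from statistics import mean, variance

T_CRIT_DF4 = 2.776  # two-tailed critical value of Student's t at alpha = 0.05 with df = 4

def student_t(a, b):
    """Return (t, df) for a pooled-variance two-sample t-test of mean(a) - mean(b)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, na + nb - 2

wild_type = [1.0, 1.1, 0.9]     # hypothetical normalized reporter activity, n = 3
risk_allele = [2.0, 2.2, 1.8]   # hypothetical normalized reporter activity, n = 3
t, df = student_t(risk_allele, wild_type)
significant = df == 4 and abs(t) > T_CRIT_DF4
```

With n = 3 per group the test has 4 degrees of freedom, which is why the critical value can be hard-coded here. In practice a library routine such as `scipy.stats.ttest_ind` would return the exact P value instead.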
… 
Articles
https://doi.org/10.1038/s41588-022-01058-3
A full list of authors and affiliations appears at the end of the paper.
The global prevalence of T2D has quadrupled over the last 30 years1, affecting approximately 392 million individuals in 2015 (ref. 2). Despite this worldwide impact, the largest T2D GWAS have predominantly featured populations of European ancestry3–6, compromising prospects for clinical translation. Failure to detect causal variants that contribute to disease risk outside European ancestry populations limits progress toward a full understanding of disease biology and constrains opportunities for development of therapeutics7. Implementation of personalized approaches to disease management depends on accurate prediction of individual risk, irrespective of ancestry. However, genetic risk scores (GRS) derived from European ancestry GWAS provide unreliable prediction when deployed in other population groups, in part reflecting differences in effect sizes, allele frequencies and patterns of linkage disequilibrium (LD)8.
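In its simplest form, a genetic risk score of the kind discussed here is a weighted sum of risk-allele dosages, with weights taken from GWAS effect sizes (for example, log odds ratios). The sketch below uses hypothetical variant identifiers and weights. It makes the transferability problem concrete: the score is only as portable as its weights and as the correspondence between scored variants and causal alleles, and both can differ across populations.

```python
from typing import Dict

def genetic_risk_score(weights: Dict[str, float], dosages: Dict[str, float]) -> float:
    """Sum of weight x risk-allele dosage (0, 1 or 2) over variants genotyped in the individual.
    Variants missing from `dosages` are skipped, which is a simplification."""
    return sum(beta * dosages[snv] for snv, beta in weights.items() if snv in dosages)

weights = {"snv1": 0.10, "snv2": 0.05, "snv3": 0.20}  # hypothetical log odds ratios
person = {"snv1": 2, "snv2": 1, "snv3": 0}            # hypothetical risk-allele dosages
score = genetic_risk_score(weights, person)
```

Swapping in weights estimated from a different (for example, multi-ancestry) meta-analysis changes only `weights`. The scoring rule itself stays the same.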
To address the impact of this population bias, recent T2D GWAS have included individuals of non-European ancestry9–11. The DIAMANTE Consortium was established to assemble T2D GWAS across diverse ancestry groups. Analyses of the European and East Asian ancestry components of the DIAMANTE study have previously been reported6,10. Here, we describe the results of our multi-ancestry meta-analysis, which expands on these published components to a total of 180,834 individuals with T2D and 1,159,055 controls, with 20.5% of the effective sample size ascertained from African, Hispanic and South Asian ancestry groups. With these data, we demonstrate the value of analyses conducted on diverse populations to understand how T2D-associated variants impact downstream molecular and biological processes underlying the disease and to advance clinical translation of GWAS findings for all, irrespective of genetic background.
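The effective sample size quoted here is not the raw count of participants. A common convention for case-control GWAS, which we assume applies here because the text does not define it, is N_eff = 4 / (1/N_cases + 1/N_controls), computed for each study and then summed. The study counts below are hypothetical.

```python
from typing import Iterable, Tuple

def effective_sample_size(n_cases: float, n_controls: float) -> float:
    """N_eff = 4 / (1/N_cases + 1/N_controls); equals N_cases + N_controls when balanced."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)

def summed_effective_size(studies: Iterable[Tuple[float, float]]) -> float:
    """Sum of per-study effective sizes, as when aggregating many GWAS."""
    return sum(effective_sample_size(cases, controls) for cases, controls in studies)

studies = [(1000, 1000), (1000, 3000)]            # hypothetical (cases, controls) per study
per_study_total = summed_effective_size(studies)   # 2000 + 3000
pooled = effective_sample_size(2000, 4000)         # the same totals treated as one study
```

Summing per study never exceeds the pooled value. Plugging the overall totals above (180,834 cases, 1,159,055 controls) into the formula gives roughly 626,000, more than the 492,191 reported in the Study overview. That gap is consistent with, though not proof of, per-study summation.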
Results
Study overview. We accumulated association summary statistics from 122 GWAS for 180,834 individuals with T2D and 1,159,055 controls (effective sample size, 492,191) across five ancestry groups (Supplementary Tables 1–3). We use the term ‘ancestry group’ to refer to individuals with similar genetic background: European ancestry (51.1% of the total effective sample size); East Asian ancestry (28.4%); South Asian ancestry (8.3%); African ancestry, including recently admixed African American populations (6.6%); and Hispanic individuals with recent admixture of American, African and European ancestry (5.6%). Each ancestry-specific GWAS was imputed to reference panels from the 1000 Genomes Project12,13, the Haplotype Reference Consortium14 or population-specific whole-genome sequence data. Subsequent association analyses were adjusted for population structure and relatedness (Supplementary Table 4). We considered 19,829,461 biallelic autosomal single-nucleotide variants (SNVs) that overlapped reference panels with minor allele frequency >0.5% in at least one of the five ancestry groups (Extended Data Fig. 1 and Methods).
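The variant filter described above keeps an SNV if it is common enough in any one ancestry group, rather than requiring it to be common everywhere. Here is a minimal sketch of that rule. It assumes per-group allele frequencies are available in a dictionary keyed by hypothetical group labels. The biallelic, autosomal and reference-panel overlap checks are not modelled.

```python
from typing import Dict

def minor_allele_frequency(allele_freq: float) -> float:
    """Fold an allele frequency onto [0, 0.5]."""
    return min(allele_freq, 1.0 - allele_freq)

def passes_maf_filter(freq_by_group: Dict[str, float], threshold: float = 0.005) -> bool:
    """Keep the variant if MAF > threshold (0.5%) in at least one ancestry group."""
    return any(minor_allele_frequency(f) > threshold for f in freq_by_group.values())

rare_in_eur_common_in_afr = {"EUR": 0.001, "AFR": 0.03}  # hypothetical frequencies
rare_everywhere = {"EUR": 0.999, "EAS": 0.002}           # hypothetical frequencies
```

This "any group" rule is what lets variants that are rare in European ancestry populations but common elsewhere enter the analysis.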
Robust discovery of multi-ancestry T2D associations. We aggregated association summary statistics via multi-ancestry meta-regression, implemented in MR-MEGA15, which models allelic effect heterogeneity correlated with genetic ancestry. We included as covariates three axes of genetic variation that separated GWAS from the five major ancestry groups (Extended Data Fig. 2 and Methods). We identified 277 loci associated with T2D at the conventional genome-wide significance threshold of P < 5 × 10⁻⁸ (Extended Data Fig. 3 and Supplementary Table 5). By accounting for ancestry-correlated allelic effect heterogeneity in the multi-ancestry meta-regression, we observed lower genomic control inflation (λGC = 1.05) than when using either fixed- or random-effects meta-analysis (λGC = 1.25 under both models) and stronger signals of association at lead SNVs at most loci (Extended Data Fig. 4). Of the 277 loci, 11 have not previously been reported in recently published T2D GWAS meta-analyses6,10,11 that account for 78.6% of the total effective sample size of this multi-ancestry meta-regression (Extended Data Fig. 3 and Supplementary Note). Of the 100 and 193 loci attaining genome-wide significance (P < 5 × 10⁻⁸) in East Asian and European ancestry-specific meta-analyses, respectively, lead SNVs at 94 (94.0%) and 164 (85.0%) demonstrated stronger evidence for association (smaller P values) in the multi-ancestry meta-regression (Extended Data Fig. 5 and Supplementary Note). These results demonstrate the power of multi-ancestry meta-analyses for locus discovery afforded by
NATURE GENETICS | VOL 54 | MAY 2022 | 560–572 | www.nature.com/naturegenetics
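The Results paragraph on multi-ancestry discovery contrasts ancestry-aware meta-regression with fixed-effects meta-analysis and summarizes calibration with λGC. The sketch below illustrates those ideas and is not MR-MEGA itself. It regresses per-study allelic effects on ancestry axes by inverse-variance weighted least squares, and dropping the axes reduces it to a fixed-effects (inverse-variance weighted) estimate. It also computes λGC for 1-degree-of-freedom tests. As we understand it, MR-MEGA derives its axes from between-study allele-frequency differences, and its association test has more degrees of freedom. Here the axes are supplied directly, and all inputs are hypothetical.

```python
from statistics import NormalDist, median
from typing import List, Sequence

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small dense system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def meta_regression(betas: Sequence[float], ses: Sequence[float],
                    axes: Sequence[Sequence[float]]) -> List[float]:
    """Inverse-variance weighted regression of per-study allelic effects on ancestry axes.
    Returns [intercept, coefficient for axis 1, ..., coefficient for axis T]."""
    x = [[1.0] + list(row) for row in axes]   # design matrix with an intercept column
    w = [1.0 / se ** 2 for se in ses]         # inverse-variance weights
    k, p = len(x), len(x[0])
    xtwx = [[sum(w[s] * x[s][i] * x[s][j] for s in range(k)) for j in range(p)] for i in range(p)]
    xtwy = [sum(w[s] * x[s][i] * betas[s] for s in range(k)) for i in range(p)]
    return solve(xtwx, xtwy)

def fixed_effect(betas: Sequence[float], ses: Sequence[float]) -> float:
    """Inverse-variance weighted (fixed-effects) mean of per-study effects."""
    w = [1.0 / se ** 2 for se in ses]
    return sum(wi * b for wi, b in zip(w, betas)) / sum(w)

def lambda_gc(p_values: Sequence[float]) -> float:
    """Genomic control inflation for 1-df tests: observed median chi-square / expected median."""
    nd = NormalDist()
    chi2 = [nd.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]  # two-sided p -> chi-square(1)
    return median(chi2) / nd.inv_cdf(0.75) ** 2               # expected median is about 0.455

# Hypothetical per-study effects that drift linearly along one ancestry axis.
coef = meta_regression([0.05, 0.10, 0.15, 0.20], [0.1] * 4, [[-1.0], [0.0], [1.0], [2.0]])
```

When effects truly vary along an axis, as in the example, a single fixed-effects estimate averages over that variation. Part of what the meta-regression gains comes from modelling that variation instead.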
Article
Introduction: Non-coding genetic variation at TCF7L2 is the strongest genetic determinant of type 2 diabetes (T2D) risk in humans. TCF7L2 encodes a transcription factor mediating the nuclear effects of WNT signaling in adipose tissue (AT). In vivo studies in transgenic mice have highlighted important roles for TCF7L2 in adipose tissue biology and systemic metabolism.
Objective: To map the expression of TCF7L2 in human AT, examine its role in human adipose cell biology in vitro, and investigate the effects of the fine-mapped T2D-risk allele at rs7903146 on AT morphology and TCF7L2 expression.
Methods: Ex vivo gene expression studies of TCF7L2 in whole and fractionated human AT. In vitro TCF7L2 gain- and/or loss-of-function studies in primary and immortalized human adipose progenitor cells (APCs) and mature adipocytes (mADs). AT phenotyping of rs7903146 T2D-risk variant carriers and matched controls.
Results: Adipose progenitors (APs) exhibited the highest TCF7L2 mRNA abundance compared to mature adipocytes and adipose-derived endothelial cells. Obesity was associated with reduced TCF7L2 transcript levels in whole subcutaneous abdominal AT but paradoxically increased expression in APs. In functional studies, TCF7L2 knockdown (KD) in abdominal APs led to dose-dependent activation of WNT/β-catenin signaling, impaired proliferation and dose-dependent effects on adipogenesis. Whilst partial KD enhanced adipocyte differentiation, near-total KD impaired lipid accumulation and adipogenic gene expression. Over-expression of TCF7L2 accelerated adipogenesis. In contrast, TCF7L2-KD in gluteal APs dose-dependently enhanced lipid accumulation. Transcriptome-wide profiling revealed that TCF7L2 might modulate multiple aspects of AP biology including extracellular matrix secretion, immune signaling and apoptosis. The T2D-risk allele at rs7903146 was associated with reduced AP TCF7L2 expression and enhanced AT insulin sensitivity.
Conclusions: TCF7L2 plays a complex role in AP biology and has both dose- and depot-dependent effects on adipogenesis. In addition to regulating pancreatic insulin secretion, genetic variation at TCF7L2 might also influence T2D risk by modulating AP function.
Article
Background: Type 2 diabetes (T2D) is highly prevalent in British South Asians, yet they are underrepresented in research. Genes & Health (G&H) is a large population study of British Pakistanis and Bangladeshis (BPB) comprising genomic and routine health data. We assessed the extent to which genetic risk for T2D is shared between BPB and European populations (EUR). We then investigated whether the integration of a polygenic risk score (PRS) for T2D with an existing risk tool (QDiabetes) could improve prediction of incident disease and the characterisation of disease subtypes.
Methods and findings: In this observational cohort study, we assessed whether common genetic loci associated with T2D in EUR individuals were replicated in 22,490 BPB individuals in G&H. We replicated fewer loci in G&H (n = 76/338, 22%) than would be expected given power if all EUR-ascertained loci were transferable (n = 101, 30%; p = 0.001). Of the 27 transferable loci that were powered to interrogate this, only 9 showed evidence of shared causal variants. We constructed a T2D PRS and combined it with a clinical risk instrument (QDiabetes) in a novel, integrated risk tool (IRT) to assess risk of incident diabetes. To assess model performance, we compared categorical net reclassification index (NRI) versus QDiabetes alone. In 13,648 patients free from T2D followed up for 10 years, NRI was 3.2% for IRT versus QDiabetes (95% confidence interval (CI): 2.0% to 4.4%). IRT performed best in reclassification of individuals aged less than 40 years deemed low risk by QDiabetes alone (NRI 5.6%, 95% CI 3.6% to 7.6%), who tended to be free from comorbidities and slim. After adjustment for QDiabetes score, PRS was independently associated with progression to T2D after gestational diabetes (hazard ratio (HR) per SD of PRS 1.23, 95% CI 1.05 to 1.42, p = 0.028). Using cluster analysis of clinical features at diabetes diagnosis, we replicated previously reported disease subgroups, including Mild Age-Related, Mild Obesity-related, and Insulin-Resistant Diabetes, and showed that PRS distribution differs between subgroups (p = 0.002). Integrating PRS in this cluster analysis revealed a Probable Severe Insulin Deficient Diabetes (pSIDD) subgroup, despite the absence of clinical measures of insulin secretion or resistance. We also observed differences in rates of progression to micro- and macrovascular complications between subgroups after adjustment for confounders. Study limitations include the absence of an external replication cohort and the potential biases arising from missing or incorrect routine health data.
Conclusions: Our analysis of the transferability of T2D loci between EUR and BPB indicates the need for larger, multiancestry studies to better characterise the genetic contribution to disease and its varied aetiology. We show that a T2D PRS optimised for this high-risk BPB population has potential clinical application in BPB, improving the identification of T2D risk (especially in the young) on top of an established clinical risk algorithm and aiding identification of subgroups at diagnosis, which may help future efforts to stratify care and treatment of the disease.
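The G&H abstract reports a categorical net reclassification index (NRI) for the integrated tool versus QDiabetes. The textbook categorical NRI has two parts. The first is the net proportion of people who develop the outcome and move to a higher risk category. The second is the net proportion of people who do not develop it and move to a lower one. The sketch below uses that definition with hypothetical categories and outcomes. It ignores censoring and is not presented as the study's exact procedure.

```python
from typing import Sequence

def categorical_nri(old_cat: Sequence[int], new_cat: Sequence[int],
                    events: Sequence[bool]) -> float:
    """NRI = [P(up | event) - P(down | event)] + [P(down | no event) - P(up | no event)].
    Categories are ordinal integers, with higher values meaning higher predicted risk."""
    up_e = down_e = n_e = 0
    up_ne = down_ne = n_ne = 0
    for old, new, event in zip(old_cat, new_cat, events):
        if event:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_ne += 1
            up_ne += new > old
            down_ne += new < old
    return (up_e - down_e) / n_e + (down_ne - up_ne) / n_ne

old = [0, 0, 1, 1, 0, 1, 1, 0]   # hypothetical categories under the existing tool
new = [1, 1, 1, 0, 0, 0, 0, 0]   # hypothetical categories under the integrated tool
had_event = [True] * 4 + [False] * 4
nri = categorical_nri(old, new, had_event)
```

In this toy example, two of four cases move up and one moves down (net 0.25), and two of four non-cases move down (net 0.5), giving an NRI of 0.75.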
Preprint
Despite the great success of genome-wide association studies (GWAS) in identifying genetic loci significantly associated with diseases, the vast majority of causal variants underlying disease-associated loci have not been identified. To create an atlas of causal variants, we performed and integrated fine-mapping across 148 complex traits in three large-scale biobanks (BioBank Japan, FinnGen, and UK Biobank; total n = 811,261), resulting in 4,518 variant-trait pairs with high posterior probability (>0.9) of causality. Of these, we found 285 high-confidence variant-trait pairs replicated across multiple populations, and we characterized multiple contributors to the surprising lack of overlap among fine-mapping results from different biobanks. By studying the bottlenecked Finnish and Japanese populations, we identified 21 and 26 putative causal coding variants with extreme allele frequency enrichment (>10-fold) in these two populations, respectively. Aggregating data across populations enabled identification of 1,492 unique fine-mapped coding variants and 176 genes in which multiple independent coding variants influence the same trait (i.e., with an allelic series of coding variants). Our results demonstrate that fine-mapping in diverse populations enables novel insights into the biology of complex traits by pinpointing high-confidence causal variants for further characterization.
Article
Single-nucleus assay for transposase-accessible chromatin using sequencing (snATAC-seq) creates new opportunities to dissect cell type–specific mechanisms of complex diseases. Since pancreatic islets are central to type 2 diabetes (T2D), we profiled 15,298 islet cells by using combinatorial barcoding snATAC-seq and identified 12 clusters, including multiple alpha, beta and delta cell states. We cataloged 228,873 accessible chromatin sites and identified transcription factors underlying lineage- and state-specific regulation. We observed state-specific enrichment of fasting glucose and T2D genome-wide association studies for beta cells and enrichment for other endocrine cell types. At T2D signals localized to islet-accessible chromatin, we prioritized variants with predicted regulatory function and co-accessibility with target genes. A causal T2D variant rs231361 at the KCNQ1 locus had predicted effects on a beta cell enhancer co-accessible with INS and genome editing in embryonic stem cell–derived beta cells affected INS levels. Together our findings demonstrate the power of single-cell epigenomics for interpreting complex disease genetics. Single-cell ATAC-seq analysis of human pancreatic islet cells identifies different cell clusters and transcription factors that underlie lineage- and state-specific regulation and helps prioritize type 2 diabetes risk variants.
Article
The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes)¹. In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.
Article
Most signals detected by genome-wide association studies map to non-coding sequence and their tissue-specific effects influence transcriptional regulation. However, key tissues and cell-types required for functional inference are absent from large-scale resources. Here we explore the relationship between genetic variants influencing predisposition to type 2 diabetes (T2D) and related glycemic traits, and human pancreatic islet transcription using data from 420 donors. We find: (a) 7741 cis-eQTLs in islets with a replication rate across 44 GTEx tissues between 40% and 73%; (b) marked overlap between islet cis-eQTL signals and active regulatory sequences in islets, with reduced eQTL effect size observed in the stretch enhancers most strongly implicated in GWAS signal location; (c) enrichment of islet cis-eQTL signals with T2D risk variants identified in genome-wide association studies; and (d) colocalization between 47 islet cis-eQTLs and variants influencing T2D or glycemic traits, including DGKB and TCF7L2. Our findings illustrate the advantages of performing functional and regulatory studies in disease relevant tissues. Mechanistic inference following GWAS is hampered by the lack of tissue-specific transcriptomic resources. Here the authors combine genetic variants predisposing to type 2 diabetes with human pancreatic islet RNA-seq data. They identify 7741 islet expression quantitative trait loci (eQTLs), providing a resource for functional interpretation of association signals mapping to non-coding sequence.
Article
We investigated type 2 diabetes (T2D) genetic susceptibility via multi-ancestry meta-analysis of 228,499 cases and 1,178,783 controls in the Million Veteran Program (MVP), DIAMANTE, Biobank Japan and other studies. We report 568 associations, including 286 autosomal, 7 X-chromosomal and 25 identified in ancestry-specific analyses that were previously unreported. Transcriptome-wide association analysis detected 3,568 T2D associations with genetically predicted gene expression in 687 novel genes; of these, 54 are known to interact with FDA-approved drugs. A polygenic risk score (PRS) was strongly associated with increased risk of T2D-related retinopathy and modestly associated with chronic kidney disease (CKD), peripheral artery disease (PAD) and neuropathy. We investigated the genetic etiology of T2D-related vascular outcomes in the MVP and observed statistical SNP–T2D interactions at 13 variants for outcomes including coronary heart disease (CHD), CKD, PAD and neuropathy. These findings may help to identify potential therapeutic targets for T2D and genomic pathways that link T2D to vascular outcomes. Genome-wide association meta-analyses among 1.4 million individuals identify 318 new risk loci for type 2 diabetes and provide insight into the contribution of these risk variants to diabetes-related vascular outcomes.
Article
Meta-analyses of genome-wide association studies (GWAS) have identified more than 240 loci that are associated with type 2 diabetes (T2D)1,2; however, most of these loci have been identified in analyses of individuals with European ancestry. Here, to examine T2D risk in East Asian individuals, we carried out a meta-analysis of GWAS data from 77,418 individuals with T2D and 356,122 healthy control individuals. In the main analysis, we identified 301 distinct association signals at 183 loci, and across T2D association models with and without consideration of body mass index and sex, we identified 61 loci that are newly implicated in predisposition to T2D. Common variants associated with T2D in both East Asian and European populations exhibited strongly correlated effect sizes. Previously undescribed associations include signals in or near GDAP1, PTF1A, SIX3, ALDH2, a microRNA cluster, and genes that affect the differentiation of muscle and adipose cells3. At another locus, expression quantitative trait loci at two overlapping T2D signals affect two genes—NKX6-3 and ANK1—in different tissues4–6. Association studies in diverse populations identify additional loci and elucidate disease-associated genes, biology, and pathways. A meta-analysis of genome-wide association study data from 77,418 individuals of East Asian ancestry with type 2 diabetes identifies novel variants associated with increased risk of type 2 diabetes.
Article
Polygenic risk scores (PRSs) have shown promise in predicting susceptibility to common diseases1–3. We estimated their added value in clinical risk prediction of five common diseases, using large-scale biobank data (FinnGen; n = 135,300) and the FINRISK study with clinical risk factors to test genome-wide PRSs for coronary heart disease, type 2 diabetes, atrial fibrillation, breast cancer and prostate cancer. We evaluated the lifetime risk at different PRS levels, and the impact on disease onset and on prediction together with clinical risk scores. Compared to having an average PRS, having a high PRS contributed 21% to 38% higher lifetime risk, and 4 to 9 years earlier disease onset. PRSs improved model discrimination over age and sex in type 2 diabetes, atrial fibrillation, breast cancer and prostate cancer, and over clinical risk in type 2 diabetes, breast cancer and prostate cancer. In all diseases, PRSs improved reclassification over clinical thresholds, with the largest net reclassification improvements for early-onset coronary heart disease, atrial fibrillation and prostate cancer. This study provides evidence for the additional value of PRSs in clinical disease prediction. The practical applications of polygenic risk information for stratified screening or for guiding lifestyle and medical interventions in the clinical setting remain to be defined in further studies. In a large and prospective cohort, higher polygenic risk is associated with higher risk and earlier age of onset for cardiometabolic disorders and cancer, and has added value to clinical risk scores in clinical disease prediction.
Article
Genome-wide association analyses have uncovered multiple genomic regions associated with type 2 diabetes (T2D), but identification of the causal variants at these loci remains a challenge. There is growing interest in the potential of deep learning models, which predict epigenome features from DNA sequence, to support inference concerning the regulatory effects of disease-associated variants. Here, we evaluate the advantages of training convolutional neural network (CNN) models on a broad set of epigenomic features collected in a single disease-relevant tissue (pancreatic islets, in the case of T2D) as opposed to models trained on multiple human tissues. We report convergence of CNN-based metrics of regulatory function with conventional approaches to variant prioritization: genetic fine-mapping and regulatory annotation enrichment. We demonstrate that CNN-based analyses can refine association signals at T2D-associated loci and provide experimental validation for one such signal. We anticipate that these approaches will become routine in downstream analyses of GWAS.
Article
We must embrace a multidimensional, continuous view of ancestry and move away from continental ancestry categories.
Article
Most loci identified by GWASs have been found in populations of European ancestry (EUR). In trans-ethnic meta-analyses for 15 hematological traits in 746,667 participants, including 184,535 non-EUR individuals, we identified 5,552 trait-variant associations at p < 5 × 10⁻⁹, including 71 novel associations not found in EUR populations. We also identified 28 additional novel variants in ancestry-specific, non-EUR meta-analyses, including an IL7 missense variant in South Asians associated with lymphocyte count in vivo and IL-7 secretion levels in vitro. Fine-mapping prioritized variants annotated as functional and generated 95% credible sets that were 30% smaller when using the trans-ethnic as opposed to the EUR-only results. We explored the clinical significance and predictive value of trans-ethnic variants in multiple populations and compared genetic architecture and the effect of natural selection on these blood phenotypes between populations. Altogether, our results for hematological traits highlight the value of a more global representation of populations in genetic studies.