
Finding the Missing Heritability in GWAS: U-statistics for Genetically Structured Data in Subgroup Analyses of Epilepsy, Autism and Migraines


Abstract

Background: A decade after the Human Genome Project, the clinical advances hoped for from genome-wide association studies (GWAS) have not yet been realized. Enlarging sample sizes to tens or hundreds of thousands limits the questions that can be addressed, and it does not guard against non-functional SNPs differing between non-randomized populations. By combining a novel computational biostatistics approach with decision strategies fine-tuned to GWAS, we can now identify clusters of related genes in groups of only a few hundred subjects. Methods: U-statistics for genetically structured data account for linkage disequilibrium (LD), varying dominance, and compound heterozygosity within a moving window of SNPs. Replacing the conventional fixed 10^−7.5 level with a study-specific cut-off for genome-wide significance accounts for differences in minor allele frequency, for the non-randomized nature of GWAS, and for conducting related tests in overlapping diplotypes. Results: The approach was validated by confirming the known targets of epilepsy drugs in a study comparing 185 childhood cases against publicly available controls. Comparing non-verbal vs verbal cases in the two independent stages of the Autism Genome Project suggested that the same ion channels are involved in disruption of active language development (DALD) as in migraines. This supports the hypothesis that preventing “stranger anxiety” from turning into migraine-like experiences might prevent a behavioral maladaptation leading to lack of language (Wittkowski KM Transl Psychiatry 2014, 4:e354). Conclusions: By reducing sample size requirements for GWAS, the genetic data collected over the last decade can now yield profound insights into the etiology of common diseases, and subgroup analyses can rescue ‘failed’ phase 3 trials.
[Fig. 1 graphic: timeline (1800–2020) contrasting the linear model with U-statistics against computing platforms (DOS, PC, mainframe, GPU, grid, cloud, cluster). Methods shown: Gauss, Laplace; Student (t), Pearson (χ²), Fisher (F); Pearson (PCA), Fisher (F, stepwise regression), BMDP, SPSS; WinBUGS (Bayes); Mann-Whitney (U), Kendall (τ), Gehan, Deuchler, Hoeffding, Wittkowski ’08, µGWAS. Matrix sizes: 1×1, k×k (~10×10), n×n (>1000×1000); memory: 0 B to >1 GB.]
U-statistics: U-statistics are conceptually simple and ensure biological validity of results. The matrix calculations required, however, are computationally demanding.
Linear model: The strong assumptions of the linear model (additivity) lead to simple algebra and computationally elegant algorithms.
Fig. 1: Memory gains driving statistics. Gauss and Laplace developed modern statistical theory around 1800, yet statistical tests emerged only with the advent of a mechanical calculator with ‘memory’ (Odhner, 1900: two registers). Each subsequent advance in memory spurred a new episode in statistics: stepwise regression / principal component analysis (mainframes, 1965, kB), Bayesian methods (PCs, 1985, MB), and U-statistics for multivariate data (32-bit OS, 2001, GB).
Fig. 5: Replication of the Ras/Ca²⁺ Signaling in DALD. Darker colors indicate higher levels of significance (cf. Fig. 3). Pink circles: growth factor regulation; green circles: ion channels; red boxes: genes known to cause familial hemiplegic migraines.
Fig. 8: Genetic Risk Factors for Hyperexcitability Shared Among Migraines and DALD. Further support for shared risk factors between migraines and DALD comes from the three FHM genes also being associated with movement disorders (cerebellar ataxias, SCA6/EA2), severe myoclonic epilepsy in infancy (SMEI), and alternating hemiplegia of childhood (AHC), respectively, all diseases with onset in infancy.
Fig. 7: Enrichment of the most significant genes with genes involved in Ras/Ca²⁺ signaling. Horizontal blue line: study-specific cut-off (Fig. 3); top panels: µGWAS; bottom panels: ssGWAS.
Fig. 2: Genetic Risk Model. Neighboring SNPs are assumed to be in LD with intermediate disease loci, unless separated by a recombination hotspot (×). The effects of neighboring SNPs can be dependent.
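The windowing implied by this risk model can be sketched in Python. This is an illustration under stated assumptions, not the µGWAS code: it enumerates windows (diplotypes) of 1 to `max_width` consecutive SNPs that do not cross a recombination hotspot. The function name `candidate_windows`, the encoding of a hotspot as the index of the SNP just after it, and the default width of 6 SNPs (the largest diplotype width mentioned for Fig. 10) are all assumptions.

```python
from typing import List, Set, Tuple


def candidate_windows(n_snps: int, hotspots: Set[int],
                      max_width: int = 6) -> List[Tuple[int, int]]:
    """List windows of 1..max_width consecutive SNPs that stay within one LD block.

    Windows are (start, end) pairs with `end` exclusive. A hotspot `h` is
    assumed to sit between SNP h-1 and SNP h (hypothetical encoding).
    """
    windows: List[Tuple[int, int]] = []
    for start in range(n_snps):
        for width in range(1, max_width + 1):
            end = start + width
            if end > n_snps:
                break
            # The window crosses hotspot h if SNPs h-1 and h are both inside it.
            if any(start < h < end for h in hotspots):
                break  # any wider window from this start crosses it too
            windows.append((start, end))
    return windows


# Four SNPs with a hypothetical hotspot between SNP 1 and SNP 2 (0-based):
print(candidate_windows(4, {2}, max_width=3))
# [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
```

Every SNP enters several overlapping windows, which is one reason the resulting tests are dependent rather than independent.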
In contrast to single-SNP GWAS (ssGWAS) and most GWAS based on linear weights (“allelic”, lwGWAS) or linear/logistic regression (lrGWAS), µGWAS avoids artifacts by accounting for LD structure and varying levels of dominance (Fig. 2) within the method, thereby also increasing power.
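The idea of scoring a window of SNPs without assuming additivity can be illustrated with u-scores under a partial ordering, in the spirit of U-statistics for multivariate data. The Python sketch below is a simplified illustration, not the µGWAS implementation: it assumes each subject's genotypes are already coded as ordinal risk levels (0 < 1 < 2) per SNP, ranks one subject above another only if it is at least as high at every SNP in the window, and compares cases with controls by a Mann-Whitney-type count on the resulting scores. LD weighting, information content, and the decision strategy are not modeled, and all function names are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[int], b: Sequence[int]) -> bool:
    """True if profile `a` is >= `b` at every SNP and > `b` at one or more."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def u_scores(profiles: Sequence[Sequence[int]]) -> List[int]:
    """u-score = (# subjects this one dominates) - (# subjects dominating it).

    Pairs that are not comparable under the partial ordering contribute 0.
    """
    return [sum(dominates(a, b) for b in profiles)
            - sum(dominates(b, a) for b in profiles)
            for a in profiles]


def mann_whitney_u(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Count (case, control) pairs with case > control; ties count 1/2."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in cases for y in controls)


# Hypothetical risk levels at two SNPs of one window (0 < 1 < 2 per SNP).
profiles = [[2, 1], [1, 1], [2, 2],   # three cases
            [0, 1], [1, 0], [0, 0]]   # three controls
scores = u_scores(profiles)
print(scores)                                  # [3, 1, 5, -2, -2, -5]
print(mann_whitney_u(scores[:3], scores[3:]))  # 9.0
```

Here the profiles [0, 1] and [1, 0] are incomparable, so neither is ranked above the other; the scores rely only on orderings and require no assumption about how the effects at the two SNPs add up.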
Fig. 5: Study-Specific GW Significance. The cut-off (solid bar) is estimated as the median projection from the chromosomes with the lowest deviation from the projection (dashed curves/bars). (Partial results shown.)
Cut-offs for ‘genome-wide significance’ should depend on study size, disease complexity, and data quality (Fisher 1958).
Moreover, the expected distribution in a GWAS QQ plot is not a straight line. Calculating study-specific genome-wide significance accounts for (a) the fact that the most significant results require a MAF near .50, even in ssGWAS, and (b) the many dependent tests performed within and between sliding windows (Fig. 5).
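Point (a) can be made concrete with a simple, swapped-in example: a one-sided Fisher exact test on a 2×2 table of allele counts, not the µGWAS statistic. If all m minor alleles fall in the cases, the p-value is the smallest attainable, p_min = C(A, m)/C(A + B, m), where A and B are the numbers of alleles in cases and controls. The Python sketch below returns s = −log10 p_min, computed in log space to avoid underflow; the function name `max_attainable_s` is hypothetical. Because p_min shrinks as m grows, SNPs with few minor alleles cannot reach a fixed threshold such as s = 7.5, however strong their effect.

```python
import math


def max_attainable_s(n_cases: int, n_controls: int, minor_alleles: int) -> float:
    """Largest s = -log10(p) a one-sided Fisher exact test on allele counts can reach.

    The most extreme table puts all m minor alleles in the cases, so
    p_min = C(A, m) / C(A + B, m) = prod_{i<m} (A - i) / (A + B - i),
    with A = 2 * n_cases and B = 2 * n_controls alleles.
    """
    a, b, m = 2 * n_cases, 2 * n_controls, minor_alleles
    if not 0 <= m <= a:
        raise ValueError("sketch assumes 0 <= minor_alleles <= 2 * n_cases")
    # Sum logs instead of multiplying, so tiny p-values do not underflow.
    log_p = sum(math.log((a - i) / (a + b - i)) for i in range(m))
    return -log_p / math.log(10)


# Design of the epilepsy study: 185 cases vs 370 controls (1110 alleles).
for m in (5, 10, 20, 40):
    maf = m / (2 * (185 + 370))
    print(f"m={m:2d}  MAF={maf:.3f}  max s={max_attainable_s(185, 370, m):.1f}")
```

Under a uniform null, rare SNPs therefore contribute p-values that can never reach the extreme tail, which is one reason the expected QQ distribution bends away from the diagonal.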
KM Wittkowski 1) (kmw@rockefeller.edu), E Eising 2), C Dadurian 1), B Bigio 1), GM Terwindt 2), MD Ferrari 2), AMJM van den Maagdenberg 2),3)
408.06/J19 Mon PM
1) The Rockefeller Univ. Hospital, Biostatistics, Epidemiology & Research Design; 2) Leiden Univ. Med. Ctr.; 3) Internat. Headache Consortium
1. Wittkowski KM, et al. (2013) From single-SNP to wide-locus: genome-wide association studies identifying functionally related genes and intragenic regions in small sample studies. Pharmacogenomics. 14(4):391. Available from: http://www.ncbi.nlm.nih.gov/pubmed/23438886
2. Wittkowski KM, et al. (2014) A Novel Computational Biostatistics Approach Implies Impaired Dephosphorylation of Growth Factor Receptors As Associated With Severity of Autism. Translational Psychiatry. 4:e354. Available from: http://www.nature.com/articles/tp2013124
Fig. 3: Childhood Absence Epilepsy (CAE) in 185
children vs. 370 Illumina controls. Darker colors indicate
higher levels of significance.
In a phase 3 trial, only 400 of 600 treated subjects responded to the study drug with respect to either of two outcomes. While ssGWAS results were inconclusive (Fig. 10, only outcome I shown), muGWAS implicated mutations along a pathway not targeted by the drug (Fig. 11).
Discussion (Phase 3 Subgroups)
Fig. 4: High enrichment of µGWAS vs ssGWAS. Some non-responders have a mutation in the drug target (left), but most have mutations in genes along a different pathway (right) and, thus, need a different drug.
Supported in part by grant # UL1 TR000043 from the National Center for Advancing Translational Sciences (NCATS, National Institutes of Health (NIH) Clinical Translational Science Award (CTSA) program), by grant # 2 448132 from the Simons Foundation Autism Research Initiative, and by EUROHEADPAIN (Nr. 602633).
Methods I (muGWAS)
Methods II (Decision Strategy)
Fig. 4: Psoriasis Results Show Disease Specificity. For this autoimmune disease, muGWAS confirmed the HLA region and IL4/5/12B/13/IL23R (and suggested one new gene, s = −log₁₀ p = 20, in the Michigan substudy of 1140 subjects), but not TNF, Ras, or ion channels.
PSORIASIS (Specificity)
Introduction | EPILEPSY (Validation) | AUTISM (A First Drug?) | MIGRAINES (Confirmation) | Conclusion (Utilizing dbGaP)
[Fig. 2 graphic: SNP.A, SNP.X, SNP.Y, SNP.Z; LD block i and LD block i+1 separated by a recombination hotspot.]
Genotype models:
recessive: aa = aA < AA
dominant: aa < aA = AA
allelic: aa=0, aA=1, AA=2
ordinal: aa < aA < AA
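The four genotype models above can be expressed as scoring functions. The following minimal Python sketch (hypothetical names; a single SNP with 'A' as the risk allele) maps each genotype to a level under each model and compares cases with controls by the Mann-Whitney U count, used here as a generic rank statistic rather than the µGWAS statistic. Because U depends only on the order of the levels, the allelic weights and any increasing ordinal codes yield the same U, whereas the recessive and dominant models, which merge genotype categories, generally do not.

```python
from typing import Dict, Sequence

# Levels per genotype under each model listed above ('A' = risk allele).
MODELS: Dict[str, Dict[str, int]] = {
    "recessive": {"aa": 0, "aA": 0, "AA": 1},  # aa = aA < AA
    "dominant":  {"aa": 0, "aA": 1, "AA": 1},  # aa < aA = AA
    "allelic":   {"aa": 0, "aA": 1, "AA": 2},  # linear weights 0, 1, 2
    # ordinal: aa < aA < AA. Any increasing codes give the same ranks;
    # 0, 5, 7 is chosen on purpose to show that only the order matters.
    "ordinal":   {"aa": 0, "aA": 5, "AA": 7},
}


def mann_whitney_u(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Count (case, control) pairs with case > control; ties count 1/2."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in cases for y in controls)


def u_by_model(cases: Sequence[str], controls: Sequence[str]) -> Dict[str, float]:
    """Mann-Whitney U for cases vs controls under each genotype model."""
    return {name: mann_whitney_u([levels[g] for g in cases],
                                 [levels[g] for g in controls])
            for name, levels in MODELS.items()}


# Hypothetical genotypes at a single SNP.
print(u_by_model(["AA", "aA", "aA", "aa"], ["aA", "aa", "aa", "aa"]))
# {'recessive': 10.0, 'dominant': 12.0, 'allelic': 12.5, 'ordinal': 12.5}
```

A linear statistic such as a difference in means would, in contrast, change if the allelic weights 0, 1, 2 were replaced by other increasing values.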
Fig. 9: Genetic Risk Factors for Aura in Migraine. Color-coding indicates the level of significance in µGWAS (blue) and ssGWAS (red). Luminance indicates significance (see legend; n = 1915; FHM subjects were excluded).
With the higher signal-to-noise ratio of muGWAS over ssGWAS, 18 and 8 genes, respectively, reached study-specific GW significance (Fig. 7) in two independent studies (AGP I/II, Anney 2012) of 1071 and 576 subjects with autism. The top results were highly enriched with Ras/Ca²⁺ genes, including the ion channels and transporters known to cause familial hemiplegic migraines (Fig. 6). The consistent involvement of K⁺ and Cl⁻ channels suggested mefenamic acid (also effective against migraines) as a novel drug to prevent DALD (Disruption of Active Language Development) [2].
A decade after the Human Genome Project, the clinical advances hoped for from genome-wide association studies (GWAS) have yet to be realized. Enlarging sample sizes to tens or hundreds of thousands limits the questions that can be addressed, without guarding against non-functional SNPs differing between non-randomized populations. Combining a novel computational biostatistics approach (u-statistics for genetically structured data, muGWAS, Methods I) with decision strategies fine-tuned to GWAS (Methods II), we can now identify mechanistically related genes in studies of only 500–1000 subjects.
The genetic data collected over the last decade can now yield profound insights into the etiology of common diseases, and subgroup analyses can rescue ‘failed’ phase 3 trials (Discussion).
muGWAS was validated by confirming the known drug targets in two studies: epilepsy in 185 cases (Fig. 3) and psoriasis in a reanalysis of published data (Nair 2009) (Fig. 4).
The overlap between findings in migraineurs (with and without aura) and in DALD (Figs. 5 and 9) is consistent with the three neurodevelopmental phenotypes (CAE, DALD in autism, and aura in migraine) sharing risk factors (Gargus 2009) (Fig. 8) and, thus, with the hypothesis that preventing ‘stranger anxiety’ from rising to the level of ‘migraines’ might prevent a behavioral maladaptation leading to DALD in autism [2].
With muGWAS’ better signal-to-noise ratio, we can now identify different but functionally similar variants in different populations (Fig. 5).
Fig. 10: µGWAS results implicating a gene associated with outcome A.
A: ssGWAS results for outcome I are inconclusive.
B: µGWAS results for outcome A implicate a gene at around 100 Mb. Black dots are ssGWAS results; diamonds are muGWAS results, by diplotype width (size) and information content (red: low).
C: Regional Manhattan plot. Line type indicates diplotype (DT) width (dotted: 2 ... solid: 6 SNPs).
D: No association in this LD block with outcome II.
E: HapMap LD (red) and recombination rate (blue). The peak is located in the regulatory region of a known gene.
Fig. 11: High enrichment of muGWAS/muPhene vs ssGWAS/Outcome I. Some non-responders have a mutation in the drug target (left), but most have mutations along a different pathway (right).