U-statistics are conceptually
simple and ensure
biological validity of results.
The matrix calculations
required, however, are
the linear model
(additivity) lead to
[Fig. 1 labels: Fisher (F, sw Reg), BMDP, Hoeffding, Wittkowski ’08. Matrices: 1×1, k×k (~10×10). Memory: 0 B, 9 B, 256 kB.]
Fig. 1: Memory gains driving statistics. Gauss and
Laplace developed modern statistical theory around 1800,
yet statistical tests emerged only with the advent of a
mechanical calculator with ‘memory’ (Odhner, ≈1900:
two registers). Each subsequent advance in memory
spurred a new episode in statistics: stepwise regression /
principal component analysis (mainframes, 1965, kB),
Bayesian methods (PCs, 1985, MB), U-statistics for
multivariate data (32-bit OS, 2001, GB).
Fig. 5: Replication of the Ras/Ca
DALD. Darker colors indicate higher levels of signifi-
cance (cf Fig. 3). Pink circles: growth factor regulation,
green circles: ion channels, red boxes: genes known to
cause familial hemiplegic migraines.
Fig. 8: Genetic Risk Factors for Hyperexcitability
Shared Among Migraines and DALD. Further
support for shared risk factors between migraines and
DALD comes from the three FHM genes also being
associated with movement disorders (cerebellar ataxias,
SCA6/EA2), severe myoclonic epilepsy in infancy (SMEI),
and alternating hemiplegia of childhood (AHC), respectively,
all diseases where infant onset is observed.
Fig. 7: Enrichment of most significant genes with
genes involved in Ras/Ca
horizontal blue line: study-
specific cut-off (Fig. 3); top panels: µGWAS, bottom
Fig. 2: Genetic Risk Model. Neighboring SNPs are
assumed to be in LD with intermediate disease loci,
unless separated by a recombination hotspot (×). The
effects of neighboring SNPs can be dependent.
In contrast to single-SNP GWAS (ssGWAS) and
most GWAS based on linear weight (“allelic”,
lwGWAS) or linear/logistic regression (lrGWAS),
µGWAS avoids artifacts by accounting for LD
structure and varying levels of dominance (Fig. 2)
within the method, thereby also increasing power.
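A minimal sketch of how u-scores can respect a partial ordering across neighboring SNPs without assuming additivity. The poster does not spell out the algorithm. The construction below (for each subject, count the genotype profiles it dominates minus those that dominate it) is one standard way to form u-scores for multivariate ordinal data and is an assumption here, as are the names `dominates` and `mu_scores`. LD structure and recombination hotspots are deliberately left out.

```python
def dominates(x, y):
    """True if profile x is at least as high as y in every SNP and higher in at least one."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))


def mu_scores(profiles):
    """u-score per subject: (# profiles dominated) - (# profiles dominating it).

    Pairs that cannot be ordered (e.g. (1, 0) vs (0, 1)) contribute nothing,
    so no relative weight between SNPs has to be assumed.
    """
    return [
        sum(dominates(p, q) for q in profiles) - sum(dominates(q, p) for q in profiles)
        for p in profiles
    ]


# Hypothetical two-SNP profiles, each genotype coded by order only (aa=0, aA=1, AA=2).
example_profiles = [(0, 0), (1, 0), (0, 1), (2, 2)]
example_scores = mu_scores(example_profiles)
```

In this toy example the two middle subjects are incomparable with each other and therefore receive the same score.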
Fig. 5: Study-Specific GW Significance. The cut-off
(solid bar) is estimated as the median projection from the
chromosomes with the lowest deviation from the
projection (dashed curves/bars). (partial results shown)
Cut-offs for ‘genome-wide significance’ should de-
pend on study size, disease complexity, and data
Moreover, the expected distribution
in a GWAS QQ plot is not a straight line.
Calculating study-specific genome-wide significance
accounts for (a) the most significant results
requiring a MAF ≈ 0.50, even in ssGWAS, and (b) the
many dependent tests performed within and
between sliding windows (Fig. 5).
Finding the Missing Heritability in GWAS: U-statistics for Genetically Structured Data
in Subgroup Analyses of Epilepsy, Autism and Migraines
1) 408.06/J19 Mon PM
AMJM van den Maagdenberg
The Rockefeller Univ. Hospital, Biostatistics, Epidemiology & Research Design,
Leiden Univ. Med. Ctr.,
Internat. Headache Consortium
1. Wittkowski KM, et al. (2013) From single-SNP to wide-locus:
genome-wide association studies identifying functionally related
genes and intragenic regions in small sample studies.
Available from: http://www.ncbi.nlm.nih.gov/pubmed/23438886
2. Wittkowski KM, et al. (2014) A Novel Computational Biostatistics
Approach Implies Impaired Dephosphorylation of Growth Factor
Receptors As Associated With Severity of Autism.
Translational Psychiatry. 4:e354.
Available from: http://www.nature.com/articles/tp2013124.
Fig. 3: Childhood Absence Epilepsy (CAE) in 185
children vs. 370 Illumina controls. Darker colors indicate
higher levels of significance.
In a phase 3 trial, only 400 of 600 treated sub-
jects responded to the study drug with respect
to either of two outcomes. While ssGWAS
results were inconclusive (Fig. 10, only outcome
I shown), muGWAS implicated mutations along
a pathway not targeted by the drug (Fig. 11).
Fig. 4: High enrichment of µGWAS vs ssGWAS.
Some non-responders have a mutation in the drug target
(left), but most have mutations in genes along a different
pathway (right) and, thus, need a different drug.
Supported in part by grant # UL1 TR000043 from the National Center for Advancing
Translational Sciences (NCATS, National Institutes of Health (NIH) Clinical
Translational Science Award (CTSA) program), by grant # 2 448132 from the Simons
Foundation Autism Research Initiative, and by EUROHEADPAIN (Nr. 602633)
Methods I (muGWAS)
Methods II (Decision Strategy)
Fig. 4: Psoriasis Results Show Disease Specificity.
For this autoimmune disease, muGWAS confirmed the
HLA region and IL4/5/12B/13/IL23R (and suggested one
new gene, s = -log10(p) = 20 in the Michigan substudy of
1140 subjects) but neither TNF, Ras, nor ion channels.
Introduction
EPILEPSY (Validation)
AUTISM (A First Drug?)
MIGRAINES (Confirmation)
Conclusion (Utilizing dbGaP)
SNP.A SNP.X SNP.Y SNP.Z
recessive: aa = aA < AA
dominant: aa < aA = AA
allelic: aa=0, aA=1, AA=2
ordinal: aa < aA < AA
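The four genetic models listed above differ only in how the three genotypes are scored. The sketch below makes that concrete for a single biallelic SNP. It is an illustration under stated assumptions, not the poster's implementation: the names `CODINGS`, `u_scores` and `score` are hypothetical, and the ordinal model is represented by univariate u-scores, which use only the order aa < aA < AA and no assumed spacing.

```python
GENOTYPES = ("aa", "aA", "AA")  # ordered from lowest to highest risk-allele count

# Fixed numeric codings for the three models that assume a specific spacing.
CODINGS = {
    "recessive": {"aa": 0, "aA": 0, "AA": 1},  # aa = aA < AA
    "dominant": {"aa": 0, "aA": 1, "AA": 1},   # aa < aA = AA
    "allelic": {"aa": 0, "aA": 1, "AA": 2},    # linear weights
}


def u_scores(values):
    """Univariate u-scores: (# strictly lower values) - (# strictly higher values)."""
    return [sum(v > w for w in values) - sum(v < w for w in values) for v in values]


def score(genotypes, model):
    """Score a list of genotypes under one of the four models."""
    if model == "ordinal":
        # Only the ordering aa < aA < AA is used; the spacing is left open.
        return u_scores([GENOTYPES.index(g) for g in genotypes])
    return [CODINGS[model][g] for g in genotypes]


sample = ["aa", "aA", "AA", "aA"]  # hypothetical genotypes of four subjects
```

Calling `score(sample, "recessive")` and `score(sample, "dominant")` shows how the heterozygote aA switches sides between the two models, while `score(sample, "ordinal")` keeps it strictly between the homozygotes.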
Fig. 9: Genetic Risk Factors for Aura in Migraine.
Color-coding indicates level of significance in µGWAS
(blue) and ssGWAS (red). Luminance indicates significance
(see legend; n=1915 FHM subjects were excluded).
With the higher signal/noise ratio of muGWAS
over ssGWAS, in two independent studies
) of 1071/576 subjects with
autism, 18/8 genes reached study-specific GW
significance (Fig. 7). The top results were
highly enriched with Ras/Ca
the ion channels and transporters known to
cause familial hemiplegic migraines (Fig. 6). The
consistent involvement of K
suggested mefenamic acid (also effective against
migraines) as a novel drug to prevent DALD
(Disruption of Active Language Development).
A decade after the Human Genome Project, the
clinical advances hoped for from genome-wide
association studies (GWAS) have yet to be real-
ized. Enlarging sample sizes to 10,000–100,000s
limits the questions that can be addressed,
without guarding against non-functional SNPs
differing between non-randomized populations.
Combining a novel computational biostatistics
approach (U-statistics for genetically structured
data, muGWAS, Methods I) with decision strategies
fine-tuned to GWAS (Methods II), we can
now identify mechanistically related genes in
studies of only 500–1000 subjects.
The genetic data collected over the last decade
can now yield profound insights into the etio-
logy of common diseases, and subgroup analyses
can rescue ‘failed’ phase 3 trials (Discussion).
muGWAS was validated by confirming the
known drug targets in two studies: of epilepsy
in 185 cases (Fig. 3) and of psoriasis in a
re-analysis of published data.
The overlap between migraineurs with and without
aura and DALD (Fig. 5 and 9) is consistent with
the three neurodevelopmental phenotypes
(CAE, DALD in autism, and aura in migraine)
sharing risk factors
(Fig. 8), and, thus,
with the hypothesis that preventing ‘stranger
anxiety’ from rising to the level of ‘migraines’
might prevent a behavioral maladaptation
leading to DALD in autism.
With muGWAS’ better signal-to-noise ratio we
can now identify different, functionally similar
variations in different populations (Fig. 5).
Fig. 10: µGWAS results
implicating a gene associated
with outcome A.
A: ssGWAS results for
outcome I are inconclusive.
B: µGWAS results for outcome
A implicate a gene at
around 100 Mb. Black dots
are ssGWAS, diamonds are
muGWAS results by diplotype
width (size) and information
content (red: low).
C: Regional Manhattan Plot.
Linetype indicates DT width
(dotted: 2 .. solid: 6 SNPs).
D: No association in this
LD block with outcome II.
E: HapMap LD (red) and
recomb. rate (blue). The peak
is located in the regulatory
region of a known gene.
[Figure panel labels: Type A, Type B; B: Outcome I; C: Outcome II; D: Outcome II]
Fig. 11: High enrichment of muGWAS/muPhene
vs ssGWAS/Outcome I. Some non-responders have a
mutation in the drug target (left), but most have
mutations along a diffe-