Rare Complete Knockouts in Humans:
Population Distribution and Significant Role
in Autism Spectrum Disorders
Elaine T. Lim,1,4,5,6,7 Soumya Raychaudhuri,4,6,9 Stephan J. Sanders,10 Christine Stevens,4 Aniko Sabo,11
Daniel G. MacArthur,1,4,6 Benjamin M. Neale,1,4,5,6 Andrew Kirby,1,4,6 Douglas M. Ruderfer,1,3,4,5,6,8,12,14,15
Menachem Fromer,1,3,4,5,6,8,12,14,15 Monkol Lek,1,4,6 Li Liu,18 Jason Flannick,1,2,4,6 Stephan Ripke,1,4,5 Uma Nagaswamy,11
Donna Muzny,11 Jeffrey G. Reid,11 Alicia Hawes,11 Irene Newsham,11 Yuanqing Wu,11 Lora Lewis,11 Huyen Dinh,11
Shannon Gross,11 Li-San Wang,19 Chiao-Feng Lin,19 Otto Valladares,19 Stacey B. Gabriel,4 Mark dePristo,4
David M. Altshuler,1,2,4,6 Shaun M. Purcell,1,3,4,5,6,8,12,14,15 NHLBI Exome Sequencing Project, Matthew W. State,10
Eric Boerwinkle,11,21 Joseph D. Buxbaum,13,14,15,16,17 Edwin H. Cook,22 Richard A. Gibbs,11 Gerard D. Schellenberg,20
James S. Sutcliffe,23 Bernie Devlin,24 Kathryn Roeder,18 and Mark J. Daly1,4,5,6,*
1Analytic and Translational Genetics Unit
2Department of Molecular Biology
3Psychiatric & Neurodevelopmental Genetics Unit
Massachusetts General Hospital, Boston, MA 02114, USA
4Program in Medical and Population Genetics
5Stanley Center for Psychiatric Research
Broad Institute, Cambridge, MA 02142, USA
6Departments of Genetics and Medicine
7Program in Genetics and Genomics, Biological and Biomedical Sciences
8Department of Psychiatry
Harvard Medical School, Boston, MA 02115, USA
9Division of Immunology, Allergy, and Rheumatology, Brigham and Women’s Hospital, Boston, MA 02115, USA
10Departments of Psychiatry and Genetics, Yale University School of Medicine, New Haven, CT 06520, USA
11Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
12Division of Psychiatric Genomics
13Seaver Autism Center for Research and Treatment
14Department of Psychiatry
15Department of Genetics and Genomic Sciences
16Department of Neuroscience
17Friedman Brain Institute
Mount Sinai School of Medicine, New York, NY 10029, USA
18Department of Statistics and Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, PA 15213, USA
19Penn Center for Bioinformatics
20Pathology and Laboratory Medicine, Perelman School of Medicine
University of Pennsylvania, Philadelphia, PA 19104, USA
21Human Genetics Center, University of Texas Health Science Center at Houston, TX 77030, USA
22Department of Psychiatry, University of Illinois at Chicago, Chicago, IL 60612, USA
24Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA 15260, USA
To characterize the role of rare complete human
knockouts in autism spectrum disorders (ASDs),
we identify genes with homozygous or compound
heterozygous loss-of-function (LoF) variants (defined
as nonsense and essential splice sites) from exome
sequencing of 933 cases and 869 controls. We identify
a 2-fold increase in complete knockouts of autosomal
genes with low rates of LoF variation (≤5%
frequency) in cases and estimate a 3% contribution
to ASD risk by these events, confirming this observa-
tion in an independent set of 563 probands and 4,605
controls. Outside the pseudoautosomal regions on
the X chromosome, we similarly observe a significant
1.5-fold increase in rare hemizygous knockouts in
males, contributing to another 2% of ASDs in males.
Taken together, these results provide compelling
evidence that rare autosomal and X chromosome
complete gene knockouts are important inherited
risk factors for ASD.
Autism spectrum disorder (ASD) is a highly heritable, common
disorder that affects ~1 in 88 individuals (Autism and
Neuron 77, 235–242, January 23, 2013 ©2013 Elsevier Inc. 235
Developmental Disabilities Monitoring Network Surveillance
Year 2008 Principal Investigators, C.D.C., 2012). Previous
studies have shown a reproducible contribution of de novo
copy number variants (CNVs) (Levy et al., 2011; Pinto et al.,
2010; Sanders et al., 2011; Sebat et al., 2007; Weiss et al.,
2008) and de novo single nucleotide variants (SNVs) (Iossifov
et al., 2012; Neale et al., 2012; O’Roak et al., 2012; Sanders
et al., 2012) to ASD risk—though these effects provide little
explanation for the widely recognized high heritability (Constan-
tino et al., 2012).
An early segregation analysis on 46 multiplex families (each
with multiple affected children) suggested evidence for an auto-
a subsequent study showing that ASD is unlikely to fit a model
with a major gene effect (Jorde et al., 1991). Further to this point,
the most recent results from de novo CNVs and SNVs point to
a model in which hundreds of genes are likely to contribute to
autism risk. Building from these observations, as a means of
providing insight into the heritable component of ASD risk, we
sought to test the hypothesis that 2-hit etiologies exist in ASD
and that these events, like the de novo CNVs and SNVs, are
most likely to be distributed over many genes. Supporting this
hypothesis are historical segregation analyses (Ritvo et al.,
1985; Zweier et al., 2009), the successful use of homozygosity
mapping in consanguineous populations (Morrow et al., 2008),
as well as recent studies showing that ASD probands had
a significant excess of homozygous haplotype sharing, sug-
gesting that there are recessive loci in these risk-conferring
haplotypes (Casey et al., 2012; Chahrour et al., 2012). Other
studies have also implicated the role of a 2-hit or oligogenic
model for rare CNVs in ASD (Girirajan et al., 2012).
It has been shown that there are relatively few homozygous
or compound heterozygous LoF variants (i.e., complete gene
outs found are common (minor allele frequency [MAF] > 5%) and
are distributed across a very small number (~100–200) of genes,
such as the olfactory receptors, that are apparently inessential
and do not result in any obvious phenotype or severe medical
consequence (MacArthur et al., 2012). We similarly observe in
these ASD data sets that an average individual harbors approx-
imately five common complete knockouts (from nonsense and
essential splice site variants) distributed across a small subset
of genes on the autosomes. In striking contrast, if we consider
only LoF variants with frequency ≤5%, fewer than 5% of individuals
harbor even a single rare complete knockout (Table 1).
While heterozygous LoF mutations are seen in thousands of
genes, the very low frequency and paucity of observed complete
knockouts suggests a broad pool of genes (including many
Mendelian disorders) in which 2-hit variants may give rise to
severe and reproductively deleterious phenotypes. While genes
with common complete knockouts are more likely to be benign
(or unlikely to result in severe phenotypes with high penetrance),
genes with rare complete knockouts are more likely to be
disease causing (Gorlov et al., 2008) simply because selection
prevents deleterious recessive-acting variants from reaching
even moderate allele frequencies.
If a subset of ASD cases were caused by rare 2-hit events with
large effects (e.g., odds ratios of >5) distributed across many
different genes, then family-based linkage or GWAS would
would explain a very small fraction of all cases given the
commonness of the outcome and the large number of ASD
genes. To evaluate evidence for such 2-hit etiologies in ASD,
we studied the distribution and patterns of rare complete knock-
outs from whole-exome sequence data across two case-control
studies comprised of 1,802 European subjects to identify events
in which individuals carried two LoF autosomal variants in
a single gene in trans. In this study, we show that rare complete
knockouts on the autosomes (variant allele frequencies of ≤5%)
are significantly enriched in cases, suggesting that these events
contribute to the genetic etiology of ASD.
A variant with a diploid allele frequency of 5% on the
autosomes results in a complete knockout in 0.25% of individ-
uals. Outside the pseudoautosomal regions on the X chromo-
some in males, a single LoF variant with 0.25% allele frequency
also results in a complete knockout in 0.25% of males. Similarly,
we found that rare complete knockouts on the X chromosome
(variant allele frequencies of ≤0.25%) are also significantly enriched
in male cases, further reinforcing the role of rare complete
knockouts as risk factors for ASD.
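The frequency matching described above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes random mating (Hardy-Weinberg proportions) on the autosomes; the function names are hypothetical and are not part of the study's pipeline.

```python
def autosomal_knockout_freq(allele_freq: float) -> float:
    """Expected homozygous fraction q**2 under random mating (an assumption)."""
    return allele_freq ** 2


def x_male_knockout_freq(allele_freq: float) -> float:
    """Males carry one non-pseudoautosomal X copy, so the fraction equals q."""
    return allele_freq


# Thresholds quoted in the text: 5% on the autosomes, 0.25% on the X in males.
auto = autosomal_knockout_freq(0.05)
x_male = x_male_knockout_freq(0.0025)
print(f"autosomal: {auto:.4%}, X in males: {x_male:.4%}")
```

Both thresholds give the same expected knockout rate per variant, which is why the two analyses use different allele frequency cutoffs.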
Exome Capture and Sequencing
To assess the contribution of rare complete knockouts to ASD,
we analyzed data from an ethnically matched case-control pop-
ulation. We selected 933 cases and 869 controls sequenced
in this study by matching them with multidimensional scaling
(MDS) of common variants genotyped on Illumina 1M, Affymetrix
tial confounding by population stratification. The exomes were
sequenced at two different sequencing centers—the Broad
Institute (BI) and the Baylor College of Medicine (BCM). A total
of 428 ASD cases selected from the Autism Genetic Resource
Exchange (AGRE) and the Autism Consortium of Boston (AC)
and 378 controls (a total of 806 individuals) were sequenced at
BI, and another 505 ASD cases selected from the Autism
Simplex Collection (TASC) and 491 controls (a total of 996
Table 1. Population Distribution of Rare and Common LoFs
                    Average Number of          Number of Unique Genes      Average Number of          Number of Unique Genes
                    Homozygous Variants        with a Homozygous Variant   Heterozygous Variants      with a Heterozygous Variant
Rare (≤5%) LoFs     0.05 variants per          33 genes                    13 variants per            3,409 genes
                    individual                                             individual
Common (>5%) LoFs   5 variants per             96 genes                    36 variants per            99 genes
                    individual                                             individual
The average number of rare (≤5%) and common (>5%) homozygous LoF variants, as well as the average number of such variants calculated from the
BI case-control data set.
individuals) were sequenced at BCM, resulting in 1,802 individ-
uals across the two case-control data sets. All controls were
selected from a National Institute of Mental Health (NIMH)
control repository and were ascertained for not having schizo-
phrenia or bipolar mood disorder. Another 563 probands were
added into the final analyses (388 trios/quartets from the Simons
Simplex Collection [SSC; Iossifov et al., 2012; Sanders et al.,
2012] and 175 trios from the Boston Autism Consortium sequenced
at BI [104 from Neale et al., 2012]), and together with 4,605 additional
European controls from the NHLBI exome sequencing
project and the 1000 Genomes Project, this resulted in a total
of >6,000 exomes used in this study (see Table S1 available
online). The metrics for the case-control data sets are described
in Table S2.
Enrichment of Rare Complete Knockouts in ASD
Given that rare complete knockouts consist of both compound
heterozygous and homozygous variants on the autosomes,
we adapted a statistical phasing approach similar to the four-
haplotype test to eliminate instances in which multiple LoF
variants may segregate in cis (Figure S1). There are a total of
91 such rare complete knockouts in the case-control data
sets, with 62 of these found in the cases compared to 29 in the
controls (Table S3), representing a roughly 2-fold enrichment of
these events in the cases (odds ratio [OR] = 2.0, 95% confidence
interval [CI] = [1.5, 2.5], one-sided permutation, p = 0.0017).
Based on the difference between cases and controls (6% of
the cases versus 3.3% of the controls have a rare complete
knockout), we estimate an ~3% contribution by rare complete
knockouts to ASD. While different capture and sequencing
technologies were employed at the two sequencing centers,
and different depths of sequencing achieved (Liu et al., personal
communication), the excess in cases was consistent in the two
data sets (ORs = 2.1, 95% CI = [1.5, 2.7] and 1.8, 95% CI =
Using the results from a previous study of expression patterns
of postmortem brains (Kang et al., 2011), we observed that the
enrichment in rare complete knockouts in cases was particularly
pronounced in genes found to be expressed in the brain, with 37
events in cases compared to only 13 in the controls (OR = 2.7,
95% CI = [2.1, 3.3], one-sided permutation, p = 0.002), although
this enrichment in brain-expressed genes was not significantly
different from the global enrichment observed (one-sided
permutation p = 0.13, Figure 1).
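The enrichment figures above can be reproduced approximately from the reported event counts. The sketch below is illustrative only: the text does not spell out the estimator or the permutation scheme, so the ratio of per-individual event rates and the label-shuffling test are assumptions, as is the simplification that every event sits in a different individual. All function names are hypothetical.

```python
import random


def rate_ratio(case_events, n_cases, control_events, n_controls):
    """Ratio of events per individual in cases versus controls."""
    return (case_events / n_cases) / (control_events / n_controls)


def permutation_p(case_events, n_cases, control_events, n_controls,
                  n_perm=1000, seed=0):
    """One-sided p: share of label shuffles with >= the observed case count.

    Assumption: each event belongs to a distinct individual, so every
    individual is reduced to a 0/1 carrier flag before shuffling.
    """
    rng = random.Random(seed)
    total = case_events + control_events
    flags = [1] * total + [0] * (n_cases + n_controls - total)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(flags)
        if sum(flags[:n_cases]) >= case_events:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


# Event counts from the text: 933 cases and 869 controls.
all_genes = rate_ratio(62, 933, 29, 869)
brain_genes = rate_ratio(37, 933, 13, 869)
p = permutation_p(62, 933, 29, 869)
print(f"all genes: {all_genes:.2f}, brain-expressed: {brain_genes:.2f}, p ~ {p:.4f}")
```

Under these assumptions the two ratios round to the reported values of 2.0 and 2.7. The permutation p value depends on the seed and the number of shuffles, and it should be read as an order-of-magnitude check rather than a reproduction of the published figure.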
To confirm that this excess was not an artifact of any residual
uncertainty in statistical phasing, we examined the subset of rare
complete knockouts that were homozygous LoF variants alone
and found that these events were also significantly enriched by
2-fold (42 in cases and 19 in controls, OR = 2.1, 95% CI = [1.6,
2.6], one-sided permutation, p = 0.0059, Table S2). We further
ensured that the excess was not driven by inaccuracies in
phasing ‘‘singleton’’ variants (variants that were observed only
once in a single individual) and found that rare complete knock-
outs excluding the singleton variants were also significantly
enriched (48 in cases and 24 in controls, OR = 1.9, 95% CI =
[1.4, 2.4], one-sided permutation, p = 0.0081). Since an excess
in 2-hit LoFs could arise trivially if there was a significant overall
difference in rates of LoF variants between cases and controls,
we evaluated the total number of single-copy losses (i.e., heterozygous
LoF carriers) with variant allele frequencies ≤5% found
in cases compared to controls and saw no enrichment (OR =
1.0, 95% CI = [0.9, 1.1], Table S4). Finally, we validated all
variants by ensuring that they were either present in dbSNP,
the NHLBI Exome Sequencing Project, and/or were confirmed
using Fluidigm genotyping, Sanger sequencing, or Fluidigm
PCR with MiSeq sequencing with 94% of these variants vali-
dating as true polymorphisms (Tables S5 and S7). Even con-
servatively assuming all validation failures were false-positive
SNPs (rather than genotyping assay failures), removing the three
events in cases and two in controls from the overall tallies has
no impact on the results. As a final check, we used rare homozy-
gous and compound heterozygous (or ‘‘2-hit’’) synonymous
events, as well as common complete knockouts, as internal
controls and confirmed the enrichment of rare complete knock-
outs was far greater and significantly different compared to both
of these (Table S4).
Knockouts via homozygosity of rare LoF sites could arise
from hemizygous LoF variants that were exposed through the
algorithm for exome sequencing (XHMM) (Fromer et al., 2012),
we found that two of the homozygous LoFs observed in cases
(E201X in KRT83 and E211X in PRAMEF2) were, in fact, LoF
variants unmasked by deletions spanning across the regions
(11 kb and 183 kb deletions, respectively), although this does
not change the fact that they are complete gene knockouts.
To confirm these observations, we examined an independent
set of cases (n = 563) from recent trio sequencing efforts (in
which 2-hit knockout status was certain from the existence of
Figure 1. Expression Patterns of the Complete Knockouts
(A) The enrichment of rare complete knockouts in cases versus controls. (B)
The enrichment observed in rare complete knockouts is not observed in the
common complete knockouts. The x axis indicates the average number of
events per individual in cases and controls and the numbers above the
barplots indicate the total number of such events in cases and controls, with
the odds ratios (OR) shown above.
parental sequence data) and compared to a broader population
data set (n = 4,605) from the NHLBI exome sequencing project
and 1000 Genomes Project (Table S1). The enrichment (7.6%
in cases to 5.5% in controls, hypergeometric test, p = 0.016)
was replicated in this comparison as well—further confirming
the veracity of this observation.
Similar Enrichment of Rare Complete Knockouts
Observed on the X Chromosome
Given the gender bias in ASD, with roughly four times as many
affected males as females (Devlin and Scherer, 2012), we asked
analogously whether rare gene knockouts outside the pseu-
doautosomal regions on the X chromosome (arising from hemi-
zygous LoFs in males) were enriched in male cases versus
male controls. To further increase the sample sizes, we included
the male probands and their unaffected fathers from the trios
and quartets. The nucleotide diversity on the X chromosome is
estimated to be between half and three-quarters that of the autosomes,
and deleterious LoF variants on the X chromosome are
under stronger negative selection given the smaller effective
population size and constant exposure in hemizygous males
(Gottipati et al., 2011). To match the baseline knockout rate to
the autosomes, in which we examined variants with ≤5% MAF
and therefore ≤0.25% homozygosity, we examined LoF variants
with population frequency (assessed in female control
samples) of ≤0.25%. On average, we observed less than one
such rare LoF variant on the X chromosome in both males and
females (Table S6).
Similar to the autosomes, we observed a significant enrich-
ment of rare hemizygous LoFs in male cases (Table 2), with 88
such events observed—60 of them were found in male cases
and 28 of them were found in male controls (OR = 1.5, 95%
CI = [1.1, 2.0], one-sided hypergeometric test, p = 0.034, Table
S7). No enrichment was seen in the internal controls of this
comparison—rare hemizygous synonymous variants were not
enriched in male cases compared to male controls (OR = 1.0,
95% CI = [0.9, 1.1]), indicating that the observed enrichment is
specific to rare complete knockouts on the X chromosome
in male ASD cases. Based on the difference between cases
and controls, we further estimate another 1.7% contribution
by rare complete knockouts on the X chromosome in male
cases. In addition, we found 2 of 170 female cases bearing
a rare complete knockout on the X chromosome and 0 of
452 female controls. As with the autosomes, we attempted vali-
dation for 44 of 50 rare X chromosome LoF variants and all 44
We screened the list of rare complete knockouts observed on
the autosomes and X chromosome for instances in which
a knockout was observed only in cases and not in any of the
controls (Table S8) and performed a screen for enrichment of
pathways and microRNA targets using WebGestalt (Zhang
et al., 2005). The top pathway (‘‘complement and coagulation
cascades’’) was driven by two genes (KNG1 and PLAT; cor-
rected p = 0.0027). Scanning predicted targets of microRNAs,
we found one (mir-328) predicted to target three genes from
the list (HAP1, AFF2, and MECP2; corrected p = 0.0013; Table
S9). Additional siblings (affected = 30, unaffected = 17) were
available for 31 probands that were genotyped to examine
segregation of a proposed recessive model (Table S10). We
observed 25 (expected 20) instances in which segregation was
consistent with a fully penetrant recessive model, including
four genes with rare complete knockouts (PTH2R, MECP2,
control in any wave of our study.
Gender and IQ
It has been shown that the male gender bias is stronger in high-
functioning ASD cases, and the gender bias is reduced for
syndromic cases (Newschaffer et al., 2007). We found that
there was a higher rate of rare complete knockouts in females
(5.4%) compared to males (4%). Although 16% of the cases
sequenced were female, 25% of the cases harboring rare
complete knockouts were female (OR = 1.7, 95% CI = [1.3,
2.1], one-sided Fisher’s test, p = 0.076). While not statistically
significant, this trend is similar to previous observations that
de novo CNVs and SNVs show a higher fraction of female cases
with such events (Iossifov et al., 2012; Levy et al., 2011; Sanders
et al., 2011) and consistent with the model that females need
a higher dose of genetic risk to manifest a diagnosis of ASD.
We also observed a trend toward lower IQ scores when comparing 18 of these cases
with rare complete knockouts to another 133 cases (mean Z
score = −0.26 in probands with rare complete knockouts versus
Table 2. Number of Rare LoF and Synonymous Variants on the X Chromosome
                                          LoF              Synonymous
Hemizygous LoFs in males (n = 2,144)
  Cases (n = 1,245)                       60 events        2,114 events
  Controls (n = 899)                      28 events        1,516 events
  OR [95% CI]                             1.5 [1.1, 2.0]   1.0 [0.9, 1.1]
Heterozygous LoFs in females (n = 622)
  Cases (n = 170)                         21 events        641 events       2 events   5 events
  Controls (n = 452)                      56 events        1,256 events     0 events   0 events
  OR [95% CI]                             1.0 [0.5, 1.5]   1.4 [1.2, 1.6]   –          –
The number of rare hemizygous LoF and synonymous variants outside the pseudoautosomal regions on the X chromosome in males, as well as the
number of rare heterozygous LoF and synonymous variants in females are shown, together with the respective odds ratios.
0.035 in other cases), but it was not statistically significant (one-
sided Wilcoxon test, p = 0.11).
As shown previously, de novo CNVs are extremely rare events in
a control population and they occur at 1%–2% in controls. Given
the rarity of such events, discovery of a global enrichment of
these de novo CNVs at a much higher rate of 6%–8% in ASD
individuals suggested a 6% contribution to ASD by these de
novo CNVs (Levy et al., 2011; Sanders et al., 2011; Sebat
et al., 2007). This highlighted the significance of such events as
risk factors for ASD and subsequent association and replication
studies of such events with larger sample sizes pinpointed
specific de novo CNVs that have since been significantly
associated with ASD, such as deletions and duplications on
chromosome 16p11.2 (Weiss et al., 2008).
Similar to the de novo CNV studies, as well as emerging de
novo SNV studies, we observed that rare complete knockouts
in the human exome are found in only 3% of a control population
but are present at a 2-fold enrichment in ASD cases. Given that
these rare complete knockouts are not found in a single gene
but, like the de novo CNVs and SNVs, are distributed across
many different genes, these events would have been missed
through previous association or linkage studies. As with any
genetic screen, population stratification can confound these
results. However, the samples selected for sequencing were of
European ancestry and individually matched in case-control
pairs based on principal component analyses and selected
from a much larger pool of potential samples. Owing to occa-
sional sample failure, ultimately 88% of the final samples were
was observed in the subset of matched cases and controls
for the rare complete knockouts (49 events in cases versus
25 events in controls, OR = 2, 95% CI = [1.5, 2.5]).
Interestingly, we observed a 1.5-fold enrichment of hemizy-
gous LoF variants on the X chromosome in male cases
compared to male controls but did not observe a significant
global enrichment of heterozygous LoF variants on the
X chromosome in female cases compared to female controls.
There are genes on the X chromosome that can cause ASD-
related disorders like Rett Syndrome in an X-linked dominant
mode of inheritance such as CDKL5 and MECP2. However, we
found that while there is a significant 1.5-fold enrichment in
hemizygous LoFs in male cases, we did not observe a significant
enrichment in single-copy losses in female cases, consistent
with the observation that we did not see an overall difference
in single-copy (heterozygous) losses on the autosomes. Given
that males have only a single copy of the X chromosome and
would be more susceptible to a complete knockout on the X
chromosome than females, these rare complete knockouts on
the X chromosome can also explain a small part of the male
gender bias observed in ASD.
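The male X chromosome comparison discussed above (60 events in 1,245 male cases versus 28 events in 899 male controls) can be examined with an exact hypergeometric tail probability. This is a sketch under a stated assumption: each of the 88 events is treated as falling in a different male, so the question becomes how often 60 or more of 88 carriers would be cases by chance. The study's exact test configuration is not described in the text, and the function name is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(n_total, n_cases, n_events, events_in_cases):
    """P(X >= events_in_cases) when n_events carriers are drawn at random
    from n_total individuals, of whom n_cases are cases."""
    denom = comb(n_total, n_events)
    upper = min(n_events, n_cases)
    numer = sum(
        comb(n_cases, k) * comb(n_total - n_cases, n_events - k)
        for k in range(events_in_cases, upper + 1)
    )
    # Python integers are exact, so the division happens only once at the end.
    return numer / denom


# Counts from Table 2: 2,144 males, 1,245 cases, 88 events, 60 of them in cases.
p_x = hypergeom_upper_tail(2144, 1245, 88, 60)
print(f"one-sided hypergeometric p ~ {p_x:.3f}")
```

The result can be set beside the reported one-sided p = 0.034; small differences would be expected if some individuals carried more than one event.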
Among our list of consolidated genes with rare complete knock-
outs that were observed only in cases (Table S8), we discovered
a known autosomal recessive gene in one of the probands from
the trios—Usher syndrome 2A (USH2A), which has been
reported to cause a known autosomal recessive disease Usher
Syndrome Type II, characterized by mild to severe hearing loss
and sometimes retinitis pigmentosa (Yan and Liu, 2010). We
found and confirmed the bilineal inheritance of two previously
(W2075X and Y4238X) in USH2A from both parents. Clinical
follow-up confirmed an Usher Syndrome Type II diagnosis—
a potential confounder in the diagnosis of ASD (Johansson
et al., 2010).
When we cross-compared the list of genes harboring rare
complete knockouts with previously published literature on de
novo SNVs (Iossifov et al., 2012; Neale et al., 2012; O’Roak
et al., 2012; Sanders et al., 2012), we found three genes that
were common between the rare complete knockouts and de
novo SNVs—IFIH1 (in which a de novo missense variant was
found in a proband), ABCC12 (in which a de novo silent variant
was found in a proband), and PKHD1L1 (in which a de novo
upstream variant was found in a proband).
We further compared the list of X chromosome genes with
previously published CNVs and found that there are two
genes that have been previously associated with rare CNVs.
We found an affected male with a rare hemizygous splice
variant (c.359-2T > C) in the trimethyllysine hydroxylase, epsilon
protein—TMLHE, which is involved in the biosynthesis of carni-
tine (Celestino-Soper et al., 2011). Recently, TMLHE deficiency
resulting in dysregulation of carnitine metabolism has also
been proposed as a risk factor for ASD (Celestino-Soper et al.,
2012; Nava et al., 2012). Another affected male was found to
harbor a hemizygous splice variant (c.3034-1G > A) in the
protocadherin 11 X-linked protein—PCDH11X. An inherited
deletion in PCDH11X, as well as a de novo deletion in PCDH11Y,
was previously reported in a child with severe language delay,
suggesting a potential role for PCDH11X in language develop-
ment (Speevak and Farrell, 2011).
There were three genes with at least two male cases harboring
rare complete knockouts on the X chromosome and no controls
were found to harbor rare complete knockouts in these genes
(SLC22A14, LUZP4, and DGAT2L6). In addition, among a list
of genes known to be involved in intellectual disability (Neale
et al., 2012), we found four genes from our list with rare complete
variant Q283X in the Fragile X E mental retardation syndrome
protein (AFF2), which causes nonsyndromic mental retardation
and this nonsense variant results in more than 80% of the protein
being truncated. Another male case has a nonsense variant
resulting in Q1471X in an uncharacterized gene KIAA2022 and
mouse studies revealed that the protein is expressed in the
developing brain and plays a role in neurite outgrowth (Ishikawa
et al., 2012). A third male case has a splice variant c.961+1G > A
in Sushi-repeat containing protein, X-linked 2 (SRPX2), a protein
that is found to be expressed in neurons. Mutations in SRPX2
have been reported to be associated with rolandic epilepsy
with speech and cognition impairment (Roll et al., 2006) and
FOXP2, a gene that is involved in speech and language disor-
ders, has been shown to regulate SRPX2 (Roll et al., 2010). A
fourth male with ASD harbored an E495X nonsense variant in
methyl CpG binding protein 2 (MECP2). Complete knockouts in
MECP2 are lethal in males and heterozygous LoFs in MECP2
cause Rett Syndrome in females. Interestingly, the hemizygous
nonsense mutation that was observed in this male case trun-
cates only the last four amino acids of the MECP2 protein and
this potentially generates a protein product, which explains
why the hemizygous LoF observed in this gene is viable in
a male. Late-truncating mutations in MECP2 have been reported
to cause the Zappella variant of Rett Syndrome, which is a milder
form of Rett Syndrome and autistic behavior is often observed
in affected individuals (Renieri et al., 2009).
Total Contribution to ASD from De Novo and Inherited
As described previously in various studies, there is an estimated
6% contribution to ASD risk from de novo CNVs (Levy et al.,
2011; Sanders et al., 2011; Walsh et al., 2008). Recent studies
have estimated another 10% contribution to ASD risk by de
novo SNVs (Iossifov et al., 2012; Neale et al., 2012; O’Roak
et al., 2012; Sanders et al., 2012). In this study, we estimate
a 3% contribution to ASD risk by rare complete knockouts on
the autosomes and another 2% contribution by rare complete
knockouts on the X chromosome, resulting in another 5%
contribution to ASD risk. Because a comparably reliable and
validated set of insertion and deletion variants are not yet avail-
able across our entire data set, we have not fully evaluated the
contribution of frameshifts. Given that there is likely a similar
number of frameshift mutations as single nucleotide LoF variants
(Iossifov et al., 2012; MacArthur et al., 2012), the addition of
frameshifts will likely increase this contribution further.
The global enrichment of rare complete knockouts in cases
highlights the significance of such events in the overall genetic
etiology of ASD. In addition, these events provide further insight
into the heritable component of ASD, which has not yet been ac-
counted for by de novo CNVs and SNVs. However, these rare
complete knockouts are distributed across many different
genes. This agrees with our current understanding of ASD
genetics to date: that this complex disorder follows a multigenic
model in which hundreds of genes are involved and that each
individual gene accounts for a small fraction of ASD. Together
with the ongoing de novo CNV and SNV studies, our study and
another study in this issue of Neuron (Yu et al., 2013)
provide convincing evidence of a rare recessive contribution
to the heritability of ASD.
The institutional review boards of all participating institutions approved this
study, and written informed consent was obtained from all subjects.
The data sets and detailed information for the samples have been deposited
into dbGAP (accession ID: phs000298.v1.p1).
Data Quality Control and Filtering
BI data was processed with Picard (http://picard.sourceforge.net), which
utilizes base quality score recalibration and local realignment at known indels
and BWA for mapping reads to hg19. SNPs were called using GATK (McKenna
et al., 2010). BCM data was processed with Picard and reads mapped to hg18
using Bfast (Homer et al., 2009). The quality score recalibration and indel
realignment was performed using GATK, followed by SNV identification using
AtlasSNP 2 software (Challis et al., 2012). Genotyping data from Affymetrix
5.0 and 6.0 was filtered using an MAF threshold of ≥5% and a missing genotype
rate of ≤2% using PLINK, and concordance checks were performed on
the variant calls from the sequencing and genotyping arrays. Three samples
with low concordance between the exome sequencing and genotyping
arrays (≤90%) were detected in the BI case-control data set and discarded
from further analyses.
The variants used in this study were restricted to sites that passed the
standard GATK filters to eliminate SNPs with strand bias, low quality for the
depth of sequencing achieved, homopolymer runs, and SNPs near indels.
Variants were also required to have an average read depth of ≥10× and
a quality score of ≥30. Homozygous calls were required to have less than
10% of the alternate allele and heterozygous calls to have an allele balance
of between 30% and 70%. An HWE threshold of ≥0.05 was used as well.
A set of 160 rare variants was selected for Sequenom validation and the vali-
dation rate using these filters was 99.5%.
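The depth, quality, and allele-balance thresholds above can be expressed as a simple per-call filter. The following is a minimal sketch, not the pipeline used in the study: the `Call` record and its field names are hypothetical, the depth threshold is applied per call rather than as an average, and the homozygous rule is read as "fewer than 10% of reads come from the other allele", which is an interpretation of the text.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; everything else here is illustrative.
MIN_DEPTH = 10.0
MIN_QUAL = 30.0


@dataclass
class Call:
    depth: float              # read depth supporting the call
    qual: float               # variant quality score
    genotype: str             # "het" or "hom" (homozygous for the variant allele)
    alt_read_fraction: float  # fraction of reads carrying the variant allele


def passes_filters(call: Call) -> bool:
    """Apply the depth, quality, and allele-balance thresholds to one call."""
    if call.depth < MIN_DEPTH or call.qual < MIN_QUAL:
        return False
    if call.genotype == "het":
        return 0.30 <= call.alt_read_fraction <= 0.70
    if call.genotype == "hom":
        # Assumed reading: under 10% of reads may support the other allele.
        return (1.0 - call.alt_read_fraction) < 0.10
    return False


calls = [
    Call(depth=25, qual=50, genotype="het", alt_read_fraction=0.45),
    Call(depth=8, qual=50, genotype="het", alt_read_fraction=0.45),
    Call(depth=40, qual=60, genotype="hom", alt_read_fraction=0.85),
    Call(depth=40, qual=60, genotype="hom", alt_read_fraction=0.97),
]
kept = [c for c in calls if passes_filters(c)]
print(f"{len(kept)} of {len(calls)} calls pass")
```

The GATK site-level filters and the HWE threshold mentioned in the text are not modeled here.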
Annotation and Analyses
For the case-control data sets, we annotated each variant according to the
longest transcript from the RefSeq database. The trio and quartet data sets
were annotated using a custom pipeline that was built on top of the Variant
Effect Predictor (McLaren et al., 2010) to allow more stringent filtering of
annotation artifacts from the 1000 Genomes Project (MacArthur et al., 2012).
The cases and controls in the BI data set were compared separately from
the cases and controls in the BCM data set before combining the results, to
ensure that differences in sequencing technologies and platforms did not
affect the results. Variants on the autosomes were filtered using MAF %5%
in the controls from each data set.
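The control-based frequency filter can be sketched as follows, assuming diploid genotypes coded as alternate-allele counts with `None` for missing calls. The function names are hypothetical, and the filter would be run once per data set so that each set uses its own controls.

```python
def minor_allele_frequency(genotypes):
    """MAF from diploid genotypes coded as alternate-allele counts (0, 1, 2)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # no called genotypes, so the frequency is undefined
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def rare_in_controls(control_genotypes_by_variant, max_maf=0.05):
    """Return the variants whose MAF in the controls is at or below max_maf."""
    keep = []
    for variant, genotypes in control_genotypes_by_variant.items():
        maf = minor_allele_frequency(genotypes)
        if maf is not None and maf <= max_maf:
            keep.append(variant)
    return keep
```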
Variants on the X chromosome were filtered using similar thresholds as the
autosomal variants. In addition, variants that were found to be heterozygous in
males were removed from the analyses as such inconsistencies were most
likely to have resulted from misalignment errors. To increase the number of
observations for the X chromosome analyses, we added male probands
from the trios/quartets as additional cases to the overall counts from the
case-control data sets and their fathers were added as additional controls,
since male offspring do not inherit their X chromosomes from their fathers
and the X chromosomes in their fathers would serve as perfect normal
controls. In addition, the MAFs for rare variants on the X chromosome were
calculated from a large set of control females from the NHLBI exome
Linkage Disequilibrium-Based Phasing of Variant Pairs
We adopted a linkage disequilibrium (LD)-based method, similar to the
four-haplotype test used to detect a recombination event, to phase pairs of variants
within the same gene and applied this approach to predict compound
heterozygous variants in the case-control data sets. A pair of variants (A and B) was
predicted to occur on different chromosomes if:
(1) we observed at least one individual who is heterozygous for variant A;
(2) we observed at least one individual who is heterozygous for variant B; and
(3) we did not observe any individual who is homozygous at one variant
and has at least one copy of the second variant (Figure S1).
In addition, since we cannot accurately phase singletons, we included all
pairs of variants if at least one of them is a singleton.
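The three conditions and the singleton rule above can be expressed as a short sketch. It assumes each variant is given as a list of alternate-allele counts (0, 1, or 2), one per individual, in the same individual order and with no missing calls. All function names are hypothetical, and a singleton is taken to mean a variant with exactly one alternate allele in the sample.

```python
def predicted_in_trans(geno_a, geno_b):
    """Apply conditions (1) to (3) to a pair of variants."""
    has_het_a = any(g == 1 for g in geno_a)  # condition (1)
    has_het_b = any(g == 1 for g in geno_b)  # condition (2)
    # Condition (3) fails if anyone is homozygous at one variant
    # while carrying at least one copy of the other.
    same_chromosome_evidence = any(
        (a == 2 and b >= 1) or (b == 2 and a >= 1)
        for a, b in zip(geno_a, geno_b)
    )
    return has_het_a and has_het_b and not same_chromosome_evidence


def is_singleton(geno):
    """True if exactly one alternate allele is observed in the sample."""
    return sum(geno) == 1


def keep_pair(geno_a, geno_b):
    """Singleton pairs are always kept because they cannot be phased."""
    if is_singleton(geno_a) or is_singleton(geno_b):
        return True
    return predicted_in_trans(geno_a, geno_b)


def compound_het_carriers(geno_a, geno_b):
    """Indices of individuals heterozygous at both variants of a kept pair."""
    if not keep_pair(geno_a, geno_b):
        return []
    return [
        i for i, (a, b) in enumerate(zip(geno_a, geno_b))
        if a == 1 and b == 1
    ]
```

Individuals returned by `compound_het_carriers` are candidate compound heterozygotes under this heuristic; the prediction is statistical rather than a direct observation of phase.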
Statistical Analyses for Global Enrichment
For each variant, we calculated the MAF of the variant in the controls. The MAF
of a variant pair is the maximum MAF of either variant in the pair. Multiple
variant pairs within the same gene in the same individual were counted as
a single complete knockout event. We calculated the normalized enrichment
ratio as (total number of events in cases/total number of events in controls),
normalized by the numbers of cases and controls that were sequenced. We
assessed the statistical significance of the global enrichment by shuffling the case-control labels for
10,000 permutations. For the enrichment analyses on the X chromosome,
one-sided hypergeometric probabilities were calculated assuming that
hemizygous synonymous variants in male cases and controls are largely
neutral variants. All the analyses were performed within each case-control
data set separately before combining the results, to ensure that the
observations were not driven by a single data set.
Rare Complete Knockouts in Autism
240 Neuron 77, 235–242, January 23, 2013 ©2013 Elsevier Inc.
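The event counting, enrichment ratio, and label-shuffling test described in this section could be sketched as follows. This is an illustration under stated assumptions, not the study's code: the normalization is taken to be multiplication by (number of controls/number of cases), the test is one-sided, and the p value uses an add-one correction so that it is never exactly zero. All names are hypothetical.

```python
import math
import random


def pair_maf(maf_a, maf_b):
    """MAF of a variant pair: the larger of the two control MAFs."""
    return max(maf_a, maf_b)


def count_events(knockout_records):
    """Count one event per (individual, gene), however many pairs were found."""
    return len({(individual, gene) for individual, gene, _pair in knockout_records})


def enrichment_ratio(event_counts, is_case):
    """(case events / control events) scaled by (controls / cases)."""
    n_cases = sum(is_case)
    n_controls = len(is_case) - n_cases
    case_events = sum(c for c, case in zip(event_counts, is_case) if case)
    control_events = sum(c for c, case in zip(event_counts, is_case) if not case)
    if control_events == 0:
        return math.inf  # assumption: treat an empty denominator as infinite
    return (case_events / control_events) * (n_controls / n_cases)


def permutation_p_value(event_counts, is_case, n_permutations=10_000, seed=0):
    """One-sided p value from shuffling the case-control labels."""
    rng = random.Random(seed)
    observed = enrichment_ratio(event_counts, is_case)
    labels = list(is_case)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)
        if enrichment_ratio(event_counts, labels) >= observed:
            at_least_as_extreme += 1
    # Add-one correction (an assumption, not stated in the text).
    return observed, (at_least_as_extreme + 1) / (n_permutations + 1)
```

Here `event_counts` holds the number of complete knockout events per individual and `is_case` holds the matching case (True) or control (False) labels. Running the test within each data set separately, as the text describes, would mean calling `permutation_p_value` once per data set.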
Supplemental Information includes one figure, ten tables, and Supplemental
Experimental Procedures and can be found with this article online at http://
We gratefully acknowledge the following resources and families who con-
tributed to them: the National Institute of Mental Health (NIMH) repository
(U24MH068457); Autism Genetic Resource Exchange (AGRE) Consortium,
a program of Autism Speaks (1U24MH081810 to Clara M. Lajonchere); The
Autism Simplex Collection (TASC) (grant from Autism Speaks); Simons Foun-
dation Autism Research Initiative (SFARI) Simplex Collection (grant from the
Simons Foundation); The Autism Consortium (grant from the Autism Consor-
tium). For full citation of resources, please see Supplemental Information.
This work was directly supported by NIH grants R01MH089208 (M.J.D.),
R01MH089025 (J.D.B.), R01MH089004 (G.D.S.), R01MH089175 (R.A.G.),
and R01MH089482 (J.S.S.) and supported in part by NIH grants
P50HD055751 (E.H.C.), R01MH057881 (B.D.), and R01MH061009 (J.S.S.).
We acknowledge partial support from U54 HG003273 (R.A.G.) and U54
HG003067 (E. Lander). We thank Thomas Lehner (NIMH), Adam Felsenfeld
(NHGRI), and Patrick Bender (NIMH) for their support and contribution to the
project. E.B., J.D.B., B.D., M.J.D., R.A.G., K.R., A.S., G.D.S., and J.S.S. are
lead investigators in the ARRA Autism Sequencing Collaboration (AASC). We
would also like to thank the NHLBI GO Exome Sequencing Project and its
ongoing studies that produced and provided exome variant calls for comparison:
the Lung GO Sequencing Project (HL-102923), the WHI Sequencing
Project (HL-102924), the Broad GO Sequencing Project (HL-102925), the
Seattle GO Sequencing Project (HL-102926), and the Heart GO Sequencing
Accepted: December 22, 2012
Published: January 23, 2013
Autism and Developmental Disabilities Monitoring Network Surveillance Year
2008 Principal Investigators, C.D.C. (2012). Prevalence of autism spectrum
disorders–Autism and Developmental Disabilities Monitoring Network, 14
sites, United States, 2008. MMWR Surveill Summ. 61, 1–19.
Casey, J.P., Magalhaes, T., Conroy, J.M., Regan, R., Shah, N., Anney, R.,
Shields, D.C., Abrahams, B.S., Almeida, J., Bacchelli, E., et al. (2012). A novel
approach of homozygous haplotype sharing identifies candidate genes in
autism spectrum disorder. Hum. Genet. 131, 565–579.
Celestino-Soper, P.B., Shaw, C.A., Sanders, S.J., Li, J., Murtha, M.T., Ercan-
Sencicek, A.G., Davis, L., Thomson, S., Gambin, T., Chinault, A.C., et al.
(2011). Use of array CGH to detect exonic copy number variants throughout
the genome in autism families detects a novel deletion in TMLHE. Hum. Mol.
Genet. 20, 4360–4370.
Celestino-Soper, P.B., Violante, S., Crawford, E.L., Luo, R., Lionel, A.C.,
Delaby, E., Cai, G., Sadikovic, B., Lee, K., Lo, C., et al. (2012). A common
X-linked inborn error of carnitine biosynthesis may be a risk factor for nondys-
morphic autism. Proc. Natl. Acad. Sci. USA 109, 7974–7981.
Chahrour, M.H., Yu, T.W., Lim, E.T., Ataman, B., Coulter, M.E., Hill, R.S.,
Stevens, C.R., Schubert, C.R., Greenberg, M.E., Gabriel, S.B., and Walsh,
C.A.; ARRA Autism Sequencing Collaboration
sequencing and homozygosity analysis implicate depolarization-regulated
neuronal genes in autism. PLoS Genet. 8, e1002635.
Challis, D., Yu, J., Evani, U.S., Jackson, A.R., Paithankar, S., Coarfa, C.,
Milosavljevic, A., Gibbs, R.A., and Yu, F. (2012). An integrative variant analysis
suite for whole exome next-generation sequencing data. BMC Bioinformatics
Constantino, J.N., Todorov, A., Hilton, C., Law, P., Zhang, Y., Molloy, E.,
Fitzgerald, R., and Geschwind, D. (2012). Autism recurrence in half siblings:
strong support for genetic mechanisms of transmission in ASD. Mol.
Psychiatry. Published online February 28, 2012. http://dx.doi.org/10.1038/
Devlin, B., and Scherer, S.W. (2012). Genetic architecture in autism spectrum
disorder. Curr. Opin. Genet. Dev. 22, 229–237.
Fromer, M., Moran, J.L., Chambert, K., Banks, E., Bergen, S.E., Ruderfer,
D.M., Handsaker, R.E., McCarroll, S.A., O’Donovan, M.C., Owen, M.J., et al.
(2012). Discovery and statistical genotyping of copy-number variation from
whole-exome sequencing depth. Am. J. Hum. Genet. 91, 597–607.
Girirajan, S., Rosenfeld, J.A., Coe, B.P., Parikh, S., Friedman, N., Goldstein, A.,
Filipink, R.A., McConnell, J.S., Angle, B., Meschino, W.S., et al. (2012).
Phenotypic heterogeneity of genomic disorders and rare copy-number vari-
ants. N. Engl. J. Med. 367, 1321–1331.
Gorlov, I.P., Gorlova, O.Y., Sunyaev, S.R., Spitz, M.R., and Amos, C.I. (2008).
Shifting paradigm of association studies: value of rare single-nucleotide poly-
morphisms. Am. J. Hum. Genet. 82, 100–112.
Gottipati, S., Arbiza, L., Siepel, A., Clark, A.G., and Keinan, A. (2011). Analyses
of X-linked and autosomal genetic variation in population-scale whole genome
sequencing. Nat. Genet. 43, 741–743.
large scale genome resequencing. PLoS ONE 4, e7767.
Iossifov, I., Ronemus, M., Levy, D., Wang, Z., Hakker, I., Rosenbaum, J.,
Yamrom, B., Lee, Y.H., Narzisi, G., Leotta, A., et al. (2012). De novo gene
disruptions in children on the autistic spectrum. Neuron 74, 285–299.
Shingaki, K., Katayama, T., and Tohyama, M. (2012). Transient expression of
Xpn, an XLMR protein related to neurite extension, during brain development
and participation in neurite outgrowth. Neuroscience 214, 181–191.
Johansson, M., Gillberg, C., and Råstam, M. (2010). Autism spectrum
conditions in individuals with Möbius sequence, CHARGE syndrome and
oculo-auriculo-vertebral spectrum: diagnostic aspects. Res. Dev. Disabil.
Jorde, L.B., Hasstedt, S.J., Ritvo, E.R., Mason-Brothers, A., Freeman, B.J.,
Pingree, C., McMahon, W.M., Petersen, B., Jenson, W.R., and Mo, A.
(1991). Complex segregation analysis of autism. Am. J. Hum. Genet. 49,
Kang, H.J., Kawasawa, Y.I., Cheng, F., Zhu, Y., Xu, X., Li, M., Sousa, A.M.,
Pletikos, M., Meyer, K.A., Sedmak, G., et al. (2011). Spatio-temporal
transcriptome of the human brain. Nature 478, 483–489.
Levy, D., Ronemus, M., Yamrom, B., Lee, Y.H., Leotta, A., Kendall, J.,
Marks, S., Lakshmi, B., Pai, D., Ye, K., et al. (2011). Rare de novo and trans-
mitted copy-number variation in autistic spectrum disorders. Neuron 70,
MacArthur, D.G., Balasubramanian, S., Frankish, A., Huang, N., Morris, J.,
Walter, K., Jostins, L., Habegger, L., Pickrell, J.K., Montgomery, S.B., et al.;
1000 Genomes Project Consortium (2012). A systematic survey of loss-of-
function variants in human protein-coding genes. Science 335, 823–828.
McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky,
A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., and DePristo, M.A. (2010).
The Genome Analysis Toolkit: a MapReduce framework for analyzing next-
generation DNA sequencing data. Genome Res. 20, 1297–1303.
McLaren, W., Pritchard, B., Rios, D., Chen, Y., Flicek, P., and Cunningham, F.
(2010). Deriving the consequences of genomic variants with the Ensembl API
and SNP Effect Predictor. Bioinformatics 26, 2069–2070.
Morrow, E.M., Yoo, S.Y., Flavell, S.W., Kim, T.K., Lin, Y., Hill, R.S., Mukaddes,
N.M., Balkhy, S., Gascon, G., Hashmi, A., et al. (2008). Identifying autism loci
and genes by tracing recent shared ancestry. Science 321, 218–223.
Nava, C., Lamari, F., Héron, D., Mignot, C., Rastetter, A., Keren, B., Cohen, D.,
Faudet, A., Bouteiller, D., Gilleron, M., et al. (2012). Analysis of the chromo-
some X exome in patients with autism spectrum disorders identified novel
candidate genes, including TMLHE. Transl. Psychiatry 2, e179.
Neale, B.M., Kou, Y., Liu, L., Ma’ayan, A., Samocha, K.E., Sabo, A., Lin, C.F.,
Stevens, C., Wang, L.S., Makarov, V., et al. (2012). Patterns and rates of
exonic de novo mutations in autism spectrum disorders. Nature 485,
Newschaffer, C.J., Croen, L.A., Daniels, J., Giarelli, E., Grether, J.K., Levy,
S.E., Mandell, D.S., Miller, L.A., Pinto-Martin, J., Reaven, J., et al. (2007).
The epidemiology of autism spectrum disorders. Annu. Rev. Public Health
O’Roak, B.J., Vives, L., Girirajan, S., Karakoc, E., Krumm, N., Coe, B.P., Levy,
R., Ko, A., Lee, C., Smith, J.D., et al. (2012). Sporadic autism exomes reveal
a highly interconnected protein network of de novo mutations. Nature 485,
Pinto, D., Pagnamenta, A.T., Klei, L., Anney, R., Merico, D., Regan, R., Conroy,
J., Magalhaes, T.R., Correia, C., Abrahams, B.S., et al. (2010). Functional
impact of global rare copy number variation in autism spectrum disorders.
Nature 466, 368–372.
Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M.A., Bender, D.,
Maller, J., Sklar, P., de Bakker, P.I., Daly, M.J., and Sham, P.C. (2007). PLINK:
a tool set for whole-genome association and population-based linkage
analyses. Am. J. Hum. Genet. 81, 559–575.
Renieri, A., Mari, F., Mencarelli, M.A., Scala, E., Ariani, F., Longo, I., Meloni, I.,
Cevenini, G., Pini, G., Hayek, G., and Zappella, M. (2009). Diagnostic criteria
for the Zappella variant of Rett syndrome (the preserved speech variant).
Brain Dev. 31, 208–216.
Ritvo, E.R., Spence, M.A., Freeman, B.J., Mason-Brothers, A., Mo, A., and
Marazita, M.L. (1985). Evidence for autosomal recessive inheritance in 46
families with multiple incidences of autism. Am. J. Psychiatry 142, 187–192.
M.P., Roeckel-Trevisiol, N., Jamali, S., Beclin, C., et al. (2006). SRPX2 muta-
tions in disorders of language cortex and cognition. Hum. Mol. Genet. 15,
Roll, P., Vernes, S.C., Bruneau, N., Cillario, J., Ponsole-Lenfant, M.,
Massacrier, A., Rudolf, G., Khalife, M., Hirsch, E., Fisher, S.E., and
Szepetowski, P. (2010). Molecular networks implicated in speech-related
disorders: FOXP2 regulates the SRPX2/uPAR complex. Hum. Mol. Genet.
Sanders, S.J., Ercan-Sencicek, A.G., Hus, V., Luo, R., Murtha, M.T., Moreno-
De-Luca, D., Chu, S.H., Moreau, M.P., Gupta, A.R., Thomson, S.A., et al.
(2011). Multiple recurrent de novo CNVs, including duplications of the
7q11.23 Williams syndrome region, are strongly associated with autism.
Neuron 70, 863–885.
Sanders, S.J., Murtha, M.T., Gupta, A.R., Murdoch, J.D., Raubeson, M.J.,
Willsey, A.J., Ercan-Sencicek, A.G., DiLullo, N.M., Parikshak, N.N., Stein,
J.L., et al. (2012). De novo mutations revealed by whole-exome sequencing
are strongly associated with autism. Nature 485, 237–241.
Sebat, J., Lakshmi, B., Malhotra, D., Troge, J., Lese-Martin, C., Walsh, T.,
Yamrom, B., Yoon, S., Krasnitz, A., Kendall, J., et al. (2007). Strong association
of de novo copy number mutations with autism. Science 316, 445–449.
Speevak, M.D., and Farrell, S.A. (2011). Non-syndromic language delay in
a child with disruption in the Protocadherin11X/Y gene pair. Am. J. Med.
Genet. B. Neuropsychiatr. Genet. 156B, 484–489.
Walsh, T., McClellan, J.M., McCarthy, S.E., Addington, A.M., Pierce, S.B.,
Cooper, G.M., Nord, A.S., Kusenda, M., Malhotra, D., Bhandari, A., et al.
(2008). Rare structural variants disrupt multiple genes in neurodevelopmental
pathways in schizophrenia. Science 320, 539–543.
Weiss, L.A., Shen, Y., Korn, J.M., Arking, D.E., Miller, D.T., Fossdal, R.,
Saemundsen, E., Stefansson, H., Ferreira, M.A., Green, T., et al.; Autism
Consortium (2008). Association between microdeletion and microduplication
at 16p11.2 and autism. N. Engl. J. Med. 358, 667–675.
Yan, D., and Liu, X.Z. (2010). Genetics and pathological mechanisms of Usher
syndrome. J. Hum. Genet. 55, 327–335.
Yu, T.W., Chahrour, M.H., Coulter, M.E., Jiralerspong, S., Okamura-Ikeda, K.,
Ataman, B., Schmitz-Abe, K., Harmin, D.A., Adli, M., Malik, A.N., et al. (2013).
Using whole-exome sequencing to identify inherited causes of autism. Neuron
77, this issue, 259–273.
Zhang, B., Kirov, S., and Snoddy, J. (2005). WebGestalt: an integrated system
for exploring gene sets in various biological contexts. Nucleic Acids Res.
33(Web Server issue), W741-8.
Zweier, C., de Jong, E.K., Zweier, M., Orrico, A., Ousager, L.B., Collins, A.L.,
Bijlsma, E.K., Oortveld, M.A., Ekici, A.B., Reis, A., et al. (2009). CNTNAP2
and NRXN1 are mutated in autosomal-recessive Pitt-Hopkins-like mental
retardation and determine the level of a common synaptic protein in
Drosophila. Am. J. Hum. Genet. 85, 655–666.