Autism genome wide copy number variation reveals
ubiquitin and neuronal genes
Joseph T. Glessner1, Kai Wang1, Guiqing Cai13, Olena Korvatska15, Cecilia E. Kim1,
Shawn Wood12, Haitao Zhang1, Annette Estes15, Camille Brune7, Jonathan P. Bradfield1,
Marcin Imielinski1, Edward C. Frackelton1, Jennifer Reichert13, Emily L. Crawford11,
Jeffrey Munson15, Patrick Sleiman1, Rosetta Chiavacci1, Kiran Annaiah1, Kelly
Thomas1, Cuiping Hou1, Wendy Glaberson1, James Flory1, Frederick Otieno1, Maria
Garris1, Latha Soorya13, Lambertus Klei12, Joseph Piven8, Kacie J. Meyer8, Evdokia
Anagnostou13, Takeshi Sakurai13, Rachel M. Game11, Danielle S. Rudd8, Danielle
Zurawiecki13, Christopher McDougle9, Lea K. Davis8, Judith Miller9, David Posey10,
Shana Michaels12, Alexander Kolevzon13, Jeremy M. Silverman13, Raphael Bernier15,
Susan E. Levy2, Geraldine Dawson15, Thomas Owley7, William M. McMahon9, Thomas
H. Wassink8, John A. Sweeney7, John I. Nurnberger Jr.10, Hilary Coon9, James S.
Sutcliffe11, Nancy J. Minshew12, Struan F.A. Grant1,2, Maja Bucan5, Edwin H. Cook Jr.7,
Joseph D. Buxbaum13,14, Bernie Devlin12, Gerard D. Schellenberg5, Hakon Hakonarson1,2
1Center for Applied Genomics, Children’s Hospital of Philadelphia, Philadelphia, PA
2Department of Pediatrics, University of Pennsylvania School of Medicine, Philadelphia,
PA 19104 USA;
3University of Southern California, Los Angeles, CA 90089 USA;
4University of California Los Angeles, Los Angeles, CA 90095 USA;
5University of Pennsylvania School of Medicine, Philadelphia, PA 19104 USA;
6Neurodevelopmental Disorders Research Center and Department of Psychiatry,
University of North Carolina, Chapel Hill, NC 27412 USA;
7Institute for Juvenile Research and Department of Psychiatry, University of Illinois at
Chicago, Chicago, IL 60608 USA;
8University of Iowa, Iowa City, IA 52242 USA;
9University of Utah, Salt Lake City, UT 84112 USA;
10Indiana University, Indianapolis, IN 46202 USA;
11Center for Molecular Neuroscience and Vanderbilt Kennedy Center, Vanderbilt
University, Nashville, TN 37235 USA;
12Departments of Psychiatry and Human Genetics, University of Pittsburgh, Pittsburgh,
PA 15260 USA;
13Seaver Autism Center for Research and Treatment, Department of Psychiatry, Mount
Sinai School of Medicine, New York, NY 10029 USA;
14Departments of Neuroscience, and Genetics and Genomic Sciences, Mount Sinai School
of Medicine, New York, NY 10029 USA;
15University of Washington, Seattle, WA 98105 USA;
Autism spectrum disorders (ASDs) are childhood neurodevelopmental disorders
with complex genetic origins 1-4. Previous studies focusing on candidate genes or
genomic regions have identified several copy number variations (CNVs) that are
associated with increased risk of ASDs 5-9. In an attempt to comprehensively identify
CNVs conferring susceptibility to ASDs, we performed a whole-genome CNV study
on a cohort of 859 ASD cases and 1,409 healthy children of European ancestry who
were genotyped with ~550,000 SNP markers. Positive findings were evaluated in an
independent cohort of 1,336 ASD cases and 1,110 controls of European ancestry.
Besides previously reported ASD candidate genes, such as NRXN1 10 and CNTN4
11,12, multiple novel susceptibility genes encoding neuronal cell-adhesion molecules,
including NLGN1 and ASTN2, were enriched with CNVs in ASD cases compared to
controls (p = 9.5 × 10⁻³). Furthermore, genes involved in the ubiquitin pathways,
including UBE3A, PARK2, RFWD2 and FBXO40, were affected by CNVs within or
surrounding them that were not observed in controls (p = 3.3 × 10⁻³). We also identified
duplications 55 kb upstream of AK123120 (p = 3.6 × 10⁻⁶). Although these variants may
be individually rare, they target genes involved in neuronal cell-adhesion or
ubiquitin degradation, indicating that these two major gene networks expressed
within the central nervous system may contribute to the genetic susceptibility of ASDs.
Autism spectrum disorders (ASDs), including autism, are neurodevelopmental
disorders characterized by impairments in social and communication skills, as well as
stereotyped and repetitive behaviors and/or a restricted range of interests. Current
prevalence estimates in the United States are 0.1-0.2% for autism and 0.6% for ASDs 1,2.
Linkage and candidate gene association studies have implicated several
chromosomal regions in autism 3,4. However, positive findings in one study often fail to
replicate in other studies and a consistent picture of susceptibility loci in autism is still
lacking. Some telling clues about ASD genetics arose from recent studies on copy
number variations (CNVs) 5, including association of de novo CNVs with ASDs 6. While
de novo CNVs that disrupt specific genes may contribute to the pathogenesis of ASDs,
heritable CNVs are much more common but have been less studied as risk factors of
ASDs. A family-based genome-wide linkage and CNV analysis by the Autism Genome
Project Consortium (AGP) using Affymetrix 10K SNP arrays implicated chromosome
11p12-13 and NRXN1 as candidate loci 7. A study using the Affymetrix 500K SNP array
in a Canadian population reported 277 rare CNVs that were only observed in ASD
patients but not in 1,652 healthy controls or in the Database of Genomic Variants 8.
Furthermore, 16p11.2 deletions and duplications have been reported in independent
cohorts of autism patients 9. These studies concordantly implicate a role of CNVs in the
genetic susceptibility to ASDs.
To search systematically for CNVs that confer risk to ASD, we employed a
genome-wide approach using the Illumina HumanHap550 BeadChip. We assembled an
Autism Case-Control (ACC) cohort of 859 ASD cases of European ancestry (from a total
of 1,246 ACC cases, parents, and siblings) and 1,409 healthy controls. Among these case
subjects, all met diagnostic criteria for autism based
on ADI (Autism Diagnostic Interview), and 124 met criteria for other ASDs based on
ADOS (Autism Diagnostic Observation Schedule) 13. 54% were from simplex families
with the balance coming from multiplex families. In addition, we also analyzed 1,336
cases (from a total of 3,398 cases, parents, and siblings) in the Autism Genetic Resource
Exchange (AGRE) 14 collection as well as 1,110 control subjects as a replication cohort.
Among the AGRE cases, 5% were from simplex families and 95% were from multiplex
families: 1,202 met criteria for autism and 134 for other ASDs 13 (Supplementary Tables
1 and 2).
We generated 78,490 CNV calls (22,581 in the ACC series and 55,909 in the
AGRE series) from all the ASD subjects and their family members that met strictly
established data quality thresholds (Methods). An average of 15.5 CNV calls were made
for each individual using the PennCNV software 15, with similar frequency observed in
cases and controls (Supplementary Figure 2).
We first examined eight genomic regions that have been previously implicated in
ASDs. Among those, CNVs involving the 15q11-13, 22q11.21, and NRXN1 regions have
well established associations with autism 10. CNVs affecting CNTN4 in ASD cases have
also been reported in independent studies 11-12. We statistically adjusted for relatedness
of cases with permutation and our results demonstrate that duplications of 15q11-q13 and
the 22q11.21 region, deletions of neurexin 1 (NRXN1), as well as deletions and
duplications of CNTN4 replicate in our cohorts (Table 1). On the other hand, we did not
obtain statistical support for several other genomic regions previously shown to associate
with ASD, including AUTS2 16, NLGN3 17, SHANK3 18 and 16p11.2 9 (Table 1). We
observed a similar frequency of deletions and duplications of the 16p11.2 locus in the
ASD cases (~0.3%) as previously reported 9; however, the CNV frequency in the control
subjects at this locus was also comparable to that of the cases (Supplementary Figure 3).
It is noteworthy that CNVs at the 16p11.2 locus do not segregate to all cases in three of
the affected families and they are also transmitted to unaffected siblings (Supplementary
Figure 4). These results suggest that CNVs at the 16p11.2 locus may not be sufficient to
be causal variants in ASD.
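The permutation adjustment for relatedness mentioned above can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: the function name, the record layout (family identifier, case status, carrier status), and the test statistic (difference in carrier frequency between cases and controls) are all assumptions. The key point it shows is that disease labels are shuffled per family, so siblings always move together.

```python
import random

def family_permutation_p(subjects, n_perm=1000, seed=0):
    """One-sided permutation P-value for excess CNV carriers among cases.

    subjects: list of (family_id, is_case, carries_cnv) tuples.
    Assumption: every member of a family shares the same disease label
    (unrelated controls are treated as families of size one).
    """
    # Group carrier flags by family; the first label seen defines the family.
    families = {}
    for fam, is_case, carrier in subjects:
        entry = families.setdefault(fam, {"case": is_case, "carriers": []})
        entry["carriers"].append(carrier)
    fam_ids = sorted(families)
    labels = [families[f]["case"] for f in fam_ids]
    carriers = [families[f]["carriers"] for f in fam_ids]

    def stat(lbls):
        # Carrier frequency in cases minus carrier frequency in controls.
        case_n = case_c = ctrl_n = ctrl_c = 0
        for lab, carr in zip(lbls, carriers):
            if lab:
                case_n += len(carr)
                case_c += sum(carr)
            else:
                ctrl_n += len(carr)
                ctrl_c += sum(carr)
        if case_n == 0 or ctrl_n == 0:
            return 0.0
        return case_c / case_n - ctrl_c / ctrl_n

    observed = stat(labels)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # one label per family: siblings stay together
        if stat(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible zero.
    return (hits + 1) / (n_perm + 1)
```

Shuffling at the family level preserves the within-family correlation of CNV carrier status, which a per-individual shuffle would break.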
To identify additional novel genomic loci contributing to ASDs, we applied a
segment-based scoring approach that scans the genome for consecutive SNPs with more
frequent copy number changes in cases compared to controls. This approach defines copy
number variation regions, or CNVRs (Supplementary Figure 5, upper panel). In the ACC
cohort, we identified four CNVRs that were observed in cases but not in controls, as well
as five CNVRs that had significantly higher frequency in cases versus controls (Table 2).
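The segment-based idea can be illustrated with a short sketch. It is a simplified stand-in, not the scoring procedure used in the study: the function name `call_cnvrs`, the input layout, and the `min_cases` cutoff are assumptions, and a plain frequency comparison replaces the per-SNP statistical test. It shows how consecutive SNPs with more frequent copy number changes in cases are merged into candidate CNVRs.

```python
def call_cnvrs(snps, n_cases, n_controls, min_cases=3):
    """Merge consecutive case-enriched SNPs into candidate regions.

    snps: list of (position, cases_with_cnv, controls_with_cnv),
          sorted by genomic position on one chromosome.
    Returns a list of (start, end, max_case_count, max_control_count).
    min_cases is a hypothetical cutoff chosen for illustration.
    """
    regions = []
    current = None
    for pos, case_c, ctrl_c in snps:
        enriched = (case_c >= min_cases
                    and case_c / n_cases > ctrl_c / n_controls)
        if enriched:
            if current is None:
                current = [pos, pos, case_c, ctrl_c]  # open a new region
            else:
                current[1] = pos                      # extend the region
                current[2] = max(current[2], case_c)
                current[3] = max(current[3], ctrl_c)
        elif current is not None:
            regions.append(tuple(current))            # close the region
            current = None
    if current is not None:
        regions.append(tuple(current))
    return regions

# Hypothetical per-SNP counts, using the ACC cohort sizes from the text.
example = [(100, 0, 0), (200, 3, 0), (300, 4, 0),
           (400, 0, 1), (500, 5, 1), (600, 0, 0)]
print(call_cnvrs(example, n_cases=859, n_controls=1409))
```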
To replicate the CNVRs exclusively observed in ACC cases, we examined the
AGRE case-control data set; of the four case-specific CNVRs, two were also exclusive to
AGRE cases (UBE3A and PARK2), while the other two (RFWD2 and FBXO40) were not
observed in either the cases or controls (combined P-values ranging from 3.57 × 10⁻⁶ to 0.1,
unadjusted for multiple testing) (Table 2). Interestingly, these four genes, which were
significantly enriched for CNVs and observed only in the ASD cases, belong to the
ubiquitin gene family (UniProt category “Ubl conjugation pathway”, P = 3.3 × 10⁻³). The
other five CNVRs, which were enriched in the ACC cases and present at a lower frequency
in controls, also replicated: they were over-represented in the AGRE cases
in comparison with the independent controls (Table 2). Fig. 1 shows the most significant
locus, duplication 55 kb upstream of AK123120, using UCSC Genome Browser 19 with
Build 36 (March 2006) of the human genome. To ensure reliability of our CNV detection
method, we experimentally validated all the significant CNVRs using additional methods,
including quantitative PCR (QPCR) and multiplex ligation-dependent probe amplification (MLPA) (Fig. 2).
Affymetrix 5.0 array data was also available for a subset of the AGRE subjects for
Besides the segment-based scoring approach for CNV association, an alternative
method is the gene-based scoring approach that examines CNV calls impacting any
region of the gene (Supplementary Figure 5, lower panel). Using this approach, we
further identified seven genes with an increased frequency of CNVs in ASD cases versus
controls (Supplementary Table 3). For each gene, most CNVs target different parts of the
gene and would have been missed by the segment-based approach. Of note, four of the
genes identified by the segment- and gene-based approaches are involved in neuron
development (NRXN1, CNTN4, ASTN2 and NLGN1) (Gene Ontology term “neuron
development”, P = 9.5 × 10⁻³). Therefore, by combining evidence from two complementary
CNV association approaches, the large sample size has enabled us to implicate two
specific gene networks or biological pathways in ASDs: the ubiquitination system and
neuronal cell-adhesion molecules.
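The gene-based approach can be sketched in a few lines. This is an illustrative assumption about how such counting might be done, not the study's implementation: the function name, the tuple layouts, and the coordinates in the example are hypothetical. A subject counts as a carrier for a gene if any of their CNV calls overlaps any part of the gene interval, so CNVs hitting different, non-overlapping parts of the same gene are pooled.

```python
def gene_carrier_counts(cnv_calls, genes):
    """Collect the subjects whose CNV calls touch each gene.

    cnv_calls: list of (subject_id, chrom, start, end).
    genes: dict mapping gene name to (chrom, start, end).
    Returns {gene_name: set of subject_ids with an overlapping CNV}.
    """
    carriers = {name: set() for name in genes}
    for subject, chrom, start, end in cnv_calls:
        for name, (g_chrom, g_start, g_end) in genes.items():
            # Two closed intervals overlap when each starts before the other ends.
            if chrom == g_chrom and start <= g_end and end >= g_start:
                carriers[name].add(subject)
    return carriers

# Hypothetical gene and calls: s1 and s2 hit different ends of the gene.
genes = {"GENE_A": ("chr1", 1000, 5000)}
calls = [("s1", "chr1", 900, 1200), ("s2", "chr1", 4000, 6000),
         ("s3", "chr1", 6000, 7000), ("s4", "chr2", 1000, 5000)]
print(gene_carrier_counts(calls, genes))
```

Running this separately on cases and on controls gives the per-gene carrier counts that a case-control test can then compare.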
The genes from the ubiquitin pathway (UBE3A, PARK2, RFWD2 and FBXO40)
represent a novel CNV finding in ASD susceptibility. Ubiquitination is a post-
translational modification which can rapidly alter protein function and target proteins for
proteasome-mediated degradation. The ubiquitin-proteasome system operates in pre- and
post-synaptic compartments, regulating synaptic attributes, including neurotransmitter
release, synaptic vesicle recycling in presynaptic terminals, and dynamic changes in
dendritic spines and the post-synaptic density (PSD) 20. Of the four ubiquitin-related
genes highlighted in our study, UBE3A, a ubiquitin-protein ligase, has been the most
extensively studied in the context of autism. PARK2 is a ubiquitin-protein ligase,
mutations of which cause autosomal recessive juvenile Parkinson Disease 21, and RFWD2
and FBXO40 are also ubiquitin-protein ligases, but neither has been previously associated
with disease-causing mutations. The role of ubiquitin in the turnover of synaptic
components such as the neuronal cell adhesion molecules in a process involving
regulation of activity-dependent synaptic plasticity presents a mechanism that links these
two major gene networks. In addition to the genes described above, a number of
ubiquitin-related genes are involved in human neurological diseases. These include
NHLRC1, UBR, CUL4B, BRWD3, and HUWE1, genes that encode ubiquitin protein E3
ligases. Mutations in the latter three and in UBE2A, an E2 ubiquitin conjugating enzyme,
cause syndromes that include intellectual disability 22.
The second group of genes implicated in our study, neuronal cell-adhesion molecules,
is critical in the development of the nervous system, contributing to
axonal guidance, synaptic formation and plasticity, and neuronal-glial interactions.
Recent genetic evidence has suggested associations between autism susceptibility and
neuronal cell adhesion molecules, including NRXN1 10, CNTNAP2 23, NLGN3 17, NLGN4
17, and specific cadherins. Our results provide support for some previously reported genes
(NRXN1 and CNTN4) and also implicate additional genes with cell-adhesion functions,
including NLGN1 and ASTN2. Mutations in NLGN1 and other neuroligin superfamily
members have previously been found in individuals with autism and have subsequently
been shown to be functionally relevant 24. ASTN1, a well-studied homologue of ASTN2, is
a neuronal protein receptor integral in the process of glial-guided granule cell migration
during development 25, and ASTN2 deletions have been recently associated with
In conclusion, using a genome-wide approach for high-resolution CNV detection,
we have identified candidate genomic loci with enrichment of CNVs in ASD cases as
compared to controls, and replicated many of them using an independent set of cases. A
majority of these genes fall within two pathways/networks involving neuronal cell-
adhesion and ubiquitin degradation. The enrichment of genes within these molecular
systems suggests novel susceptibility mechanisms for ASDs. Our results call for
functional and expression assays to be completed to assess the biological effects of CNVs
in these candidate genes.
All genome-wide SNP genotyping was performed using the InfiniumII HumanHap550
BeadChip at the Center for Applied Genomics at The Children’s Hospital of Philadelphia
(CHOP). We called CNVs with the PennCNV algorithm 15, which combines multiple
values, including Log R Ratio (LRR), B Allele Frequency (BAF), SNP spacing and
population frequency of the B allele into a hidden Markov model. The term “CNV”
represents individual CNV calls, while “CNVR” refers to population-level variation.
Quality control thresholds included a high success rate of attempted SNPs, low standard
deviation of normalized intensity, genetically inferred European ancestry, low genomic
wave artifacts, count of CNV calls per subject, and genotypic duplicate removal
(Supplementary Table 4). CNV frequency was compared between cases and controls at
each SNP using Fisher’s exact test. We considered a locus significant (p<0.05) when CNVs
in ACC discovery cases overlapped, when the locus either replicated in AGRE or was not
observed in control subjects, and when it was validated with another method
(QPCR Roche Universal Probe Library, MRC-Holland MLPA, or Affymetrix 5.0 from
Broad). We report statistical local minima in reference to a region of nominal
significance including SNPs residing within 1 Mb of each other (Supplementary Figure
6). Resulting significant CNVRs were excluded if they: i) resided on telomere- or
centromere-proximal cytobands; ii) arose from a “peninsula” of common CNV (Supplementary
Figure 7); iii) lay in genomic regions with extremes in GC content 27; or iv) involved samples
contributing to multiple CNVRs. To adjust for siblings in the AGRE data, we calculated a
permutation-based P-value (1,000 permutations), in which disease labels for siblings were
permuted together. We used DAVID (Database for Annotation, Visualization, and Integrated
Discovery) 28 to assess the significance of functional annotation clustering. Correction for
5 deletion and 9 duplication CNVRs, based on discovery cohort (ACC) significance and signal
review, is appropriate for our study (“CNV Filtering Steps” in Supplementary Materials).
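The per-SNP comparison described in the Methods can be illustrated with a small, dependency-free sketch of Fisher’s exact test. It is not the code used in the study: the function names and the input layout are assumptions, and the two-sided convention shown (summing all tables no more probable than the observed one) is one common choice that the text does not specify.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for a 2x2 table.

    a = cases with CNV,    b = cases without CNV,
    c = controls with CNV, d = controls without CNV.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at least as extreme (no more probable) than observed.
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)

def per_snp_pvalues(counts, n_cases, n_controls):
    """counts: list of (snp_id, cases_with_cnv, controls_with_cnv)."""
    return {snp: fisher_exact_two_sided(k_case, n_cases - k_case,
                                        k_ctrl, n_controls - k_ctrl)
            for snp, k_case, k_ctrl in counts}
```

For example, `per_snp_pvalues([("snp_hypothetical", 9, 3)], 859, 1409)` would test a SNP covered by CNVs in 9 of 859 cases and 3 of 1,409 controls; the SNP identifier and counts here are illustrative inputs only.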
1. Autism and Developmental Disabilities Monitoring Network.
2. Newschaffer, C.J. et al. The epidemiology of autism spectrum disorders. Annu.
Rev. Public Health 28, 235-58 (2007).
3. Gupta, A.R. & State, M.W. Recent advances in the genetics of autism. Biol.
Psychiatry 61, 429-37 (2007).
4. Klauck, S.M. Genetics of autism spectrum disorder. Eur. J. Hum. Genet. 14, 714-
5. Vorstman, J.A. et al. Identification of novel autism candidate regions through
analysis of reported cytogenetic abnormalities associated with autism. Mol.
Psychiatry 11, 1, 18-28 (2006).
6. Sebat, J. et al. Strong association of de novo copy number mutations with autism.
Science 316, 445-9 (2007).
7. Szatmari, P. et al. Mapping autism risk loci using genetic linkage and
chromosomal rearrangements. Nat. Genet. 39, 319-28 (2007).
8. Marshall, C.R. et al. Structural variation of chromosomes in autism spectrum
disorder. Am. J. Hum. Genet. 82, 477-88 (2008).
9. Weiss, L.A. et al. Association between Microdeletion and Microduplication at
16p11.2 and Autism. N. Engl. J. Med. 358(7), 667-675 (2008).
10. Kim, H.G. et al. Disruption of neurexin 1 associated with autism spectrum
disorder. Am. J. Hum. Genet. 82, 199-207 (2008).
11. Roohi, J. et al. Disruption of Contactin 4 in 3 Subjects with Autism Spectrum
Disorder. J. Med. Genet. (2008).
12. Fernandez, T. et al. Disruption of Contactin 4 (CNTN4) results in developmental
delay and other features of 3p deletion syndrome. Am. J. Hum. Genet. 82, 1385
13. Le Couteur, A. et al. Diagnosing Autism Spectrum Disorders in Pre-school
Children Using Two Standardised Assessment Instruments: The ADI-R and the
ADOS. J. Autism Dev. Disord. 38, 362-372 (2008).
14. Geschwind, D.H. et al. The autism genetic resource exchange: a resource for the
study of autism and related neuropsychiatric conditions. Am. J. Hum. Genet. 69,
15. Wang, K. et al. PennCNV: an integrated hidden Markov model designed for high-
resolution copy number variation detection in whole-genome SNP genotyping
data. Genome Res. 17, 1665-1674 (2007).
16. Kalscheuer, V.M. et al. Mutations in autism susceptibility candidate 2 (AUTS2)
in patients with mental retardation. Hum. Genet. 121, 501-9 (2007).
17. Jamain, S. et al. Mutations of the X-linked genes encoding neuroligins NLGN3
and NLGN4 are associated with autism. Nat. Genet. 34, 27-9 (2003).
18. Moessner, R. et al. Contribution of SHANK3 mutations to autism spectrum
disorder. Am. J. Hum. Genet. 81, 1289-97 (2007).
19. Kent, W.J. et al. The human genome browser at UCSC. Genome Res. 12(6), 996-
20. Yi, J.J. & Ehlers, M.D. Ubiquitin and protein turnover in synapse function.
Neuron 47, 629-32 (2005).
21. Kitada, T. et al. Mutations in the parkin gene cause autosomal recessive juvenile
parkinsonism. Nature 392, 605-608 (1998).
22. Tai, H. & Schuman, E. Ubiquitin, the proteasome and protein degradation in
neuronal function and dysfunction. Nat. Rev. Neurosci. 9(11), 826-838
23. Alarcón, M. et al. Linkage, Association, and Gene-Expression Analyses Identify
CNTNAP2 as an Autism-Susceptibility Gene. Am. J. Hum. Genet. 82, 150-159 (2008).
24. Chubykin, A.A. et al. Dissection of synapse induction by neuroligins: effect of a
neuroligin mutation associated with autism. J. Biol. Chem. 280, 22365-74 (2005).
25. Zheng, C., Heintz, N. & Hatten, M.E. CNS gene encoding astrotactin, which
supports neuronal migration along glial fibers. Science 272, 417-9 (1996).
26. Kahler, A.K. et al. Association analysis of schizophrenia on 18 genes involved in
neuronal migration: MDGA1 as a new susceptibility gene. Am. J. Med. Genet. B.
Neuropsychiatr. Genet. 147B, 1089-100 (2008).
27. Diskin, S. et al. Adjustment of genomic waves in signal intensities from whole-
genome SNP genotyping platforms. Nucleic Acids Res. 36(19) (2008).
28. Dennis, G. Jr. et al. DAVID: Database for Annotation, Visualization, and Integrated
Discovery. Genome Biol. 4(9) (2003).
29. Hellemans, J., Mortier, G., De Paepe, A., Speleman, F. & Vandesompele, J.
qBase relative quantification framework and software for management and
automated analysis of real-time quantitative PCR data. Genome Biol. 8, R19
Supplementary Information accompanies the paper on www.nature.com/nature.
A figure summarizing the main result of this paper is also included as SI.
We gratefully thank all the ASD children and their families at the participating study sites who were
enrolled in this study and all the control subjects who donated blood samples to Children’s Hospital of
Philadelphia (CHOP) for genetic research purposes. We thank the technical staff in the Center for Applied
Genomics, Children’s Hospital of Philadelphia for generating the genotypes used in this study. We thank
Sharon Diskin for her contribution to the discussion on the effect of wave artifacts on CNV calling. We
also thank S. Kristinsson, L. A. Hermannsson and A. Krisbjörnsson for their software design and
contribution. This research was financially supported by The Children’s Hospital of Philadelphia, Autism
Speaks, and NICHD (HD35476).
We also gratefully acknowledge the resources provided by the Autism Genetic Resource Exchange
(AGRE) Consortium* and the participating families. AGRE is a program of Autism Speaks and is
supported, in part, currently by grant 1U24MH081810 from the National Institute of Mental Health to Clara
M. Lajonchere (PI), and formerly by Grant MH64547 to Daniel H. Geschwind (PI). The AGRE data set
was genotyped by the Center for Applied Genomics at the Children’s Hospital of Philadelphia, and the
complete sets of genotyping data have been released to the public domain. AGRE-approved academic
researchers can acquire the data sets from AGRE (http://www.agre.org).
The study is supported in part by a Research Award from the Margaret Q. Landenberger Foundation (H.H.);
a Research Development Award from the Cotswold Foundation (H.H. & S.F.A.G); the Beatrice and
Stanley A. Seaver Foundation (JDB), the Department of Veterans Affairs (GDS); NIH grants HD055782-
01 (JM, AE, OK, GD, and GDS), MH0666730 (JDB), MH061009 and NS049261 (JSS), HD055751
(EHC); MH69359, M01-RR00064, and the Utah Autism Foundation (HC, JM, and WMM); Genotyping:
All genotyping for this study was supported by an Institutional Development Award to the Center for
Applied Genomics (HH) from the Children’s Hospital of Philadelphia.
We acknowledge the Autism Genome Project Consortium (JP, CWB, THW, WMM, HC, JIN, JSS, EHC,
JM, AE, OK, JDB, BD and GDS) funded by Autism Speaks, the Medical Research Council (UK) and the
Health Research Board (Ireland).
We thank the “Autism Consortium” for contributing Affymetrix 5.0 data to AGRE for validation of CNV
calls made from our discovery Illumina CNV calls. We thank the contributing institutions: The Broad
Institute, Massachusetts Institute of Technology, Harvard University, Harvard Medical School, Children’s
Hospital Boston, Center for Human Genetic Research at Massachusetts General Hospital, Cambridge
Health Alliance, Boston University School of Medicine, Boston University, Boston Medical Center, Beth
Israel Deaconess Medical Center, Massachusetts General Hospital/Ladders, McLean Hospital, and The
Floating Hospital for Children at Tufts Medical Center. We thank the Board of Directors: Peter Barrett,
Alan Crane, Janet Atkins, John Graham, Paul Marcus, Edward Scolnick, MD, James F. Gusella, PhD,
Mriganka Sur, PhD, Tish Tanski, MSW, and Christopher Walsh, MD, PhD.
*The AGRE Consortium:
Dan Geschwind, M.D., Ph.D., UCLA, Los Angeles, CA;
Maja Bucan, Ph.D., University of Pennsylvania, Philadelphia, PA;
W.Ted Brown, M.D., Ph.D., F.A.C.M.G., N.Y.S. Institute for Basic Research in Developmental
Disabilities, Staten Island, NY;
Joseph Buxbaum, Ph.D., Mt. Sinai School of Medicine, NY, NY;
Rita M. Cantor, Ph.D., UCLA School of Medicine, Los Angeles, CA;
John N. Constantino, M.D., Washington University School of Medicine, St. Louis, MO;
T.Conrad Gilliam, Ph.D., University of Chicago, Chicago, IL;
Clara Lajonchere, Ph.D, Cure Autism Now, Los Angeles, CA;
David H. Ledbetter, Ph.D., Emory University, Atlanta, GA;
Christa Lese-Martin, Ph.D., Emory University, Atlanta, GA;
Janet Miller, J.D., Ph.D., Cure Autism Now, Los Angeles, CA;
Stanley F. Nelson, M.D., UCLA School of Medicine, Los Angeles, CA;
Gerard D. Schellenberg, Ph.D., University of Washington, Seattle, WA;
Carol A. Samango-Sprouse, Ed.D., George Washington University, Washington, D.C.;
Sarah Spence, M.D., Ph.D., UCLA, Los Angeles, CA;
Matthew State, M.D., Ph.D., Yale University, New Haven, CT;
Rudolph E. Tanzi, Ph.D., Massachusetts General Hospital, Boston, MA.
H.H. and G.S. designed the study and supervised the data analysis and interpretation. J.T.G., K.W. and
B.D. conducted the statistical analyses. C.E.K, T.C., E.C.F. and R.S. directed the genotyping of stage 1.
J.D.B. coordinated the validation. G.C. and O.K. performed QPCR and MLPA validation of CNVs and
edited the manuscript. J.T.G., K.W., H.H., G.D.S. and B.D. drafted the manuscript. G.D.S., N.M.,
E.C.,W.M., H.C., T.W., J.D.B., T.O., J.N., E.A., L.S., J.R., T.S., C.B. and D.Z. collected samples and
contributed phenotype data for the study and assisted with data collection and manuscript preparation.
Other authors contributed to sample acquisition and processing. E.C., S.F.A.G., P.S., M.I., B.D., L.K.,
S.W. and K.W. reviewed the data, assisted with data interpretation, and edited the manuscript.
Reprints and permissions information is available at www.nature.com/reprints. The authors declare no
competing financial interests. Correspondence and requests for materials should be addressed to H.H.
(email@example.com) or G.D.S (firstname.lastname@example.org).
Table 1. CNVs in gene regions previously implicated in ASDs.
Numbers in parentheses refer to counts of unrelated siblings or distinct unrelated families in Tables 1
and 2. The sample included 859 ASD cases from the ACC cohort, 1336 ASD cases from the AGRE cohort,
and 2519 unaffected controls (1409 ACC discovery controls and 1110 AGRE replication controls). All
CNVs, except 16p11.2, AUTS2, NLGN3, and SHANK3 were experimentally validated in the ACC
cohort. QPCR: quantitative polymerase chain reaction; MLPA: multiplex ligation-dependent probe
amplification; BeadStudio: a visualization software from Illumina to examine signal intensity patterns
(Supplementary Figure 8). Regions listed represent the optimal overlap of cases and significance with
respect to controls as described in the Methods and Supplementary Fig. 5 upper panel: SNP-based
segment approach. “Inh” column lists the inheritance pattern of each CNV from parents to cases in the
format <inherited from mother>:<inherited from father>:<denovo>. Pedigrees provided in
Supplementary Figure 9. Note that parents were not available for all cases. The percentage of
inheritance is listed below these three values.
Significance Type Validation
8 0 7(5) 0
4 0 6(4) 0
3 0 7(4) 0 4.7E-4 0.004
5 0 4 0
1 1 8(6) 0
2 3 7(3) 1
3 2 5(4) 2
0 0 1 0 0.466 0.425
0 0 1 1
2 1 0 1 1 1
Table 2. Novel common CNVRs over-represented in ASD patients
0.005 chr6: 162584576-
Dup 3 0 3 0 NA 0.0102
Dup 4 0 0 0 NA 0.0469 0.034
Dup 3 0 0 0 NA 0.1009 0.094
Dup 9 3 24(20) 4 16:8:1 5.547 3.57E-6 < 0.001
Del 3 1 12(8*) 2 5.804 0.0017
Del 4 1 10(8) 2 5.412 0.0031
Dup 40 52 74(57^) 40 1.471 0.0101
Dup 3 0 7(5) 2 5.782 0.0166
* AGRE family 574 has 3 affected siblings. ^ 3 families had 3 affected siblings (AGRE families 656, 955,
and 1559). Based on 859 ASD cases from the ACC cohort, 1336 ASD cases from the AGRE cohort
and 2519 unaffected controls (1409 ACC discovery controls and 1110 AGRE replication controls). All
loci validated with QPCR. “Inh” column lists the inheritance pattern of each CNV from parents to cases
in the format <inherited from mother>:<inherited from father>:<denovo>. The percentage of inheritance
is listed below these three values. P values in italics mark those CNVs that survive multiple testing with
Bonferroni adjustment in the discovery phase (i.e., P<0.05 following correction for 5 tests for
deletions and 9 for duplications), and CNVRs that survived both the replication and experimental
validation criteria are listed in bold.
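The Bonferroni criterion in the table note amounts to simple arithmetic, sketched below. The function name and the dictionary of test counts are illustrative assumptions; only the numbers of tests (5 for deletions, 9 for duplications) and the 0.05 threshold come from the text. The P-values in the usage lines are example inputs, not a restatement of which table entries passed.

```python
# Number of discovery-phase tests per CNV type, as stated in the table note.
N_TESTS = {"Del": 5, "Dup": 9}

def bonferroni_significant(p_value, n_tests, alpha=0.05):
    """True if p_value stays below alpha after Bonferroni adjustment."""
    adjusted = min(1.0, p_value * n_tests)  # adjusted P-value, capped at 1
    return adjusted < alpha

# Equivalent per-test thresholds: 0.05 / 5 = 0.01 and 0.05 / 9 (about 0.0056).
print(bonferroni_significant(0.0017, N_TESTS["Del"]))  # 0.0085 after adjustment
print(bonferroni_significant(0.0101, N_TESTS["Dup"]))  # 0.0909 after adjustment
```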
Figure 1. AK123120: Example of overrepresented CNVs
AK123120 chr2:12,986,750-13,291,000 divided into subsections with headers for ACC CNVs,
AGRE CNVs, AGRE Affymetrix validation CNVs, and Control CNVs. The AGRE Affymetrix
Replication track is based on Genome-wide 5.0 SNP genotyping data from the Broad Institute
(see supplementary methods and acknowledgements) and was generated using the
PennCNV-Affy algorithm (see supplementary methods), to serve as an additional means to validate the
Illumina-based CNV calls. SNP/CN probe coverage is shown as blue lines across the top.
Produced with custom tracks uploaded to http://genome.ucsc.edu. Figures for all loci are included
in Supplementary Information: UCSC Views of Raw CNV calls Representing Significant Loci
Figure 2. Independent Validation using QPCR and MLPA
Fluorescent probe-based QPCR assays using Roche Universal probe library and/or MLPA were
designed to validate every candidate CNV with a completely independent test (representative
series shown for each locus). Error bars are calculated based on the standard deviation of