Rare De Novo Variants Associated with Autism
Implicate a Large Functional Network of Genes
Involved in Formation and Function of Synapses
Sarah R. Gilman,1,3 Ivan Iossifov,2,3,* Dan Levy,2 Michael Ronemus,2 Michael Wigler,2 and Dennis Vitkup1,*
1Center for Computational Biology and Bioinformatics and Department of Biomedical Informatics, Columbia University, 1130 St. Nicolas Ave,
New York, NY 10032, USA
2Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
3These authors contributed equally to this work
*Correspondence: email@example.com (I.I.), firstname.lastname@example.org (D.V.)
Identification of complex molecular networks under-
lying common human phenotypes is a major chal-
lenge of modern genetics. In this study, we develop
a method for network-based analysis of genetic
associations (NETBAG). We use NETBAG to identify
a large biological network of genes affected by rare
de novo CNVs in autism. The genes forming the
network are primarily related to synapse develop-
ment, axon targeting, and neuron motility. The identi-
fied network is strongly related to genes previously
esis that significantly stronger functional perturba-
tions are required to trigger the autistic phenotype
analysis of de novo variants supports the hypothesis
that perturbed synaptogenesis is at the heart of
autism. More generally, our study provides proof of
the principle that networks underlying complex
human phenotypes can be identified by a network-
based functional analysis of rare genetic variants.
The ongoing revolution in genomic and sequencing technologies
has allowed researchers to routinely perform genome-wide
association studies (GWAS) for multiple common human
diseases and phenotypes (Frazer et al., 2007; Hardy and
Singleton, 2009). Although these studies have successfully iden-
tified hundreds of significant associations, common polymor-
phisms reaching genome-wide significance usually explain
a relatively small fraction of disease heritability (Goldstein,
2009). There is a growing consensus in genetics that the most
of functional pathways underlying the observed phenotypes
(Hirschhorn, 2009). In addition, it is likely that a significant frac-
tion of so-called missing heritability (Manolio et al., 2009), which
has eluded association studies, is accounted for by rare single
nucleotide mutations and structural genomic variations (McClel-
lan and King, 2010).
A notable example of a disease with a very complex allelic
architecture is autism—one of the most common neurological
disorders (Geschwind, 2008). Autism spectrum disorders are
characterized by impaired social interactions, abnormal verbal
communication, restricted interests, and repetitive behaviors.
Due in part to better detection strategies, the combined preva-
lence of ASD has been steadily increasing for several decades
Although autism has a very strong genetic component, with an
gotic twins (Hyman, 2008), GWAS-based searches have impli-
phisms reaching genome-wide significance (Wang et al., 2009;
Weiss et al., 2009). In addition, the agreement between published
determinants for this disease still remain largely unknown. Impor-
de novo copy number variations (CNVs) (Marshall et al., 2008;
Moessner et al., 2007; Pinto et al., 2010; Sebat et al., 2007) significantly
contribute to autism etiology (Zhao et al., 2007).
The main challenge in the analysis of rare genetic variations,
such as de novo CNVs, is precisely their rarity, i.e., the fact
that a vast majority of the observed genetic events are unique.
Consequently, each rare variant by itself is not statistically signif-
icant, so an integrative conceptual framework is required to
understand their overall functional impact. We hypothesized
that recently obtained genome-wide de novo CNV data (Levy
et al., 2011) could allow identification of the underlying biological
pathways and processes if considered in the context of func-
tional biological networks (Feldman et al., 2008; Iossifov et al.,
2008). Here, we develop a method for network-based analysis
of genetic associations (NETBAG) and demonstrate its utility in
autism. The presented approach can determine whether the
observed rare events en masse affect a significantly intercon-
nected functional network of human genes.
NETBAG Method Overview
To implement our approach, we first built a background network
that connects any pair of human genes with a weighted edge
898 Neuron 70, 898–907, June 9, 2011 ©2011 Elsevier Inc.
encapsulating our a priori expectation that the two genes partic-
ipate in the same genetic phenotype (see Experimental Proce-
dures and Supplemental Experimental Procedures). This back-
ground network was based on a combination of various
functional descriptors, such as shared gene ontology (GO) annotations
(Ashburner et al., 2000), functional pathways in KEGG
(Kanehisa and Goto, 2000), shared interaction partners and
coevolutionary patterns (see Experimental Procedures). Similar
methods have been previously used to build functional networks
in humans and several model organisms (Lee et al., 2004, 2008).
In contrast to the aforementioned studies, edges in our network
represent the likelihood that two genes participate in a similar
genetic phenotype rather than necessarily share cellular func-
tions. Importantly, no deliberate biases toward genes previously
implicated in autism or biological functions related to the nervous
system were used in building the network. The likelihood
network was assembled using a large set of known disease-
gene associations that were carefully curated for our previous
study (Feldman et al., 2008). This set contains 476 genes associated
with 132 different genetic diseases (see Experimental Procedures).
Using the constructed network, we searched for functionally
connected clusters of human genes affected by de novo CNVs
(Figure 1). The genes within the observed CNV regions were first
mapped to the nodes corresponding to these genes in the
network (Figure 1B). Clusters of genes were assigned scores
based on the strength of their connections, and a greedy search
algorithm (see Experimental Procedures) was then used to find
high-scoring clusters of genes within the CNV regions (Fig-
ure 1C). In this search procedure genes from any CNV region
could be selected to be members of the growing cluster (Fig-
ure 1C), but to prevent large CNV regions from dominating clus-
ters, we allowed no more than one or two genes from a given
CNV to participate in a cluster (Figures 2A and 2B, respectively).
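The search described above can be illustrated with a minimal sketch. It is not the authors' implementation: the text specifies only a greedy search and a cap of one or two genes per CNV region. The scoring rule (sum of pairwise edge weights), the fixed cluster size, and all names (`cluster_score`, `greedy_cluster`, the toy genes and weights) are assumptions made for illustration.

```python
from itertools import combinations


def cluster_score(genes, weight):
    """Assumed scoring rule: sum of pairwise edge weights within the cluster."""
    return sum(weight.get(frozenset(pair), 0.0) for pair in combinations(genes, 2))


def greedy_cluster(cnv_regions, weight, max_per_cnv=1, size=3):
    """Grow a cluster from every seed gene; keep the highest-scoring result.

    cnv_regions maps a CNV id to its genes (each gene is assumed to lie in
    exactly one region). At most max_per_cnv genes are taken from any region.
    """
    gene_to_cnv = {g: cnv for cnv, genes in cnv_regions.items() for g in genes}
    all_genes = sorted(gene_to_cnv)  # sorted for deterministic tie-breaking
    best_genes, best_score = [], float("-inf")
    for seed in all_genes:
        cluster = [seed]
        while len(cluster) < size:
            # Count how many cluster members each CNV region already supplies.
            used = {}
            for g in cluster:
                used[gene_to_cnv[g]] = used.get(gene_to_cnv[g], 0) + 1
            candidates = [
                g for g in all_genes
                if g not in cluster and used.get(gene_to_cnv[g], 0) < max_per_cnv
            ]
            if not candidates:
                break
            # Greedy step: add the gene that maximizes the new cluster score.
            cluster.append(max(candidates, key=lambda g: cluster_score(cluster + [g], weight)))
        score = cluster_score(cluster, weight)
        if score > best_score:
            best_genes, best_score = sorted(cluster), score
    return best_genes, best_score


# Toy data (hypothetical genes and weights, not taken from the study).
cnv_regions = {"cnv1": ["A", "B"], "cnv2": ["C"], "cnv3": ["D", "E"]}
weight = {
    frozenset(("A", "B")): 5.0,  # strong edge, but both genes share cnv1
    frozenset(("A", "C")): 2.0,
    frozenset(("C", "D")): 1.5,
    frozenset(("A", "D")): 1.0,
    frozenset(("B", "E")): 0.5,
}
one_per_cnv = greedy_cluster(cnv_regions, weight, max_per_cnv=1)
two_per_cnv = greedy_cluster(cnv_regions, weight, max_per_cnv=2)
```

In the toy data, genes A and B share a strong edge but lie in the same region, so the one-gene cap keeps them from entering a cluster together. This is the effect the cap is meant to have on large CNV regions.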
Figure 1. Outline of the NETwork-Based Analysis of Genetic Associations (NETBAG), the Method Used in Our Study to Identify Significant
and Functionally Related Gene Networks Affected by De Novo CNV Events
(B) One or two genes are selected from each de novo CNV region to form a cluster. The genes are mapped to the likelihood network and a combined score is
calculated for each cluster based on interactions between its genes.
(C) A greedy search procedure is used to identify the cluster with maximal score.
(D) The significance of the cluster with maximum score is determined by comparing it to the distribution of maximal scores from randomly selected genomic
regions with similar gene counts.
See Figure S1 for a further description of the NETBAG approach.
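The significance test in (D) can be sketched as an empirical p value computed against a null distribution of maximal scores. The sketch below rests on stated assumptions and is not the authors' procedure: random regions are drawn as contiguous windows over an ordered gene list with the same gene counts as the observed regions, the p value uses a +1 correction, and `max_score_fn` is a hypothetical stand-in for the cluster search that returns the maximal cluster score for a set of regions.

```python
import random


def sample_null_regions(genome_order, region_sizes, rng):
    """Draw one contiguous window of genes per observed region, matching gene counts."""
    regions = []
    for k in region_sizes:
        start = rng.randrange(len(genome_order) - k + 1)
        regions.append(genome_order[start:start + k])
    return regions


def null_max_scores(max_score_fn, genome_order, region_sizes, n_trials, seed=0):
    """Maximal cluster score for each of n_trials random region sets."""
    rng = random.Random(seed)
    return [
        max_score_fn(sample_null_regions(genome_order, region_sizes, rng))
        for _ in range(n_trials)
    ]


def empirical_p_value(observed, null_scores):
    """Fraction of null maxima >= observed, with a +1 correction (an assumption)."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


# Toy usage: the score function below is a placeholder, not a real cluster search.
genome_order = [f"g{i}" for i in range(10)]


def toy_max_score(regions):
    return float(sum(len(r) for r in regions))


null = null_max_scores(toy_max_score, genome_order, region_sizes=[2, 3], n_trials=10)
p = empirical_p_value(3.0, [1.0, 2.0, 3.0, 4.0])
```

Comparing against the maximum score found in each random trial, rather than the score of an arbitrary random cluster, accounts for the fact that the search itself selects the best cluster among many candidates.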
Rare CNV Variants in Autism Perturb Synaptogenesis
Tavazoie, S.F., Alvarez, V.A., Ridenour, D.A., Kwiatkowski, D.J., and Sabatini,
B.L. (2005). Regulation of neuronal morphology and function by the tumor
suppressors Tsc1 and Tsc2. Nat. Neurosci. 8, 1727–1734.
Tessier-Lavigne, M., and Goodman, C.S. (1996). The molecular biology of
axon guidance. Science 274, 1123–1133.
UniProt Consortium. (2007). The Universal Protein Resource (UniProt). Nucleic
Acids Res. 35 (Database issue), D193–D197.
Wang, K., Zhang, H., Ma, D., Bucan, M., Glessner, J.T., Abrahams, B.S.,
Salyakina, D., Imielinski, M., Bradfield, J.P., Sleiman, P.M., et al. (2009).
Common genetic variants on 5p14.1 associate with autism spectrum disor-
ders. Nature 459, 528–533.
Wang, K., Bucan, M., Grant, S.F., Schellenberg, G., and Hakonarson, H.
(2010). Strategies for genetic studies of complex diseases. Cell 142,
351–353, author reply 353–355.
Weiss, L.A., Arking, D.E., Daly, M.J., and Chakravarti, A.; Gene Discovery
Project of Johns Hopkins & the Autism Consortium. (2009). A genome-wide
linkage and association scan reveals novel loci for autism. Nature 461,
Wood, L.D., Parsons, D.W., Jones, S., Lin, J., Sjöblom, T., Leary, R.J., Shen,
D., Boca, S.M., Barber, T., Ptak, J., et al. (2007). The genomic landscapes of
human breast and colorectal cancers. Science 318, 1108–1113.
Woolfrey, K.M., Srivastava, D.P., Photowala, H., Yamashita, M., Barbolina,
M.V., Cahill, M.E., Xie, Z., Jones, K.A., Quilliam, L.A., Prakriya, M., and
Penzes, P. (2009). Epac2 induces synapse remodeling and depression and
its disease-associated forms alter spines. Nat. Neurosci. 12, 1275–1284.
associated with lifelong memories. Nature 462, 920–924.
Zhao, X., Leotta, A., Kustanovich, V., Lajonchere, C., Geschwind, D.H., Law,
K., Law, P., Qiu, S., Lord, C., Sebat, J., et al. (2007). A unified genetic theory
for sporadic and inherited autism. Proc. Natl. Acad. Sci. USA 104, 12831–
Zoghbi, H.Y. (2003). Postnatal neurodevelopmental disorders: meeting at the
synapse? Science 302, 826–830.