DNA targeting specificity of RNA-guided Cas9 nucleases

Nature Biotechnology 31(9) · July 2013
DOI: 10.1038/nbt.2647
Abstract
The Streptococcus pyogenes Cas9 (SpCas9) nuclease can be efficiently targeted to genomic loci by means of single-guide RNAs (sgRNAs) to enable genome editing. Here, we characterize SpCas9 targeting specificity in human cells to inform the selection of target sites and avoid off-target effects. Our study evaluates >700 guide RNA variants and SpCas9-induced indel mutation levels at >100 predicted genomic off-target loci in 293T and 293FT cells. We find that SpCas9 tolerates mismatches between guide RNA and target DNA at different positions in a sequence-dependent manner, sensitive to the number, position and distribution of mismatches. We also show that SpCas9-mediated cleavage is unaffected by DNA methylation and that the dosage of SpCas9 and sgRNA can be titrated to minimize off-target modification. To facilitate mammalian genome engineering applications, we provide a web-based software tool to guide the selection and validation of target sequences as well as off-target analyses.
nature biotechnology VOLUME 31 NUMBER 9 SEPTEMBER 2013 8 2 7
L E T T E R S
The bacterial type II clustered, regularly interspaced, short palindromic repeats (CRISPR) system from S. pyogenes can be reconstituted in mammalian cells using three minimal components [1]: the CRISPR-associated nuclease Cas9 (SpCas9), a specificity-determining CRISPR RNA (crRNA), and an auxiliary trans-activating crRNA (tracrRNA) [11]. Following crRNA and tracrRNA hybridization, SpCas9 is targeted to genomic loci matching a 20-nt guide sequence within the crRNA, immediately upstream of a required 5′-NGG protospacer adjacent motif (PAM) [11]. crRNA and tracrRNA duplexes can also be fused to generate a chimeric sgRNA [12] that mimics the natural crRNA-tracrRNA hybrid. Both crRNA-tracrRNA duplexes and sgRNAs can be used to target SpCas9 for multiplexed genome editing in eukaryotic cells [1,3].
Although an sgRNA design consisting of a truncated crRNA and tracrRNA had been previously shown to mediate efficient cleavage in vitro [12], it failed to achieve detectable cleavage at several loci that were efficiently modified by crRNA-tracrRNA duplexes bearing identical guide sequences [1]. Because the major difference between this sgRNA design and the native crRNA-tracrRNA duplex is the length of the tracrRNA sequence, we tested whether extension of the tracrRNA tail would improve SpCas9 activity.
We generated a set of sgRNAs targeting multiple sites within the human EMX1 and PVALB loci with different tracrRNA 3′ truncations (Fig. 1a). Using the SURVEYOR nuclease assay [13], we assessed the ability of each Cas9-sgRNA complex to generate indels in human embryonic kidney (HEK) 293FT cells through the induction of DNA double-stranded breaks (DSBs) and subsequent nonhomologous end joining (NHEJ) DNA damage repair (Online Methods). sgRNAs with +67 or +85 nucleotide (nt) tracrRNA tails mediated DNA cleavage at all target sites tested, with up to fivefold higher levels of indels than the corresponding crRNA-tracrRNA duplexes (Fig. 1b and Supplementary Fig. 1a). Furthermore, both sgRNA designs efficiently modified PVALB loci that were previously not targetable using crRNA-tracrRNA duplexes [1] (Fig. 1b and Supplementary Fig. 1b). For all five tested targets, we observed a consistent increase in modification efficiency with increasing tracrRNA length. We performed northern blot analyses for the guide RNA truncations and found increased levels of expression for the longer tracrRNA sequences, suggesting that improved target cleavage was at least partially due to higher sgRNA expression or stability (Fig. 1c). Taken together, these data indicate that the tracrRNA tail is important for optimal SpCas9 expression and activity in vivo.
We further investigated the sgRNA architecture by extending the duplex length from 12 to the 22 nt found in the native crRNA-tracrRNA duplex (Supplementary Fig. 2a). We also mutated the sequence encoding the sgRNAs to abolish any poly-T tracts that could serve as premature transcriptional terminators for U6-driven transcription [14]. We tested these new sgRNA scaffolds on three targets within the human EMX1 gene (Supplementary Fig. 2b) and observed only modest changes in modification efficiency. Thus, we established sgRNA(+67) as a minimum effective SpCas9 guide RNA architecture, and for all subsequent studies we used the most active sgRNA(+85) architecture.
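
The poly-T screening step described above can be illustrated with a short sketch. It is not the authors' procedure: the function name is hypothetical, and the threshold of four consecutive T's is an assumption chosen for illustration (the text only says that poly-T tracts may terminate U6-driven transcription prematurely). The example input is the DNA form of the scaffold start shown in Fig. 1a.

```python
import re

def find_poly_t_tracts(dna: str, min_run: int = 4):
    """Return (start, length) for every run of at least `min_run` T's.

    `min_run=4` is an assumed threshold, not a value given in the paper.
    Coordinates are 0-based on the supplied DNA string.
    """
    pattern = re.compile(r"T{%d,}" % min_run)
    return [(m.start(), len(m.group())) for m in pattern.finditer(dna.upper())]

# DNA form of the first scaffold bases shown in Fig. 1a (GUUUUAGAGCUAG).
scaffold_start = "GTTTTAGAGCTAG"
tracts = find_poly_t_tracts(scaffold_start)
```

A non-empty result flags a sequence that might need a point mutation before being placed under a U6 promoter; which base to change is a design decision this sketch does not address.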
Patrick D Hsu (1–3,9), David A Scott (1,2,9), Joshua A Weinstein (1,2), F Ann Ran (1–3), Silvana Konermann (1,2), Vineeta Agarwala (1,4,5), Yinqing Li (1,2), Eli J Fine (6), Xuebing Wu (7), Ophir Shalem (1,2), Thomas J Cradick (6), Luciano A Marraffini (8), Gang Bao (6) & Feng Zhang (1,2)
1 Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.
2 McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
3 Department of Molecular and Cellular Biology, Harvard University, Cambridge, Massachusetts, USA.
4 Program in Biophysics, Harvard University, Cambridge, Massachusetts, USA.
5 Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, MA 02139, USA.
6 Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, Georgia, USA.
7 Computational and Systems Biology Graduate Program, Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
8 Laboratory of Bacteriology, The Rockefeller University, New York, New York, USA.
9 These authors contributed equally to this work.
Correspondence should be addressed to F.Z. (zhang@broadinstitute.org).
Received 30 March; accepted 30 June; published online 21 July 2013; corrected online 28 August 2013; doi:10.1038/nbt.2647
npg
© 2013 Nature America, Inc. All rights reserved.
We have previously shown that a catalytic mutant of SpCas9 (D10A nickase) can mediate gene editing by homology-directed repair without detectable indel formation [1]. Given its higher cleavage efficiency, we tested whether sgRNA(+85), in complex with the Cas9 nickase, can likewise facilitate homology-directed repair without incurring on-target NHEJ. Using single-stranded oligonucleotides as repair templates, we observed that both the wild-type and the D10A SpCas9 mediate homology-directed repair in HEK 293FT cells, whereas only the former does so in human embryonic stem cells (hESCs; Fig. 1d and Supplementary Fig. 3a–c). We further confirmed using the SURVEYOR assay that no target indel mutations are induced by the SpCas9 D10A nickase (Supplementary Fig. 3d).
To explore whether the genome targeting ability of sgRNA(+85) is influenced by epigenetic factors [15,16] that constrain the alternative transcription activator-like effector nuclease (TALEN) [17–21] and potentially also zinc finger nuclease (ZFN) [22–26] technologies, we further tested the ability of SpCas9 to cleave methylated DNA. Using either unmethylated or M. SssI-methylated pUC19 as DNA targets (Supplementary Fig. 4a,b) in a cell-free cleavage assay, we showed that SpCas9 efficiently cleaves pUC19 regardless of CpG methylation status in either the 20-bp target sequence or the PAM (Supplementary Fig. 4c). To test whether this is also true in vivo, we designed sgRNAs to target a highly methylated region of the human SERPINB5 locus (Fig. 1e,f). All three sgRNAs tested were able to mediate indel mutations in endogenously methylated targets (Fig. 1g).
Having established the optimal guide RNA architecture for SpCas9 and having demonstrated its insensitivity to genomic CpG methylation, we sought to conduct a comprehensive characterization of the DNA targeting specificity of SpCas9. Previous studies on SpCas9 cleavage specificity [1,2,12] were limited to a small set of single-nucleotide mismatches between the guide sequence and DNA target, suggesting that perfect base-pairing within 10–12 bp directly 5′ of the PAM (PAM-proximal) determines Cas9 specificity, whereas multiple PAM-distal mismatches can be tolerated. In addition, a recent study using catalytically inactive SpCas9 as a transcriptional repressor found no significant off-target effects throughout the Escherichia coli transcriptome [27]. However, a systematic analysis of Cas9 specificity within the context of a larger mammalian genome has not yet been reported. To address this, we first evaluated the effect of imperfect complementarity between the guide RNA and its genomic target on SpCas9 activity, and then assessed the cleavage activity resulting from a single sgRNA on multiple genomic off-target loci with sequence similarity.
[Figure 1: panels a–g; axes and labels include Indel (%), HR (%), sgRNA truncations (+48, +54, +67, +85), Clone number and Distance from TSS]
Figure 1 Optimization of guide RNA architecture for SpCas9-mediated mammalian genome editing. (a) Schematic of bicistronic expression vector (PX330) for U6 promoter-driven sgRNA and CBh promoter-driven human codon-optimized S. pyogenes Cas9 (hSpCas9) used for all subsequent experiments. The sgRNA consists of a 20-nt guide sequence (blue) and scaffold (red), truncated at various positions as indicated. (b) SURVEYOR assay for SpCas9-mediated indels at the human EMX1 and PVALB loci. Arrowheads indicate the expected SURVEYOR fragments (n = 3). (c) Northern blot analysis for the four sgRNA truncation architectures, with U1 as loading control. (d) Both the wild-type (WT) and the nickase mutant (D10A) of SpCas9 promoted insertion of a HindIII site into the human EMX1 gene. Single-stranded oligonucleotides, oriented in either the sense or antisense direction relative to genome sequence, were used as homologous recombination templates (Supplementary Fig. 3). (e) Schematic of the human SERPINB5 locus. sgRNAs and PAMs are indicated by colored bars above sequence; methylcytosines (Me) are highlighted (pink) and numbered relative to the transcriptional start site (TSS, +1). (f) Methylation status of SERPINB5 assayed by bisulfite sequencing of 16 clones. Filled circles, methylated CpG; open circles, unmethylated CpG. (g) Modification efficiency by three sgRNAs targeting the methylated region of SERPINB5, assayed by deep sequencing (n = 2). Error bars indicate Wilson intervals (Online Methods).
[Figure 2: panels a–e; heatmaps of relative cleavage by mismatch position (19 to 1) and base identity for EMX1 targets 1, 2, 3 and 6; panel c rows are mismatched RNA:DNA pairs; panel e plots NRG PAM occurrence in the human genome (counts × 10^8) against distance (bp)]
Figure 2 Single-nucleotide specificity of SpCas9. (a) Schematic of the experimental design. sgRNAs carrying all possible single base-pair mismatches (blue Ns) throughout the guide sequence were tested for each EMX1 target site (target site 1 shown as example). (b) Heatmap representation of relative SpCas9 cleavage efficiency by 57 single-mutated and 1 nonmutated sgRNA each for four EMX1 target sites (aggregated from Supplementary Table 5). For each EMX1 target, the identities of single base-pair substitutions are indicated on the left; original guide sequence is shown above and highlighted in the heatmap (gray squares). Modification efficiencies (increasing from white to dark blue) are normalized to the original guide sequence. Sequence logo representation of the same data can be found in Supplementary Figure 7. (c) Heatmap for relative SpCas9 cleavage efficiency for each possible RNA:DNA base pair, compiled from aggregate data from single-mismatch guide RNAs for 15 EMX1 targets (Supplementary Fig. 8). Mean cleavage levels were calculated for the 10 PAM-proximal bases (right bar) and across all substitutions at each position (bottom bar); positions in gray were not covered by the 469 single-mutated and 15 unmutated sgRNAs tested (Supplementary Table 5). (d) SpCas9-mediated indel frequencies at targets with all possible PAM sequences, determined using the SURVEYOR nuclease assay. Two target sites from the EMX1 locus were tested for each PAM (Supplementary Table 4). (e) Histogram of distances between 5′-NRG PAM occurrences within the human genome. Putative targets were identified using both strands of human chromosomal sequences (GRCh37/hg19).
To facilitate large-scale testing of mismatched guide sequences, we developed a simple sgRNA testing assay by generating expression cassettes encoding U6-driven sgRNAs using PCR and transfecting the resulting amplicons (Supplementary Fig. 5). We then performed deep sequencing of the region flanking each target site (Supplementary Fig. 6) for two independent biological replicates. From these data, we applied a binomial model to detect true indel events resulting from SpCas9 cleavage and NHEJ misrepair and calculated 95% confidence intervals for all reported NHEJ frequencies (Online Methods and Supplementary Tables 5–8).
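
The figure legends state that error bars are Wilson intervals. The sketch below shows the standard Wilson score interval for a binomial proportion, which is one way such a 95% interval could be computed from read counts. The function name and the example counts are hypothetical, and the binomial indel-calling model itself is described only in the Online Methods and is not reproduced here.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.959964):
    """Wilson score interval for a binomial proportion.

    successes: reads carrying an indel; n: total reads at the locus.
    z = 1.959964 corresponds to a two-sided 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical counts: 50 indel-bearing reads out of 1,000 reads.
low, high = wilson_interval(50, 1000)
```

Unlike the simple normal approximation, the Wilson interval stays within [0, 1] and remains informative when the observed indel count is zero, which matters for off-target loci with very low modification rates.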
We systematically investigated the effect of base-pairing mismatches between guide RNA sequences and target DNA on target modification efficiency. We chose four target sites within the human EMX1 gene (1, 2, 3 and 6) and, for each, generated a set of 57 different guide RNAs containing all possible single-nucleotide substitutions in positions 1–19 directly 5′ of the requisite NGG PAM (Fig. 2a). The 5′ guanine at position 20 is preserved, given that the U6 promoter requires guanine as the first base of its transcript. These ‘off-target’ guide RNAs were then assessed for cleavage activity at the on-target genomic locus.
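
The count of 57 follows from 19 mutable positions times three alternative bases each. A minimal sketch of that enumeration is given below; the function name is hypothetical, and the guide sequence is the one shown for target site 1 in Fig. 2a. Positions are numbered from the PAM-adjacent 3′ end, as in the text.

```python
def single_mismatch_guides(guide: str, alphabet: str = "ACGU"):
    """Enumerate all single-base substitutions of a guide written 5'->3'.

    Position 1 is the 3'-most base (adjacent to the PAM); the 5'-most base
    (position len(guide)) is left unchanged, mirroring the preserved 5' G.
    Returns a list of (position, new_base, variant_sequence).
    """
    n = len(guide)
    variants = []
    for pos in range(1, n):          # positions 1 .. n-1
        idx = n - pos                # string index of that position
        for base in alphabet:
            if base != guide[idx]:
                variants.append((pos, base, guide[:idx] + base + guide[idx + 1:]))
    return variants

guide = "GUCACCUCCAAUGACUAGGG"       # guide shown in Fig. 2a
variants = single_mismatch_guides(guide)
```

For a 20-nt guide this yields 19 × 3 = 57 variants, each differing from the original at exactly one position and each retaining the 5′ G.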
Consistent with previous findings [1,2,12], SpCas9 tolerates single-base mismatches in the PAM-distal region to a greater extent than in the PAM-proximal region. In contrast to a model that implies that a prototypical 10–12 bp PAM-proximal seed sequence largely determines target specificity [1,2,12], we found that most bases within the 20-bp target site provide varying degrees of specificity. Single-base specificity generally ranges from 8 to 14 bp immediately upstream of the PAM, indicating a sequence-dependent, mismatch-sensitive boundary that varies in length (Fig. 2b, Supplementary Fig. 7 and Supplementary Table 5).
To further investigate the contributions of base identity and position within the guide RNA to SpCas9 specificity, we generated additional sets of mismatched guide RNAs for 11 more target sites within the EMX1 locus (Supplementary Fig. 8), totaling over 400 sgRNAs. These guide RNAs were designed to cover all 12 possible RNA:DNA mismatches for each position in the guide sequence with at least 2× coverage for positions 1–10. Our aggregate single-mismatch data reveal multiple exceptions to the seed sequence model of SpCas9 specificity [1,2,6] (Fig. 2c and Supplementary Table 5). Within the PAM-proximal region, the degree of tolerance varied with the identity of a particular mismatch, with rC:dC base-pairing exhibiting the highest level of disruption to SpCas9 cleavage activity (Fig. 2c).
In addition to the target specificity, we also investigated the NGG PAM requirement of SpCas9. To vary the second and third positions of the PAM, we selected 32 target sites within the EMX1 locus encompassing all 16 possible alternate PAMs with 2× coverage (Supplementary Table 4). Using the SURVEYOR assay, we showed that SpCas9 also cleaves targets with NAG PAMs, albeit with one-fifth of the efficiency for target sites with 5′-NGG PAMs (Fig. 2d). The tolerance for an NAG PAM is in agreement with previous bacterial studies [2] and expands the S. pyogenes Cas9 target space to every 4 bp on average within the human genome, not accounting for constraining factors such as guide RNA secondary structure or certain epigenetic modifications (Fig. 2e). Although we have shown here that methylated DNA sequences can be cleaved by SpCas9, further characterization of the implications of epigenetic factors for CRISPR editing efficiency is needed.
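
A minimal sketch of how candidate target sites followed by a 5′-NRG PAM might be enumerated on both strands of a DNA sequence is shown below. The function names, return format and coordinate convention are illustrative assumptions, not the authors' genome-wide analysis; the sketch handles only unambiguous A/C/G/T input.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]

def find_targets(seq: str, guide_len: int = 20, second_bases: str = "AG"):
    """List sites consisting of `guide_len` nt followed by a 3-nt PAM.

    The PAM is N, then one of `second_bases`, then G ("AG" gives NRG,
    "G" gives NGG only). Returns (strand, start, protospacer, pam), where
    `start` is the leftmost plus-strand coordinate (0-based) of the
    protospacer-plus-PAM window.
    """
    seq = seq.upper()
    # Lookahead so that overlapping sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT][%s]G))" % (guide_len, second_bases))
    site_len = guide_len + 3
    hits = []
    for m in pattern.finditer(seq):
        hits.append(("+", m.start(), m.group(1), m.group(2)))
    for m in pattern.finditer(revcomp(seq)):
        hits.append(("-", len(seq) - m.start() - site_len, m.group(1), m.group(2)))
    return hits

example_hits = find_targets("A" * 20 + "CAG")
```

Passing `second_bases="G"` restricts the search to canonical NGG PAMs, which makes it easy to compare how much the NAG tolerance reported above enlarges the candidate set for a given region.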
We next explored the effect of multiple base mismatches on SpCas9 target activity. For four targets within the EMX1 gene, we designed sets of guide RNAs that contained varying combinations of mismatches to investigate the effect of mismatch number, position and spacing on SpCas9 target cleavage activity (Fig. 3a,b and Supplementary Table 6). In general, we observed that the total number of mismatched base-pairs is a key determinant for SpCas9 cleavage efficiency. Two mismatches, particularly those occurring in a PAM-proximal region, considerably reduced SpCas9 activity whether these mismatches are concatenated or interspaced (Fig. 3a,b); this effect is further magnified for three concatenated mismatches (Fig. 3a). Furthermore, three or more interspaced (Fig. 3c) and five concatenated (Fig. 3a) mismatches eliminated detectable SpCas9 cleavage in the vast majority of loci.
[Figure 3: panels a–c; bar charts of Indel (%) for EMX1 target 1, 2, 3 and 6 guide sequences carrying 2, 3, 4 or concatenated mismatches]
Figure 3 Multiple mismatch specificity of SpCas9. (a–c) SpCas9 cleavage efficiency with guide RNAs containing consecutive mismatches of 2, 3 or 5 bases (a), or multiple mismatches separated by different numbers of unmutated bases for EMX1 targets 1, 2, 3 and 6 (b,c). Rows represent each mutated guide RNA; nucleotide substitutions are shown in white cells; gray cells denote unmutated bases. All indel frequencies are absolute and analyzed by deep sequencing from two biological replicates. Error bars indicate Wilson intervals (Online Methods).
The position of mismatches within the guide sequence also affected the activity of SpCas9. PAM-proximal mismatches are less tolerated than PAM-distal counterparts (Fig. 3a), recapitulating our observations from the single base-pair mismatch data (Fig. 2c). This effect is particularly salient in guide sequences bearing a small number of total mismatches, whether those are consecutive (Fig. 3a) or interspaced (Fig. 3b). Additionally, guide sequences with mismatches spaced four or more bases apart also mediated SpCas9 cleavage in some cases (Fig. 3c). Thus, together with the identity of mismatched base-pairing, we observed that many off-target cleavage effects can be explained by a combination of mismatch number and position.
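
The three quantities discussed above (mismatch number, position and spacing) can be extracted from a guide and a candidate site with a few lines of code. The sketch below is illustrative only: the function name and output keys are hypothetical, and the default PAM-proximal window of 12 bases is an assumption, since the text reports a boundary that varies between roughly 8 and 14 bp.

```python
def mismatch_profile(guide: str, protospacer: str, proximal_len: int = 12):
    """Summarize mismatches between a guide and a same-length DNA site.

    Both sequences are written 5'->3'; `protospacer` is the DNA strand
    with the same sense as the guide. RNA U is treated as DNA T.
    Positions are counted from the PAM-adjacent 3' end (position 1).
    `proximal_len` is an assumed width for the PAM-proximal region.
    """
    g = guide.upper().replace("U", "T")
    t = protospacer.upper().replace("U", "T")
    if len(g) != len(t):
        raise ValueError("guide and protospacer must have equal length")
    n = len(g)
    positions = sorted(n - i for i, (a, b) in enumerate(zip(g, t)) if a != b)
    # Number of matched bases lying between neighbouring mismatches.
    gaps = [b - a - 1 for a, b in zip(positions, positions[1:])]
    return {
        "count": len(positions),
        "positions": positions,
        "pam_proximal": sum(p <= proximal_len for p in positions),
        "gaps": gaps,
    }

profile = mismatch_profile("GUCACCUCCAAUGACUAGGG", "GTCACCTCCAATGACTACGA")
```

Here the candidate site differs from the guide at positions 1 and 3, so the profile reports two mismatches, both PAM-proximal, separated by a single matched base.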
Given these mismatched guide RNA results, we expected that for any particular sgRNA, SpCas9 may cleave genomic loci that contain small numbers of mismatched bases. For the four EMX1 targets described above, we computationally selected 117 candidate off-target sites in the human genome that are followed by a 5′-NRG PAM and meet any of the following additional criteria: (i) up to five mismatches, (ii) short insertions or deletions or (iii) mismatches only in the PAM-distal region. Additionally, we assessed off-target loci of high sequence similarity without the PAM requirement. The majority of off-target sites tested for each sgRNA (30/31, 23/23, 48/51 and 12/12 sites for EMX1 targets 1, 2, 3 and 6, respectively) exhibited modification efficiencies at least two orders of magnitude lower than those of the corresponding on-targets (Fig. 4a,b, Supplementary Fig. 9 and Supplementary Tables 7 and 8). Of the four off-target sites that exhibit substantial modification efficiencies, three contained only mismatches in the PAM-distal region, consistent with our multiple mismatch sgRNA observations (Fig. 3). Notably, these three loci were followed by 5′-NAG PAMs, demonstrating that off-target analyses of SpCas9 must include 5′-NAG as well as 5′-NGG candidate loci.
Enzymatic specificity and activity strength are often highly dependent on reaction conditions, which at high enzyme concentration might amplify off-target activity [28,29]. One potential strategy for minimizing nonspecific cleavage is to limit the enzyme concentration, namely the level of SpCas9-sgRNA complex. Cleavage specificity, measured as the ratio of on- to off-target cleavage, increased dramatically as we decreased the equimolar amounts of SpCas9 and sgRNA transfected into 293FT cells (Fig. 4c,d) from 7.1 × 10^−10 to 1.8 × 10^−11 nmol/cell (400 ng to 10 ng of Cas9-sgRNA plasmid). A qRT-PCR assay confirmed that the levels of hSpCas9 mRNA and sgRNA decreased proportionally to the amount of transfected DNA (Supplementary Fig. 10). Whereas specificity increased gradually by nearly fourfold as we decreased the transfected DNA amount from 7.1 × 10^−10 to 9.0 × 10^−11 nmol/cell (400 ng to 50 ng plasmid), we observed a notable additional sevenfold increase in specificity upon further decreasing transfected DNA from 9.0 × 10^−11 to 1.8 × 10^−11 nmol/cell (50 ng to 10 ng plasmid; Fig. 4c). These findings suggest that we can minimize the level of off-target activity by titrating the amount of SpCas9 and sgRNA DNA delivered. However, increasing specificity by reducing the amount of transfected DNA also leads to a reduction in on-target cleavage. These measurements enable quantitative integration of specificity and efficiency criteria into dosage choice to optimize SpCas9 activity for different applications. Additional work to explore modifications in SpCas9 and sgRNA design may improve SpCas9-intrinsic specificity without sacrificing cleavage efficiency.
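
The arithmetic behind the dosage titration can be made explicit with a small sketch. The function names are hypothetical. The conversion assumes that the stated correspondence of 400 ng to 7.1 × 10^−10 nmol/cell scales linearly with plasmid mass, which reproduces the other quoted values only approximately, and the indel percentages in the example are invented purely to show the calculation.

```python
NMOL_PER_CELL_AT_400_NG = 7.1e-10   # correspondence stated in the text

def ng_to_nmol_per_cell(plasmid_ng: float) -> float:
    """Convert plasmid mass (ng) to nmol/cell, assuming linear scaling."""
    return plasmid_ng * NMOL_PER_CELL_AT_400_NG / 400.0

def specificity(on_target_indel: float, off_target_indel: float) -> float:
    """Specificity as defined in the text: on-target / off-target cleavage."""
    if off_target_indel <= 0:
        raise ValueError("off-target indel frequency must be positive")
    return on_target_indel / off_target_indel

# Hypothetical indel percentages at two doses (not data from the paper).
spec_high_dose = specificity(on_target_indel=30.0, off_target_indel=3.0)
spec_low_dose = specificity(on_target_indel=10.0, off_target_indel=0.1)
fold_gain = spec_low_dose / spec_high_dose
```

Framing the result as a fold gain in specificity alongside the drop in on-target indel frequency makes the trade-off described above explicit when choosing a dose.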
[Figure 4: panels a–d; Indel (%) at genomic off-target (OT) loci for EMX1 targets 1 and 3, and Indel (%) and specificity versus Cas9 + sgRNA plasmid dosage (ng)]
Figure 4 SpCas9-mediated indel frequencies at predicted genomic off-target loci. (a,b) Cleavage levels at putative genomic off-target loci containing two or three individual mismatches (white cells) for EMX1 target 1 and target 3 are analyzed by deep sequencing. Off-target sites are listed in order of the median position of mutations. Putative off-target sites with additional mutations did not have detectable indels (Supplementary Table 8). The Cas9 dosage was 3 × 10^−10 nmol/cell, with equimolar sgRNA delivery. Error bars indicate Wilson intervals (Online Methods). (c,d) Indel frequencies for EMX1 targets 1 and 3 and selected off-target loci (OT) as a function of SpCas9 and sgRNA dosage (n = 2, Wilson intervals). 400 ng to 10 ng of Cas9-sgRNA plasmid corresponds to 7.1 × 10^−10 to 1.8 × 10^−11 nmol/cell. Cleavage specificity is measured as a ratio of on- to off-target cleavage.
The ability to program SpCas9 to target specific sites in the genome by simply designing a short guide RNA complementary to the desired target site holds enormous potential for applications throughout biology and medicine. Our results demonstrate that the specificity of SpCas9-mediated DNA cleavage is sequence- and locus-dependent and governed by the quantity, position and identity of mismatching bases. Whereas the PAM-proximal 8–12 bp of the guide sequence generally defines specificity, the PAM-distal sequences also contribute to the overall specificity of SpCas9-mediated DNA cleavage. Although there may be off-target cleavage sites for a given guide sequence, they can be predicted and likely minimized by following general design guidelines.
To maximize SpCas9 specificity for editing a particular gene, one should identify potential ‘off-target’ genomic sequences by considering the following four constraints. First and foremost, they should not be followed by a PAM with either 5′-NGG or 5′-NAG sequences. Second, their global sequence similarity to the target sequence should be minimized, and guide sequences with genomic off-target loci that have fewer than three mismatches should be avoided. Third, at least two mismatches should lie within the PAM-proximal region of the off-target site. Fourth, a maximal number of mismatches should be consecutive or spaced less than four bases apart. Finally, the amount of SpCas9 and sgRNA can be titrated to optimize on- to off-target cleavage ratio.
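
The four constraints can be read as a checklist applied to each candidate off-target site. The sketch below encodes that reading and should not be mistaken for the authors' scoring algorithm: the function name and output keys are hypothetical, the PAM-proximal window of 12 bases is an assumption, and "spaced less than four bases apart" is interpreted here as a position difference below four between neighbouring mismatches.

```python
def assess_off_target(mismatch_positions, pam: str, proximal_len: int = 12):
    """Check one candidate off-target site against the four constraints.

    mismatch_positions: positions of mismatches to the guide, counted from
    the PAM-adjacent end (position 1). pam: the 3-nt sequence following
    the site. Each True value indicates the site satisfies that constraint,
    that is, it is less likely to be cleaved.
    """
    positions = sorted(set(mismatch_positions))
    pam = pam.upper()
    has_nrg_pam = len(pam) == 3 and pam[1] in "AG" and pam[2] == "G"
    steps = [b - a for a, b in zip(positions, positions[1:])]
    return {
        "lacks_nrg_pam": not has_nrg_pam,
        "three_or_more_mismatches": len(positions) >= 3,
        "two_or_more_pam_proximal": sum(p <= proximal_len for p in positions) >= 2,
        # Assumed reading: neighbouring mismatches differ by < 4 positions.
        "mismatches_closely_spaced": bool(steps) and all(s < 4 for s in steps),
    }

report = assess_off_target([1, 3, 5], "TGG")
```

A guide whose closest genomic off-target sites satisfy most of these checks would be preferred under the guidelines; how the individual checks are weighted against one another is left open by this sketch.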
Using these criteria, we formulated a scoring algorithm to integrate and quantify the contributions of mismatch location, density and identity on SpCas9 on-target and off-target cleavage. We applied the aggregate cleavage efficiencies of single-mismatch guide RNAs to test this scoring scheme separately on genome-wide targets and found that these factors, taken together, accounted for >50% of the variance in cutting-frequency rank among the genome-wide targets studied (Supplementary Fig. 11).
Implementing the guidelines delineated above, we designed a computational tool to facilitate the selection and validation of sgRNAs as well as to predict off-target loci for specificity analyses; this tool can be accessed at http://www.genome-engineering.org/. These results and tools further extend the SpCas9 system as a versatile alternative to ZFNs and TALENs for genome editing applications. Further work examining the thermodynamics and in vivo stability of sgRNA-DNA duplexes will likely yield additional predictive power for off-target activity, whereas exploration of SpCas9 mutants and orthologs may yield novel variants with improved specificity.
METHODS
Methods and any associated references are available in the online
version of the paper.
Accession codes. All raw reads can be accessed at NCBI BioProject,
accession number SRP023129. Indices are described in Supplementary
Tables 5–8.
Note: Any Supplementary Information and Source Data files are available in the
online version of the paper.
ACKNOWLEDGMENTS
We thank A. Shalek, E. Stamenova and D. Gray for expert help with DNA
sequencing, R. Barretto for genome-wide PAM analysis, as well as D. Altshuler,
P.A. Sharp, and the entire Zhang Lab for their support and advice. P.D.H. is a James
Mills Pierce Fellow. D.A.S. is a National Science Foundation pre-doctoral fellow
and J.A.W. is supported by a Life Science Research Foundation Fellowship. X.W. is
a Howard Hughes Medical Institute International Student Research Fellow and is
supported by National Institutes of Health (NIH) grants R01-GM34277 and
R01-CA133404 to P.A. Sharp, X.W.’s thesis advisor. G.B. is supported by an
NIH Nanomedicine Development Center Award (PN2EY018244). This work
is supported by an NIH Director’s Pioneer Award (DP1-MH100706), an NIH
Transformative R01 grant (R01-DK097768) to D. Altshuler, the Keck, McKnight,
Damon Runyon, Searle Scholars, Klingenstein and Simons Foundations, and Bob
Metcalfe and Jane Pauley. The authors wish to dedicate this paper to the memory
of Officer Sean Collier, for his caring service to the MIT community and for his
sacrifice. Reagents are available to the academic community through Addgene,
and associated protocols, support forums and computational tools are available
through the Zhang lab website (http://www.genome-engineering.org/).
AUTHOR CONTRIBUTIONS
J.A.W. and F.A.R. contributed equally to this work. P.D.H., D.A.S., F.A.R., S.K.
and F.Z. designed and performed the experiments. P.D.H., D.A.S., J.A.W., Y.L.,
S.K., F.A.R. and F.Z. analyzed the data. V.A. and O.S. contributed computational
prediction of CRISPR off-target sites and X.W. performed the northern blot
analysis. P.D.H., F.A.R., D.A.S. and F.Z. wrote the manuscript with help from
all authors.
COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details are available in the online
version of the paper.
Reprints and permissions information is available online at http://www.nature.com/
reprints/index.html.
1. Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science
339, 819–823 (2013).
2. Jiang, W., Bikard, D., Cox, D., Zhang, F. & Marraffini, L.A. RNA-guided editing of bacterial
genomes using CRISPR-Cas systems. Nat. Biotechnol. 31, 233–239 (2013).
3. Wang, H. et al. One-step generation of mice carrying mutations in multiple genes
by CRISPR/Cas-mediated genome engineering. Cell 153, 910–918 (2013).
4. Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339,
823–826 (2013).
5. Jinek, M. et al. RNA-programmed genome editing in human cells. eLife 2, e00471
(2013).
6. Cho, S.W., Kim, S., Kim, J.M. & Kim, J.S. Targeted genome engineering in human cells
with the Cas9 RNA-guided endonuclease. Nat. Biotechnol. 31, 230–232 (2013).
7. Chang, N. et al. Genome editing with RNA-guided Cas9 nuclease in zebrafish
embryos. Cell Res. 23, 465–472 (2013).
8. Hwang, W.Y. et al. Efficient genome editing in zebrafish using a CRISPR-Cas system.
Nat. Biotechnol. 31, 227–229 (2013).
9. Shen, B. et al. Generation of gene-modified mice via Cas9/RNA-mediated gene
targeting. Cell Res. 23, 720–723 (2013).
10. Gratz, S.J. et al. Genome engineering of Drosophila with the CRISPR RNA-guided
Cas9 nuclease. Genetics doi:10.1534/genetics.113.152710 (2 July 2013).
11. Deltcheva, E. et al. CRISPR RNA maturation by trans-encoded small RNA and host
factor RNase III. Nature 471, 602–607 (2011).
12. Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive
bacterial immunity. Science 337, 816–821 (2012).
13. Guschin, D.Y. et al. A rapid and general assay for monitoring endogenous gene
modification. Methods Mol. Biol. 649, 247–256 (2010).
14. Bogenhagen, D.F. & Brown, D.D. Nucleotide sequences in Xenopus 5S DNA required
for transcription termination. Cell 24, 261–270 (1981).
15. Bultmann, S. et al. Targeted transcriptional activation of silent oct4 pluripotency
gene by combining designer TALEs and inhibition of epigenetic modifiers. Nucleic
Acids Res. 40, 5368–5377 (2012).
16. Valton, J. et al. Overcoming transcription activator-like effector (TALE) DNA binding domain
sensitivity to cytosine methylation. J. Biol. Chem. 287, 38427–38432 (2012).
17. Christian, M. et al. Targeting DNA double-strand breaks with TAL effector nucleases.
Genetics 186, 757–761 (2010).
18. Miller, J.C. et al. A TALE nuclease architecture for efficient genome editing.
Nat. Biotechnol. 29, 143–148 (2011).
19. Mussolino, C. et al. A novel TALE nuclease scaffold enables high genome editing activity
in combination with low toxicity. Nucleic Acids Res. 39, 9283–9293 (2011).
20. Hsu, P.D. & Zhang, F. Dissecting neural function using targeted genome engineering
technologies. ACS Chem. Neurosci. 3, 603–610 (2012).
21. Sanjana, N.E. et al. A transcription activator-like effector toolbox for genome
engineering. Nat. Protoc. 7, 171–192 (2012).
22. Porteus, M.H. & Baltimore, D. Chimeric nucleases stimulate gene targeting in
human cells. Science 300, 763 (2003).
23. Miller, J.C. et al. An improved zinc-finger nuclease architecture for highly specific
genome editing. Nat. Biotechnol. 25, 778–785 (2007).
24. Sander, J.D. et al. Selection-free zinc-finger-nuclease engineering by context-
dependent assembly (CoDA). Nat. Methods 8, 67–69 (2011).
25. Wood, A.J. et al. Targeted genome editing across species using ZFNs and TALENs.
Science 333, 307 (2011).
26. Bobis-Wozowicz, S., Osiak, A., Rahman, S.H. & Cathomen, T. Targeted genome
editing in pluripotent stem cells using zinc-finger nucleases. Methods 53, 339–346
(2011).
27. Qi, L.S. et al. Repurposing CRISPR as an RNA-guided platform for sequence-specific
control of gene expression. Cell 152, 1173–1183 (2013).
28. Michaelis, L. & Menten, M.L. Die Kinetik der Invertinwirkung. Biochem. Z. 49,
333–369 (1913).
29. Mahfouz, M.M. et al. De novo-engineered transcription activator-like effector (TALE)
hybrid nuclease with novel DNA binding specificity creates double-strand breaks.
Proc. Natl. Acad. Sci. USA 108, 2623–2628 (2011).
© 2013 Nature America, Inc. All rights reserved.
nature biotechnology
doi:10.1038/nbt.2647
ONLINE METHODS
Cell culture and transfection. Human embryonic kidney (HEK) cell line
293FT (Life Technologies) was maintained in Dulbecco’s modified Eagle’s
Medium (DMEM) supplemented with 10% FBS (HyClone), 2 mM GlutaMAX
(Life Technologies), 100 U/ml penicillin, and 100 µg/ml streptomycin at
37 °C with 5% CO2 incubation.
293FT cells were seeded onto 6-well plates, 24-well plates or 96-well plates
(Corning) 24 h before transfection. Cells were transfected using Lipofectamine
2000 (Life Technologies) at 80–90% confluency following the manufacturer’s
recommended protocol. For each well of a 6-well plate, a total of 1 µg of
Cas9+sgRNA plasmid was used. For each well of a 24-well plate, a total of
500 ng Cas9+sgRNA plasmid was used unless otherwise indicated. For each
well of a 96-well plate, 65 ng of Cas9 plasmid was used at a 1:1 molar ratio to
the U6-sgRNA PCR product.
Human embryonic stem cell line HUES9 (Harvard Stem Cell Institute core)
was maintained in feeder-free conditions on GelTrex (Life Technologies)
in mTeSR medium (Stemcell Technologies) supplemented with 100 µg/ml
Normocin (InvivoGen). HUES9 cells were transfected with Amaxa P3 Primary
Cell 4-D Nucleofector Kit (Lonza) following the manufacturer’s protocol.
SURVEYOR nuclease assay for genome modification. 293FT and HUES9
cells were transfected with DNA as described above. Cells were incubated at
37 °C for 72 h post-transfection before genomic DNA extraction. Genomic
DNA was extracted using the QuickExtract DNA Extraction Solution
(Epicentre) following the manufacturer’s protocol. Briefly, pelleted cells were
resuspended in QuickExtract solution and incubated at 65 °C for 15 min,
68 °C for 15 min, and 98 °C for 10 min.
The genomic region flanking the CRISPR target site for each gene was PCR
amplified (target sites and primers listed in Supplementary Tables 1 and 2),
and products were purified using QiaQuick Spin Column (Qiagen) follow-
ing the manufacturer’s protocol. 400 ng total of the purified PCR products
were mixed with 2 µl 10× Taq DNA Polymerase PCR buffer (Enzymatics)
and ultrapure water to a final volume of 20 µl, and subjected to a re-anneal-
ing process to enable heteroduplex formation: 95 °C for 10 min, 95 °C to
85 °C ramping at −2 °C/s, 85 °C to 25 °C at −0.25 °C/s, and 25 °C hold for
1 min. After re-annealing, products were treated with SURVEYOR nuclease
and SURVEYOR enhancer S (Transgenomics) following the manufacturer’s
recommended protocol, and analyzed on 4–20% Novex TBE polyacrylamide
gels (Life Technologies). Gels were stained with SYBR Gold DNA stain (Life
Technologies) for 30 min and imaged with a Gel Doc gel imaging system
(Bio-Rad). Quantification was based on relative band intensities. Indel percentage
was determined by the formula 100 × (1 − (1 − (b + c)/(a + b + c))^(1/2)),
where a is the integrated intensity of the undigested PCR product, and b and
c are the integrated intensities of each cleavage product.
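The formula can be sketched in Python as follows. The function name and argument names are illustrative only. The square root is commonly explained by assuming that strands re-anneal at random, so that a duplex escapes cleavage only when both strands are unmodified; that interpretation is an assumption of this sketch and is not stated in the text.

```python
import math


def indel_percentage(a: float, b: float, c: float) -> float:
    """Estimate indel percentage from SURVEYOR band intensities.

    a: integrated intensity of the undigested PCR product
    b, c: integrated intensities of the two cleavage products
    """
    total = a + b + c
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    fraction_cleaved = (b + c) / total
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))
```

For example, if half of the total intensity lies in the cleavage bands, the estimate is about 29%, not 50%, because each modified strand can contribute to more than one cleavable heteroduplex.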
Northern blot analysis of tracrRNA expression in human cells. Northern
blots were done as previously described¹. Briefly, RNAs were extracted using
the mirPremier microRNA Isolation Kit (Sigma) and heated to 95 °C for 5 min
before loading on 8% denaturing polyacrylamide gels (SequaGel, National
Diagnostics). Afterwards, RNA was transferred to a Hybond N+ membrane
(GE Healthcare) and crosslinked with Stratagene UV Crosslinker (Stratagene).
Probes were labeled with [γ-32P]ATP (PerkinElmer) with T4 polynucleotide
kinase (New England Biolabs). After washing, the membrane was exposed to a
phosphor screen for 1 h and scanned with a phosphorimager (Typhoon).
Bisulfite sequencing to assess DNA methylation status. Genomic DNA from
293FT cells was isolated with the DNeasy Blood & Tissue Kit (Qiagen) and
bisulfite converted with EZ DNA Methylation-Lightning Kit (Zymo Research).
Bisulfite PCR was conducted using KAPA2G Robust HotStart DNA Polymerase
(KAPA Biosystems) with primers designed using the Bisulfite Primer Seeker
(Zymo Research, Supplementary Table 2). Resulting PCR amplicons were gel-
purified, digested with EcoRI and HindIII, and ligated into a pUC19 backbone
before transformation. Individual clones were then Sanger sequenced to assess
DNA methylation status.
In vitro transcription and cleavage assay. Whole cell lysates from 293FT
cells were prepared with lysis buffer (20 mM HEPES, 100 mM KCl, 5 mM
MgCl2, 1 mM DTT, 5% glycerol, 0.1% Triton X-100) supplemented with
Protease Inhibitor Cocktail (Roche). T7-driven sgRNA was transcribed
in vitro using custom oligos (Supplementary Sequences) and HiScribe T7
In vitro Transcription Kit (NEB), following the manufacturer’s recommended
protocol. To prepare methylated target sites, pUC19 plasmid was methylated
by M.SssI and tested by digestion with HpaII. Unmethylated and successfully
methylated pUC19 plasmids were linearized by NheI. The in vitro cleavage
assay was carried out as follows: for a 20 µl cleavage reaction, 10 µl of cell
lysate was incubated with 2 µl cleavage buffer (100 mM HEPES, 500 mM KCl,
25 mM MgCl2, 5 mM DTT, 25% glycerol), 1 µg in vitro transcribed RNA and
300 ng pUC19 plasmid DNA.
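As a quick arithmetic check, the sketch below estimates the final buffer composition of the 20 µl reaction. It assumes that the lysate has the composition of the lysis buffer given above (cell contents ignored) and that the RNA, plasmid DNA and any water contribute none of the listed components. The dictionary keys and function name are illustrative.

```python
# Stock compositions taken from the text; Triton X-100 is absent from the cleavage buffer.
LYSIS_BUFFER = {"HEPES_mM": 20.0, "KCl_mM": 100.0, "MgCl2_mM": 5.0,
                "DTT_mM": 1.0, "glycerol_pct": 5.0, "triton_pct": 0.1}
CLEAVAGE_BUFFER = {"HEPES_mM": 100.0, "KCl_mM": 500.0, "MgCl2_mM": 25.0,
                   "DTT_mM": 5.0, "glycerol_pct": 25.0, "triton_pct": 0.0}


def final_concentrations(lysate_ul: float = 10.0,
                         buffer_ul: float = 2.0,
                         total_ul: float = 20.0) -> dict:
    """Volume-weighted final concentration of each buffer component."""
    return {
        name: (lysate_ul * LYSIS_BUFFER[name]
               + buffer_ul * CLEAVAGE_BUFFER[name]) / total_ul
        for name in LYSIS_BUFFER
    }
```

Under these assumptions the reaction ends up at 20 mM HEPES, 100 mM KCl, 5 mM MgCl2, 1 mM DTT and 5% glycerol, the same as the lysis buffer, with Triton X-100 diluted to 0.05%.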
Deep sequencing to assess targeting specificity. HEK 293FT cells plated
in 96-well plates were transfected with Cas9 plasmid DNA and sgRNA PCR
cassette 72 h before genomic DNA extraction (Supplementary Fig. 4). The
genomic region flanking the CRISPR target site for each gene was ampli-
fied (Supplementary Fig. 6, Supplementary Table 5 and Supplementary
Sequences) by a fusion PCR method to attach the Illumina P5 adapters as
well as unique sample-specific barcodes to the target amplicons (schematic
described in Supplementary Fig. 5). PCR products were purified using
EconoSpin 96-well Filter Plates (Epoch Life Sciences) following the manu-
facturer’s recommended protocol.
Barcoded and purified DNA samples were quantified by Quant-iT
PicoGreen dsDNA Assay Kit or Qubit 2.0 Fluorometer (Life Technologies)
and pooled in an equimolar ratio. Sequencing libraries were then sequenced
with the Illumina MiSeq Personal Sequencer (Illumina).
Sequencing data analysis and indel detection. MiSeq reads were filtered by
requiring an average Phred quality (Q score) of at least 23, as well as perfect
sequence matches to barcodes and amplicon forward primers. Reads from
on- and off-target loci were analyzed by first performing Smith-Waterman
alignments against amplicon sequences that included 50 nucleotides upstream
and downstream of the target site (a total of 120 bp). Alignments, meanwhile,
were analyzed for indels from 5 nucleotides upstream to 5 nucleotides down-
stream of the target site (a total of 30 bp). Analyzed target regions were dis-
carded if part of their alignment fell outside the MiSeq read itself, or if matched
base-pairs comprised less than 85% of their total length.
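A minimal sketch of the read filter and window sizes described above is given here. It assumes Phred+33 quality encoding, that each read starts with the barcode immediately followed by the forward primer, and a 20-nt target site (which matches the stated totals of 120 bp and 30 bp). The function names are hypothetical, and the Smith-Waterman alignment step is not shown.

```python
TARGET_LEN = 20                          # assumption: 20-nt target site
ALIGN_WINDOW = 50 + TARGET_LEN + 50      # region used for alignment (120 bp)
INDEL_WINDOW = 5 + TARGET_LEN + 5        # region scanned for indels (30 bp)
MIN_MEAN_Q = 23                          # minimum average Phred quality
MIN_MATCH_FRACTION = 0.85                # minimum fraction of matched base pairs


def mean_phred(quality: str, offset: int = 33) -> float:
    """Average Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_filter(read: str, quality: str, barcode: str, primer: str) -> bool:
    """Keep a read only if quality is high and barcode + primer match exactly."""
    if not quality:
        return False
    return mean_phred(quality) >= MIN_MEAN_Q and read.startswith(barcode + primer)


def alignment_is_usable(matched_bases: int, region_length: int) -> bool:
    """Discard target regions whose matched bases are below 85% of their length."""
    return matched_bases >= MIN_MATCH_FRACTION * region_length
```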
Negative controls for each sample provided a gauge for the inclusion or
exclusion of indels as putative cutting events. For each sample, an indel was
counted only if its quality score exceeded µ − σ, where µ was the mean quality
score of the negative control corresponding to that sample and σ was the s.d.
of the same. This yielded whole target-region indel rates for both negative
controls and their corresponding samples. Using the negative control’s
per-target-region-per-read error rate, q, the sample’s observed indel count n, and
its read count R, a maximum-likelihood estimate for the fraction of reads
having target regions with true indels, p, was derived by applying a binomial
error model, as follows.
Letting the (unknown) number of reads in a sample having target regions
incorrectly counted as having at least 1 indel be E, we can write (without
making any assumptions about the number of true indels)
$$\mathrm{Prob}(E \mid p) = \binom{R(1-p)}{E}\, q^{E} (1-q)^{R(1-p)-E}$$
as R(1 − p) is the number of reads having target-regions with no true indels.
Meanwhile, because the number of reads observed to have indels is n, n = E +
Rp, that is, the number of reads having target-regions with errors but no true
indels plus the number of reads whose target-regions correctly have indels.
We can then rewrite the above
$$\mathrm{Prob}(E \mid p) = \mathrm{Prob}(n = E + Rp \mid p) = \binom{R(1-p)}{n-Rp}\, q^{\,n-Rp} (1-q)^{R-n}$$
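As a brief check of the exponent of (1 − q) in the rewritten expression, substituting E = n − Rp gives:

$$R(1-p) - E = R - Rp - (n - Rp) = R - n$$

so the factor in (1 − q) does not depend on p, and p enters only through the binomial coefficient and the power of q.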
Taking all values of the frequency of target-regions with true-indels p
to be equally probable a priori, Prob(n|p) ∝ Prob(p|n). The maximum-likelihood
estimate (MLE) for the frequency of target regions with true
indels was therefore set as the value of p that maximized Prob(n|p). This was
evaluated numerically.
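One way such a numerical evaluation could look is sketched below in Python. It assumes that the number of true-indel reads Rp is restricted to whole numbers between 0 and n and scans all of them; the paper does not say how the maximization was actually performed, and the function names are illustrative.

```python
import math


def log_binom(n_items: float, k_items: float) -> float:
    """Natural log of the binomial coefficient C(n_items, k_items)."""
    return (math.lgamma(n_items + 1) - math.lgamma(k_items + 1)
            - math.lgamma(n_items - k_items + 1))


def mle_true_indel_fraction(R: int, n: int, q: float) -> float:
    """Value of p maximising Prob(n | p) under the binomial error model.

    R: number of reads; n: reads observed with indels;
    q: per-read error rate estimated from the negative control.
    """
    if R <= 0 or not 0 <= n <= R:
        raise ValueError("require R > 0 and 0 <= n <= R")
    if not 0.0 <= q < 1.0:
        raise ValueError("require 0 <= q < 1")
    if q == 0.0:
        return n / R  # no background errors: every observed indel is counted as true
    best_k, best_ll = 0, -math.inf
    for k in range(n + 1):        # k plays the role of Rp (reads with true indels)
        errors = n - k            # E = n - Rp (reads with spurious indels)
        ll = (log_binom(R - k, errors)
              + errors * math.log(q)
              + (R - n) * math.log1p(-q))
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k / R
```

With R = 1000 reads, n = 100 observed indels and q = 0.01, this scan returns p = 0.091, slightly below the raw rate of 0.1 because roughly nine of the observed indels are attributed to background error.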
In order to place error bounds on the true-indel read frequencies in the
sequencing libraries themselves, Wilson score intervals
2
were calculated for
each sample, given the MLE-estimate for true-indel target-regions, Rp, and
the number of reads R. Explicitly, the lower bound l and upper bound u were
calculated as
$$l = \frac{Rp + \frac{z^{2}}{2} - z\sqrt{Rp(1-p) + \frac{z^{2}}{4}}}{R + z^{2}}, \qquad u = \frac{Rp + \frac{z^{2}}{2} + z\sqrt{Rp(1-p) + \frac{z^{2}}{4}}}{R + z^{2}}$$
where z, the standard score for the confidence required in normal distribution
of variance 1, was set to 1.96, meaning a confidence of 95%. The maximum
upper bounds and minimum lower bounds for each biological replicate are
listed in Supplementary Tables 5–8.
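The Wilson score bounds can be computed with a few lines of Python, sketched here in the count form (numerator and denominator both multiplied by R). The function name and signature are illustrative.

```python
import math


def wilson_interval(p: float, R: int, z: float = 1.96) -> tuple:
    """Wilson score interval (lower, upper) for a fraction p estimated from R reads."""
    if R <= 0:
        raise ValueError("R must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    centre = R * p + z * z / 2.0
    half_width = z * math.sqrt(R * p * (1.0 - p) + z * z / 4.0)
    denom = R + z * z
    return (centre - half_width) / denom, (centre + half_width) / denom
```

For p = 0.5 and R = 100 this gives an interval of roughly 0.404 to 0.596 at z = 1.96. Unlike the simple normal approximation, the bounds stay within [0, 1] even when p is close to 0, which is the usual situation for off-target indel frequencies.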
qRT-PCR analysis of relative Cas9 and sgRNA expression. 72 h post-
transfection, total RNA from 293FT cells was harvested with miRNeasy
Micro Kit (Qiagen). Reverse-strand synthesis for sgRNAs was performed
with qScript Flex cDNA kit (VWR) and custom first-strand synthesis
primers (Supplementary Table 2). qPCR analysis was done with Fast SYBR
Green Master Mix (Life Technologies) and custom primers (Supplementary
Table 2), using GAPDH as an endogenous control. Relative quantification was
calculated by the ∆∆CT method.
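For reference, the ∆∆CT calculation in its standard form can be sketched as below. The code assumes roughly 100% amplification efficiency (a doubling per cycle), which the text does not state, and all function and argument names are illustrative.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target transcript relative to a control condition.

    Each CT is first normalised to the endogenous reference (GAPDH in the text),
    then the sample is compared with the control condition.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)
```

A target that crosses threshold two cycles earlier in the sample than in the control, at equal GAPDH CT, corresponds to a fourfold higher relative level.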