
Sequence features associated with the cleavage efficiency of CRISPR/Cas9 system


The CRISPR-Cas9 system has recently emerged as a versatile tool for biological and medical research. In this system, a single guide RNA (sgRNA) directs the endonuclease Cas9 to a targeted DNA sequence for site-specific manipulation. In addition to this targeting function, the sgRNA has also been shown to play a role in activating the endonuclease activity of Cas9. This dual function of the sgRNA likely underlies observations that different sgRNAs have varying on-target activities. Currently, our understanding of the relationship between sequence features of sgRNAs and their on-target cleavage efficiencies remains limited, largely due to difficulties in assessing the cleavage capacity of a large number of sgRNAs. In this study, we evaluated the cleavage activities of 218 sgRNAs using in vitro Surveyor assays. We found that nucleotides at both PAM-distal and PAM-proximal regions of the sgRNA are significantly correlated with on-target efficiency. Furthermore, we also demonstrated that the genomic context of the targeted DNA, the GC percentage, and the secondary structure of sgRNA are critical factors contributing to cleavage efficiency. In summary, our study reveals important parameters for the design of sgRNAs with high on-target efficiencies, especially in the context of high throughput applications.
Scientific Reports | 6:19675 | DOI: 10.1038/srep19675
www.nature.com/scientificreports
Xiaoxi Liu1, Ayaka Homma1, Jamasb Sayadi1,2, Shu Yang3, Jun Ohashi4 & Toru Takumi1,5
The clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein (Cas) system,
an adaptive immune system found in many archaea and bacteria, has recently emerged as an efficient and
precise tool for genome engineering1–3. The system has been further repurposed to regulate gene expression by
transcriptional activation or repression4,5, modify the local chromatin epigenetic status of various loci6,7, and even
target single-stranded RNA8. Rapid advances in CRISPR-Cas-based technology such as these are transforming
biological research and hold tremendous potential for future therapeutic applications.
To date, three CRISPR-Cas subtypes have been classified in prokaryotes9. Among them, the type II
CRISPR-Cas system derived from Streptococcus pyogenes is the most commonly used because of its relative simplicity10.
In particular, the type II CRISPR system utilizes a single endonuclease protein, Cas9, to induce DNA cleavage,
whereas multiple proteins are required in other subtypes11. When coupled with Cas9, two non-coding RNAs, the
CRISPR associated RNA (crRNA), required for DNA targeting, and the trans-activating RNA (tracrRNA), necessary
for nuclease activity, are sufficient to induce DNA cleavage. These two RNAs can be fused as a chimeric single
guide RNA (sgRNA) and further cloned with Cas9 into an expression vector, allowing convenient and efficient
delivery of the whole system12,13.
To direct the Cas9 complex to a desired locus for genetic manipulation, a 20-nucleotide guide sequence
found within the sgRNA must be complementary to the target DNA14. In addition, a protospacer-adjacent motif
(PAM) sequence (the 3 nucleotides NGG for SpCas9) must be present in the targeted genomic locus. Once bound
to the target DNA, two nuclease domains in Cas9, HNH and RuvC, cleave the DNA strands complementary and
non-complementary to the guide sequence, respectively, leaving a blunt-ended DNA double strand break (DSB)15. Thus, in
theory, any specific 20 nt genomic sequence followed by a PAM can be targeted. The flexibility of this RNA-guided
system enables researchers to perform genome editing for virtually any locus of interest in an easy and quick
manner by simply changing the sgRNA in the expression vector.
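The targeting rule described above (a 20 nt protospacer immediately followed by an NGG PAM) can be illustrated with a minimal sketch. The function name `find_spcas9_targets` and the example sequence are hypothetical and are not taken from the study; only the given strand is scanned.

```python
import re

def find_spcas9_targets(seq):
    """Return (start, guide, pam) for every 20-nt stretch followed by NGG.

    Only the strand passed in is scanned; to cover the opposite strand,
    call the function again on the reverse complement.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]

# Hypothetical input: 2 nt of flank, a 20-nt protospacer, a CGG PAM, 3 nt of flank.
example = "TTGACCTGAAGCTGATCCGTAACGGATC"
sites = find_spcas9_targets(example)
for start, guide, pam in sites:
    print(start, guide, pam)
```

Whether a candidate site actually cleaves efficiently is a separate question, which is the subject of the analyses that follow.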
1RIKEN Brain Science Institute, Wako, Saitama, Japan. 2Harvard College, Cambridge, Massachusetts 02138, United
States. 3Department of Computer Science, University of British Columbia, Vancouver, Canada. 4Department of
Biological Sciences, Graduate School of Science, University of Tokyo, Bunkyo, Tokyo, Japan. 5Core Research for
Evolutional Science and Technology (CREST), Japan Science and Technology Agency (JST), 7 Gobancho, Chiyoda-ku,
Tokyo 102-0076, Japan. Correspondence and requests for materials should be addressed to T.T. (email: toru.takumi@riken.jp)
Received: 28 September 2015
Accepted: 16 December 2015
Published: 27 January 2016
Given the relatively short length of the guide sequence in the sgRNA (20 nt), targeting specificity has become
a major concern in using CRISPR-Cas9, and the off-target effects of the system have been extensively investigated16.
It has been proposed that the 8–12 PAM-proximal bases, known as the seed sequence, determine targeting
specificity by making contact with the arginine-rich bridge helix (BH) within the recognition (REC)
lobe of the Cas9 protein17; therefore, selecting sites predicted to have the most specific seed regions with the
fewest possible off-target mismatches may be crucial to improving on-target efficiency. In contrast, the PAM-distal
sequence has been suggested to be less important for specificity, and mismatches in this region are more likely to
be tolerated.
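To make the seed versus PAM-distal distinction concrete, the sketch below counts mismatches between a guide and a candidate genomic site separately for the two regions. The function name `mismatch_profile` and the default seed length of 12 (the upper end of the 8–12 range quoted above) are illustrative assumptions.

```python
def mismatch_profile(guide, site, seed_len=12):
    """Count guide/site mismatches in the PAM-distal and seed regions.

    Position 1 is the PAM-distal end; the last position is adjacent to the PAM.
    seed_len is an assumption and can be set anywhere in the 8-12 range.
    """
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    split = len(guide) - seed_len
    mismatches = [a != b for a, b in zip(guide.upper(), site.upper())]
    return {"distal": sum(mismatches[:split]), "seed": sum(mismatches[split:])}

# Hypothetical guide and a site differing at positions 2 and 19.
profile = mismatch_profile("GACCTGAAGCTGATCCGTAA", "GTCCTGAAGCTGATCCGTTA")
print(profile)
```

Under the model described above, a site with mismatches confined to the distal part would be the more likely off-target.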
Despite extensive research on off-target effects, only a limited number of studies have focused on analyzing
the on-target cleavage efficiency of the sgRNA/Cas9 complex. It has been observed that the mutagenesis rate of
the CRISPR/Cas9 system varies greatly18. Further studies have indicated that the on-target efficiency of site-directed
mutation is highly dependent on the sgRNA, given that sgRNAs targeting the same genomic locus show different
activities19. Moreover, several recent studies have attempted to identify sgRNA sequence determinants that may
underlie sgRNA cleavage activity20–24. Doench et al. evaluated the efficiencies of a total of 1,841 sgRNAs in inducing
complete loss of a protein and demonstrated that the nucleotide composition at specific positions, especially
the one adjacent to the protospacer-adjacent motif (PAM), contributes to the activity of the sgRNA21. Based on
previously published datasets, Hu et al. analyzed the effects of sequence context on sgRNA efficiency and generated
models that achieved reasonable predictive power, in which the Area Under Curve (AUC) scores were greater
than 0.7 in Receiver Operating Characteristic (ROC) analysis22. Despite such advances, a large fraction of inefficient sgRNAs
are still not predictable with current models, which emphasizes the need to further optimize the design
principles of sgRNAs. In addition, many sequence features that are highly likely to be relevant to sgRNA activity,
such as the genomic context of the targeted region and the stability of the secondary structure of the sgRNA, have
not yet been explored and incorporated into a statistical model. In this study, we comprehensively analyzed the
sequence features of sgRNAs and their effects on cleavage activity based on the Surveyor assay system.
Among the host of in vitro systems available to evaluate sgRNA performance, the Surveyor nuclease assay
is the most commonly used and reliable method. This assay utilizes an enzyme mismatch cleavage system in
which heteroduplex DNA with mismatches and indels is cleaved. However, despite its high reliability, the procedure
is tedious and time-consuming: it usually takes 10 days from the design of an sgRNA to obtain the final assay
results. Additionally, it is difficult to multiplex the procedure since cell culture, transfection, and genomic PCR are
required for each individual sgRNA assay. Currently, systematic evaluations of sgRNA on-target efficiency based
on the Surveyor assay are still limited, especially using mammalian cell lines. In this study, we report the evaluation
of the on-target activity of 218 sgRNAs based on the mouse Neuro2A cell line.
Figure 1. (a) Outline of the procedure of the current study. (b) Distribution of sgRNAs in various genomic
contexts represented by different colors. (c) Distribution of sgRNAs across chromosomes. (d) The average GC
percentages of sgRNAs. The error bars indicate the 95% confidence intervals.
Results
We designed and successfully cloned 218 sgRNAs into expression vectors. The experimental design and procedure
are briefly outlined in Fig. 1a. The insertion of all guide sequences into the expression vectors was confirmed
using Sanger sequencing. Together, these 218 sgRNAs target 153 distinct genomic loci across 18 chromosomes
in the mouse genome. The sequences of all sgRNAs as well as detailed annotations including targeted genomic
locations, guanine-cytosine (GC) percentages, and genomic contexts are provided in Supplementary Table 1 and
illustrated in Fig. 1(b–d). We then performed Surveyor assays using the Neuro2A cell line to evaluate the on-target
efficiency of these sgRNAs. Representative gel images of the Surveyor assays for seven sgRNA samples are shown
in Fig. 2. For each assay, we included one negative control in which a pMax-GFP vector was used for transfection.
By comparing PCR bands amplified from the negative controls and sgRNA-transfected samples, we classified the
guide sequences in the sgRNAs as Surveyor positive sequences if their cleavage pattern was clearly visible in the
sgRNA-transfected samples. Through this analysis, a total of 129 sgRNAs (59%) were determined to be Surveyor
positive. Meanwhile, no cleavage was observed for 89 sgRNAs (41%).
Nucleotide preferences of highly efficient sgRNAs. We next set out to explore whether the nucleotide
composition of the sgRNAs affected the cleavage results. First, we separated the sgRNA sequences into two
groups: Surveyor positive and Surveyor negative. For each group, the occurrences and frequencies of nucleotides
(A, C, T, G) at each position were calculated (Table 1). We then compared the nucleotide frequencies in
the Surveyor positive sequences with those in the Surveyor negative sequences. A heatmap was subsequently
generated to visualize the frequency change between the two groups (Fig. 3). In Surveyor positive sequences, we
observed an elevated frequency of thymine (T) at positions 3 and 6, an increased frequency of cytosine (C) and
decreased frequency of adenine (A) at position 20, and a host of other nucleotide frequency changes compared
to Surveyor negative sequences.
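The per-position frequency comparison described above can be sketched as follows. The helper names and the four-letter toy sequences are hypothetical stand-ins for the real guide sequences in Supplementary Table 1; the difference is taken as positive minus negative.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"

def position_frequencies(guides):
    """Per-position nucleotide frequencies for a list of equal-length guides."""
    freqs = []
    for i in range(len(guides[0])):
        counts = Counter(g[i] for g in guides)
        freqs.append({n: counts[n] / len(guides) for n in NUCLEOTIDES})
    return freqs

def frequency_difference(positive, negative):
    """Frequency in the positive group minus frequency in the negative group."""
    pos_f = position_frequencies(positive)
    neg_f = position_frequencies(negative)
    return [{n: p[n] - q[n] for n in NUCLEOTIDES} for p, q in zip(pos_f, neg_f)]

# Toy data only; real guides are 20 nt long.
positive = ["ACGT", "ACGC", "TCGC", "ACTC"]
negative = ["ACGA", "GCGA", "ACGT", "ACGG"]
diff = frequency_difference(positive, negative)
print(diff[3])
```

Each dictionary in `diff` corresponds to one column of a heatmap such as the one in Fig. 3.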
We next asked whether these frequency changes are statistically meaningful or merely represent chance observations.
Chi-square analysis was performed for each position of the guide sequence to test whether the overall nucleotide
composition differs between Surveyor positive and negative sequences. Statistically significant changes were
observed at positions 3 and 20 with P values of 0.031 and 0.022, respectively (Table 1). Position 3 is located in
the PAM-distal region, while position 20 is the base immediately upstream of the PAM sequence. We further
calculated the permutation-adjusted P value for each position based on 10,000 randomizations of the sample
labels. The associations of positions 3 and 20 were not significant after correction by the permutation test (permutation
P value = 0.4762 and 0.371, respectively).
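A minimal sketch of the two tests described above is given here, using only the Python standard library. The chi-square tail probability is computed from closed forms valid for 1 to 3 degrees of freedom. The adjustment rule (the fraction of permutations whose smallest P value is at or below the observed one) is an assumption about how the empirical distribution was used; the analysis in the study itself was run in R.

```python
import math
import random

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable for df in {1, 2, 3}."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    raise ValueError("only df = 1, 2, 3 are supported in this sketch")

def chi_square_test(pos_counts, neg_counts):
    """Pearson chi-square test on a 2 x k table of nucleotide counts."""
    # Nucleotides absent from both groups are dropped, which lowers df.
    cols = [(p, q) for p, q in zip(pos_counts, neg_counts) if p + q > 0]
    n_pos = sum(p for p, _ in cols)
    n_neg = sum(q for _, q in cols)
    total = n_pos + n_neg
    stat = 0.0
    for p, q in cols:
        for observed, row_total in ((p, n_pos), (q, n_neg)):
            expected = row_total * (p + q) / total
            stat += (observed - expected) ** 2 / expected
    df = len(cols) - 1
    return stat, (chi2_sf(stat, df) if df > 0 else 1.0)

def per_position_p(guides, labels):
    """Chi-square P value at each position for labelled guide sequences."""
    pos = [g for g, y in zip(guides, labels) if y]
    neg = [g for g, y in zip(guides, labels) if not y]
    p_values = []
    for i in range(len(guides[0])):
        pc = [sum(g[i] == n for g in pos) for n in "ACGT"]
        nc = [sum(g[i] == n for g in neg) for n in "ACGT"]
        p_values.append(chi_square_test(pc, nc)[1])
    return p_values

def permutation_adjusted_p(guides, labels, n_perm=10000, seed=0):
    """Min-P permutation adjustment (assumed rule, see the text above)."""
    rng = random.Random(seed)
    observed = per_position_p(guides, labels)
    shuffled = list(labels)
    min_ps = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # group sizes are preserved by shuffling labels
        min_ps.append(min(per_position_p(guides, shuffled)))
    return [sum(m <= p for m in min_ps) / n_perm for p in observed]

# Counts (A, C, G, T) for positions 3 and 20, copied from Table 1.
stat3, p3 = chi_square_test([33, 21, 24, 51], [22, 20, 27, 20])
stat20, p20 = chi_square_test([13, 63, 26, 27], [21, 29, 20, 19])
print(round(p3, 3), round(p20, 3))
```

Applied to the Table 1 counts, the unadjusted P values from this sketch should agree with the reported 0.031 and 0.022 to within rounding.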
Evaluation of GC percentage difference. Since the GC percentage of sgRNAs, particularly that of the
6 PAM-proximal bases, has been previously reported to be positively correlated with on-target efficiency20,25,26,
we further examined the potential association between GC content and cleavage outcome. We calculated the
overall GC percentage for the whole guide sequence, as well as the GC percentages for positions 1–6, 7–14,
and 15–20, in a sliding window manner. Finally, we conducted a Welch two-sample t-test, a non-parametric
Kolmogorov-Smirnov test, and logistic regression analysis but did not observe any significant associations
(Supplementary Table 2).
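The GC features described above can be computed as in the following sketch. The window boundaries follow the text (positions 1–6, 7–14, and 15–20, 1-based and inclusive); the function names and the example guide are hypothetical.

```python
def gc_percent(seq):
    """Percentage of G or C bases in a sequence."""
    seq = seq.upper()
    return 100.0 * sum(b in "GC" for b in seq) / len(seq)

def gc_features(guide):
    """GC percentage of a 20-nt guide overall and in three windows."""
    windows = {
        "overall": (1, 20),
        "pos_1_6": (1, 6),
        "pos_7_14": (7, 14),
        "pos_15_20": (15, 20),  # the 6 PAM-proximal bases
    }
    # Windows are 1-based and inclusive, hence the start - 1 in the slice.
    return {name: gc_percent(guide[start - 1:end]) for name, (start, end) in windows.items()}

features = gc_features("GACCTGAAGCTGATCCGTAA")  # hypothetical guide
print(features)
```

These per-guide values are what a two-sample comparison between the Surveyor positive and negative groups would operate on.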
Figure 2. Representative Surveyor assay results of 7 independent sgRNAs. The ID indicates the
sgRNA ID shown in Supplementary Table 1. The minus sign indicates the negative control sample that
was transfected with pMaxGFP. The plus sign indicates the sample that was transfected with the PX330 vector
containing the sgRNA. Among the 7 sgRNAs, 5 (No. 133, 149, 166, 174, 175) showed positive cleavage
activity and 2 (No. 135, 163) showed no cleavage activity.
Logistic regression analysis. As a follow-up to the Chi-square test, we tried to evaluate the effect of each
nucleotide on cleavage efficiency through logistic regression analysis. The nucleotides in each position were coded
as dummy variables, and the nucleotide that showed the lowest frequency change at each given position was set
as the baseline level. Similarly, the genomic contexts of the target sequence were also included in the regression
analysis, where the intergenic region was set as the baseline level. In addition to the sequence features of the sgRNAs,
we also evaluated the impacts of several structural features on cleavage efficiency. We assessed the overall
secondary structure of each sgRNA, measured as a single minimum free energy (MFE). Additionally, we analyzed
the local secondary structure of the seed region and the effects of the guide sequence on tracrRNA structure. We
further speculated that the relationship between GC percentage and cleavage efficiency is likely to be a non-linear
one, where sgRNAs with GC percentages that are too high or too low are unfavorable. Thus, we labeled sgRNAs
with GC percentages within the range of 40%–60% as “GC normal” and those with GC percentages below 40% or
above 60% as “GC abnormal.” These variables were incorporated into the logistic regression model.
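The coding scheme described above can be sketched as follows: one 0/1 indicator per non-baseline nucleotide at each position, plus a "GC normal" flag. The function name `encode_guide` is hypothetical, and using A as the baseline at every position is purely illustrative; in the study the baseline is the nucleotide with the lowest frequency change at each position, which is not reproduced here.

```python
NUCLEOTIDES = "ACGT"

def encode_guide(guide, baselines, gc_low=40.0, gc_high=60.0):
    """Encode one guide as 0/1 predictors for a logistic regression.

    baselines: string of the same length as the guide giving, for each
    position, the nucleotide that is omitted as the reference level.
    """
    guide = guide.upper()
    if len(baselines) != len(guide):
        raise ValueError("one baseline nucleotide is needed per position")
    features = {}
    for pos, (base, ref) in enumerate(zip(guide, baselines), start=1):
        for n in NUCLEOTIDES:
            if n == ref:
                continue  # the baseline level gets no dummy variable
            features[f"Pos_{pos}_{n}"] = int(base == n)
    gc = 100.0 * sum(b in "GC" for b in guide) / len(guide)
    features["GC_normal"] = int(gc_low <= gc <= gc_high)
    return features

# Hypothetical guide; baseline A at every position is an assumption.
encoded = encode_guide("GACCTGAAGCTGATCCGTAA", "A" * 20)
print(encoded["Pos_1_G"], encoded["GC_normal"])
```

Genomic context and the structural features would be added to the same dictionary before fitting, with the intergenic region as the omitted context level.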
Aer logistic regression analysis, a total of 14 variables were found to be signicantly correlated with the
cleavage results, including the nucleotide present at 10 distinct positions, the condition of being targeted to a
promoter-transcription start site (TSS), having a normal range GC, as well as several features of the second-
ary structures of sgRNA. e position-dependent nucleotide P values are illustrated in Fig.4. e results of
signicantly associated variables are shown in Table2, and the complete logistic analysis results are listed in
Supplementary Table 3.
Table 1. The occurrences and frequencies of nucleotides at each position in Surveyor positive and negative
sequences. P value (perm): P value obtained based on a permutation test with 10,000 randomizations.
Position  Surveyor positive sequences                     Surveyor negative sequences                     P value  P value (perm)
          A           C           G           T           A           C           G           T
1         35 (27.1%)  25 (19.4%)  43 (33.3%)  26 (20.2%)  26 (29.2%)  19 (21.3%)  23 (25.8%)  21 (23.6%)  0.694    1
2         33 (25.6%)  32 (24.8%)  33 (25.6%)  31 (24%)    24 (27%)    16 (18%)    21 (23.6%)  28 (31.5%)  0.51     1
3         33 (25.6%)  21 (16.3%)  24 (18.6%)  51 (39.5%)  22 (24.7%)  20 (22.5%)  27 (30.3%)  20 (22.5%)  0.031*   0.4762
4         20 (15.5%)  41 (31.8%)  45 (34.9%)  23 (17.8%)  21 (23.6%)  24 (27%)    23 (25.8%)  21 (23.6%)  0.213    0.9931
5         28 (21.7%)  33 (25.6%)  43 (33.3%)  25 (19.4%)  24 (27%)    21 (23.6%)  18 (20.2%)  26 (29.2%)  0.107    0.9048
6         32 (24.8%)  26 (20.2%)  29 (22.5%)  42 (32.6%)  32 (36%)    18 (20.2%)  24 (27%)    15 (16.9%)  0.054    0.6909
7         26 (20.2%)  32 (24.8%)  36 (27.9%)  35 (27.1%)  20 (22.5%)  21 (23.6%)  20 (22.5%)  28 (31.5%)  0.774    1
8         33 (25.6%)  28 (21.7%)  33 (25.6%)  35 (27.1%)  22 (24.7%)  19 (21.3%)  32 (36%)    16 (18%)    0.283    0.9991
9         33 (25.6%)  34 (26.4%)  37 (28.7%)  25 (19.4%)  19 (21.3%)  31 (34.8%)  19 (21.3%)  20 (22.5%)  0.39     0.9999
10        25 (19.4%)  28 (21.7%)  41 (31.8%)  35 (27.1%)  20 (22.5%)  27 (30.3%)  21 (23.6%)  21 (23.6%)  0.348    0.9996
11        33 (25.6%)  26 (20.2%)  35 (27.1%)  35 (27.1%)  12 (13.5%)  20 (22.5%)  27 (30.3%)  30 (33.7%)  0.185    0.9869
12        29 (22.5%)  34 (26.4%)  40 (31%)    26 (20.2%)  17 (19.1%)  24 (27%)    24 (27%)    24 (27%)    0.648    1
13        28 (21.7%)  37 (28.7%)  33 (25.6%)  31 (24%)    15 (16.9%)  26 (29.2%)  26 (29.2%)  22 (24.7%)  0.825    1
14        31 (24%)    37 (28.7%)  39 (30.2%)  22 (17.1%)  13 (14.6%)  36 (40.4%)  25 (28.1%)  15 (16.9%)  0.205    0.9923
15        16 (12.4%)  44 (34.1%)  39 (30.2%)  30 (23.3%)  15 (16.9%)  27 (30.3%)  32 (36%)    15 (16.9%)  0.468    1
16        30 (23.3%)  41 (31.8%)  30 (23.3%)  28 (21.7%)  16 (18%)    27 (30.3%)  24 (27%)    22 (24.7%)  0.745    1
17        38 (29.5%)  33 (25.6%)  33 (25.6%)  25 (19.4%)  16 (18%)    21 (23.6%)  27 (30.3%)  25 (28.1%)  0.167    0.9803
18        36 (27.9%)  37 (28.7%)  29 (22.5%)  27 (20.9%)  17 (19.1%)  24 (27%)    27 (30.3%)  21 (23.6%)  0.366    0.9997
19        38 (29.5%)  34 (26.4%)  40 (31%)    17 (13.2%)  22 (24.7%)  19 (21.3%)  25 (28.1%)  23 (25.8%)  0.126    0.9394
20        13 (10.1%)  63 (48.8%)  26 (20.2%)  27 (20.9%)  21 (23.6%)  29 (32.6%)  20 (22.5%)  19 (21.3%)  0.022*   0.371
21        44 (34.1%)  8 (6.2%)    20 (15.5%)  57 (44.2%)  31 (34.8%)  4 (4.5%)    20 (22.5%)  34 (38.2%)  0.545    1
Figure 3. Heatmap plot showing nucleotide frequency change in each position of Surveyor positive
sgRNAs compared with that of Surveyor negative sgRNAs. The value of the color scale in each cell of the
heatmap indicates the nucleotide frequency difference and is calculated as Frequency_positive − Frequency_negative.
To evaluate the performance of the current model, we first examined how well the cleavage activities of
the sgRNAs used in this study can be predicted by previous methods. We calculated “on-target scores” for our sgRNAs
using the standalone Python software proposed in Doench et al.’s study21. To do so, we extended our sgRNA
sequences, as the program requires a 30 nt sequence including the flanking sequence of the guide sequence. By
using logistic regression, we found a positive correlation between the “on-target score” and the Surveyor cleavage
result, though the P value is marginal at 0.07. We then assessed the ROC curve of the model fitted based on
this score and calculated the area under the ROC curve (AUC) to be 0.57. As a comparison, the AUC score based
on our current logistic model is 0.91 when the fitted model is applied to the total training data and 0.67 from
a 20-fold cross-validation.
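The two evaluation settings mentioned above (AUC on the training data versus AUC under 20-fold cross-validation) rely on two generic building blocks, sketched below with hypothetical helper names. The AUC is computed in its rank form, as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The model fitting step itself is omitted.

```python
import random

def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists so that each sample is tested once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

# Toy scores and labels, not data from the study.
auc = roc_auc([0.9, 0.8, 0.35, 0.6, 0.2], [1, 1, 1, 0, 0])
splits = list(k_fold_indices(218, 20))
print(round(auc, 3), len(splits))
```

In a cross-validated estimate, the model is refitted on each `train` list and scored only on the held-out `test` list, so the pooled AUC is not inflated by fitting and scoring on the same sgRNAs. That is one plausible reading of the gap between 0.91 and 0.67.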
Discussion
In this study, we performed Surveyor assays to evaluate the on-target efficiency of 218 sgRNA sequences. We
found that 41% of the sgRNAs showed no cleavage effects based on our assays. To understand what sequence
features influence the cleavage outcome, we performed comprehensive statistical analyses that revealed the
position-dependent nucleotide preferences associated with positive cleavage results. We further revealed that the
genomic contexts of target DNA as well as the GC percentage and secondary structure of sgRNAs also contribute
to sgRNA performance. As such, these factors should be considered when designing guide sequences.
Based on Chi-square analysis, we found that position 3 and position 20, the base adjacent to the PAM, are associated
with cleavage efficiency. The significant association at position 20 is in line with previous reports20,21, further
supporting the validity of our findings. Studies on the crystal structure of CRISPR/Cas9 reveal that the nucleotide
at position 20 induces DNA double strand separation and is responsible for initiating R-loop formation27. Using
logistic regression analysis, we further revealed that the presence of an adenine at this position has a negative
impact on targeting efficiency. Similarly, a previous study observed that possessing an adenine at position 20
resulted in a nearly 50% decrease in the cut rate26. Furthermore, other positions of the PAM-proximal seed region
were also found to be significant variables correlated with on-target efficiency, which supports the importance of the
seed region for the proper functioning of the sgRNA/Cas9 complex.
In addition to the PAM-proximal region, we also observed significant correlations between positions in the
PAM-distal region and cleavage efficiency. Unlike the proximal region, the PAM-distal region has been considered
less important in determining sgRNA specificity. However, in our study, we show that this region may
actually contribute to the on-target efficiencies of sgRNAs. At positions 2 and 3, T and G, respectively, were found to have
a negative effect on cleavage efficiency. Additionally, A at position 6 was identified as a significant nucleotide correlated
with the cleavage outcome. It has been shown that the backbones of positions 2 and 4–6 interact with the
REC1 domain of Cas9, which is critical for sgRNA:DNA recognition17. The nucleotides at these positions
might influence this recognition process and thereby affect cleavage performance.
Figure 4. P values of nucleotides from position 1 to 21 assessed by the logistic regression analysis. The
y-axis direction indicates whether a given nucleotide is favored or disfavored for cleavage activity.
Table 2. Variables significantly associated with on-target efficiency of sgRNA. TSS: transcription start site;
Estimate: estimated effect size; Std. Error: standard error; Pos: position.
Variable                                 Estimate  Std. Error  Z      P value
Unpairing probability of guide sequence  9.054     2.505       3.614  0.0003
GC normal                                3.143     0.950       3.311  0.0009
Pos_2_T                                  2.419     0.900       2.688  0.0072
Pos_3_G                                  2.464     0.847       2.911  0.0036
Pos_8_G                                  2.110     0.804       2.625  0.0087
Pos_17_G                                 2.419     0.814       2.970  0.0030
Context: promoter TSS                    3.862     1.256       3.075  0.0021
Pos_6_A                                  2.049     0.898       2.281  0.0225
Pos_11_G                                 1.861     0.854       2.178  0.0294
Pos_14_A                                 2.535     1.007       2.518  0.0118
Pos_18_G                                 1.798     0.816       2.205  0.0275
Pos_19_A                                 1.785     0.768       2.326  0.0200
Pos_19_C                                 1.827     0.891       2.051  0.0403
Pos_20_A                                 2.216     0.959       2.311  0.0208
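The Z and P value columns of Table 2 are consistent with a standard Wald test, in which Z is the estimate divided by its standard error and the two-sided P value comes from the normal distribution. The sketch below assumes this is how the columns were obtained; the function name `wald_test` is hypothetical. Because the result depends only on the magnitude of Z, it does not by itself say whether a variable favors or disfavors cleavage.

```python
import math

def wald_test(estimate, std_error):
    """Wald z statistic and two-sided normal P value for one coefficient."""
    z = estimate / std_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# First row of Table 2: unpairing probability of the guide sequence.
z, p = wald_test(9.054, 2.505)
print(round(z, 3), round(p, 4))
```

Small discrepancies in other rows can arise because the tabulated estimates and standard errors are themselves rounded.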
Previous studies have shown that the GC percentage of the overall sgRNA26 and of the 6 PAM-proximal nucleotides25
are positively correlated with efficiency in zebrafish and Drosophila, respectively. However, in our study,
which is based on a mammalian cell line, we found that sgRNAs with very high or low GC contents are less effective,
in contrast to a simple model in which a greater GC percentage always leads to higher activity. In our logistic
regression analysis, we demonstrate that sgRNAs with a GC percentage within the range of 40%–60% are favored
for efficient on-target cleavage.
Intriguingly, we found that if a target DNA sequence is located at the promoter-TSS region, the sequence
has a greater chance of being successfully cleaved compared with a sequence located in the intergenic region. This
result is likely related to local chromatin accessibility in different types of genomic loci. A recent genome-wide
Cas9 binding analysis based on ChIP-sequencing demonstrated that chromatin inaccessibility decreases dCas9
binding and that genomic loci commonly accessible in a large number of cell types have a significantly higher probability
of binding to the sgRNA/Cas9 complex28,29.
Furthermore, our analysis suggested that the secondary structure of the guide sequence is also an important
parameter that should be considered when designing sgRNAs. In particular, the probability of the seed region
forming an unfolded structure was identified as the most significant feature. Counterintuitively, our analysis revealed
that if the seed sequence is more likely to form secondary structure, the sgRNA has a higher chance of cleaving
the target sequence. Loading of the guide RNA into Cas9 has been demonstrated to be a crucial step in converting
Cas9 into an active conformation capable of executing its nuclease function30. Thus, the secondary structure of
the seed region might have a role in facilitating the loading process and may potentially improve the cleavage
activity of Cas9.
Recently, two large-scale studies have been reported with the aim of improving sgRNA design21,24. Our current
study differs from these two studies in several aspects and has unique advantages. In the first study, Doench
et al. evaluated the efficiency of 1,841 sgRNAs in inducing complete loss of the protein21. These sgRNAs were designed
to target six cell surface marker genes. By FACS analysis using antibodies specific to these cell surface proteins, the
marker-negative cells were isolated, and sequencing followed to determine highly active sgRNAs in these
cells. Given that sgRNAs targeting intron or UTR regions are unlikely to affect the coding sequence, only sgRNAs
targeting the coding sequences (CDS) were analyzed and used to build the predictive model. However,
this design has several potential limitations. For example, an sgRNA that induces an in-frame mutation is unlikely to
be labeled as highly effective even though it may have a high cleavage efficiency; additionally, if the frame-shift mutation
induced by the sgRNA occurs downstream of the epitope sites, the sgRNA might show less effect in abolishing
recognition by the antibody. In our study, we systematically designed sgRNAs targeting various loci with
different genomic contexts across the genome, and most importantly, rather than measuring the effects induced
by the sgRNA, we directly measured the cleavage efficiency of sgRNAs. In another study, 133 high-activity sgRNAs
and 146 low-activity sgRNAs for Cas9Sp together with 82 and 69 sgRNAs for Cas9St1 were determined and were
used to build the predictive model24. Since a support vector machine (SVM) model was adopted in that study,
it is difficult to compare the parameters with the current study. Despite differences in the methodology and study
design, there was a striking similarity in that the most dramatic nucleotide frequency changes were observed at
position 20 in all three studies. At this position, either C or G was found with an elevated frequency. The G/C
may be preferred to allow RNA/DNA hybridization and might be important for the initiation of the R-loop.
Furthermore, in the second study, a strong correlation was observed between the DNase I values of the targeting
sites and sgRNA efficiency, supporting the idea that locus accessibility is a critical determinant of sgRNA activity.
Since DNase I data were not available for the Neuro2A cell line, we alternatively retrieved the DNase I hypersensitivity
sites (DHSs) of whole mouse brain available from the ENCODE project31. We merged DHSs from an adult (week
8) and an embryonic (day 14.5) mouse and used this collection to represent DHSs specific to the brain. We
then examined how many sgRNA targets overlap with the DHSs and found that a total of 47 sgRNAs out of
218 overlapped with the brain-specific DHSs. Among them, 32 sgRNAs were located in the promoter/TSS region.
Statistical analysis revealed a significant positive correlation between being located at a promoter/TSS and being
located in a DHS (P = 2.7 × 10^−11). This observation confirms that the promoter/TSS regions have a higher level
of chromatin accessibility. This link was further supported by a genome-wide survey of chromatin accessibility
of the human genome using 125 diverse cell and tissue types, in which it was found that promoters typically exhibit
high accessibility across various cell types32. In our study we revealed that the secondary structure of the guide
sequence of the sgRNA is associated with on-target efficiency and that the inclusion of secondary structure variables
greatly improves the prediction power of the model. We showed that our logistic regression model performs reasonably
well. The detailed parameters of the model are provided and may prove valuable for future studies. The
full dataset is also available and can be used as a source for meta-analysis in future studies.
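The DHS analysis described above involves two simple interval operations: merging the adult and embryonic DHS sets, and checking which sgRNA targets overlap the merged set. A minimal sketch follows; the function names, the half-open coordinate convention, and all coordinates are hypothetical and are not ENCODE data.

```python
def merge_intervals(intervals):
    """Union of (start, end) intervals, merging those that overlap or touch."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def overlaps(start, end, intervals):
    """True if the half-open interval [start, end) overlaps any interval."""
    return any(start < e and s < end for s, e in intervals)

def count_dhs_overlaps(targets, dhs_by_chrom):
    """Number of (chrom, start, end) targets that fall in at least one DHS."""
    return sum(
        overlaps(start, end, dhs_by_chrom.get(chrom, []))
        for chrom, start, end in targets
    )

# Hypothetical coordinates for illustration only.
adult = [(90, 110), (500, 600)]
embryo = [(100, 130)]
dhs = {"chr1": merge_intervals(adult + embryo)}
targets = [("chr1", 100, 120), ("chr1", 300, 320), ("chr2", 50, 70)]
n_hits = count_dhs_overlaps(targets, dhs)
print(dhs["chr1"], n_hits)
```

For genome-scale inputs, a sorted-interval or interval-tree lookup would be preferable to the linear scan used here.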
Although our study offers key insights into sgRNA design, care should be taken in interpreting the results.
First, we used cleavage outcome data, which is binary in nature, for our statistical analysis. Although binary
responses are easy to understand and interpret, and by this criterion we can clearly separate the sgRNAs into 2
distinct groups, the efficiencies of individual guide sequences might differ within the same group of sgRNAs that
showed positive cleavage results. Thus, quantitative outcomes such as cleavage percentage and number of mutations
induced by each sgRNA are needed to provide further insight into sgRNA optimization. Secondly, we used
800 ng of plasmid for each transfection, an amount commonly used for 24-well plates33. Based on an in vivo mutagenesis
study of CRISPR/Cas9 in Drosophila, the protein level of Cas9 is unlikely to be a critical factor for mutagenesis
efficiency, while the amount of sgRNA has a more profound impact25. Thus, the sgRNA amount may need to be
optimized depending on the specific experimental conditions and cell type.
Conclusion
Here we report a systematic evaluation of on-target performance of 218 sgRNAs based on in vitro Surveyor
assay. We found that 41% of sgRNAs in our study showed negative results for cleavage, further emphasizing the
need to improve the design of the sgRNA. Through statistical analysis, we found that nucleotide preferences at
positions both adjacent and distal to the PAM sequence are significantly correlated with on-target performance.
Furthermore, we showed that the genomic contexts of the target region, the optimal GC percentage, and the secondary
structure of the sgRNA are important factors contributing to the cleavage efficiency. Taken together, our study
reveals crucial parameters for the design of sgRNAs to achieve high on-target efficiency, particularly in the context
of high throughput applications. Future studies are warranted to further replicate our study and improve the
state-of-the-art CRISPR/Cas9 technology.
Methods
Design and cloning of sgRNA. The sgRNAs were designed to target the flanking sites of various loci that
harbor copy number variations (CNVs) associated with autism spectrum disorder (ASD). The top 100 most frequently
occurring ASD CNVs were retrieved from the SFARI CNV database34. We then used the Ensembl Compara
API to determine the syntenic regions in the corresponding mouse genome. The sgRNAs were designed at the
flanking sites of such mouse loci regardless of their genomic contexts. The DNA sequences of selected regions were
obtained from the Ensembl database (GRCm38.p3) and were subsequently used as inputs for the CRISPR design
tool (http://crispr.mit.edu). Then, candidate sgRNAs with the highest scores (generally indicating the fewest potential
off-targets) were selected and synthesized. Two complementary oligonucleotides for each sgRNA were annealed, phosphorylated,
and cloned into the BbsI sites of the pX330 CRISPR/Cas9 vector (Addgene plasmid ID 42230).
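As an illustration of the oligonucleotide pair used for BbsI cloning, the sketch below builds a top and bottom oligo for a given guide. The CACC and AAAC 5′ overhangs follow the widely used pX330 cloning scheme and are an assumption here, since the exact oligo design is not stated in this section. The optional extra 5′ G (often added for U6-driven transcription) is likewise an assumption and is off by default.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def bbsi_oligos(guide, add_5prime_g=False):
    """Top and bottom oligos for cloning a guide into BbsI-cut pX330.

    Overhangs (CACC / AAAC) are assumed from the commonly used protocol.
    """
    insert = guide.upper()
    if add_5prime_g and not insert.startswith("G"):
        insert = "G" + insert  # assumed optional extra G for the U6 promoter
    top = "CACC" + insert
    bottom = "AAAC" + reverse_complement(insert)
    return top, bottom

# Hypothetical guide sequence, not one of the 218 sgRNAs in the study.
top, bottom = bbsi_oligos("GACCTGAAGCTGATCCGTAA")
print(top)
print(bottom)
```

After annealing, the two oligos leave 4-nt single-stranded ends compatible with the BbsI-digested vector.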
Cell culture and transfection. Neuro2A (N2A) cells were cultured in Dulbecco's modified Eagle's Medium
(DMEM) supplemented with 10% fetal bovine serum (Life Technology), 100 units penicillin, and 100 μg
Streptomycin (Nacalai) and incubated at 37 °C with 5% CO2. The cells were seeded into 24-well plates (FALCON)
to reach 1 × 10^5 cells per well. Plasmids (800 ng) were transfected using Lipo3000 reagents. N2A cells were
harvested 48 hours post-transfection.
Surveyor assay. N2A cells transfected with either the empty or the sgRNA-containing pX330 vector were treated
with buffer containing proteinase K, and genomic DNA was then extracted by ethanol precipitation. Genomic
PCR was conducted to amplify a 400–700 bp region containing the sgRNA target. PCR products were gel purified
with the Wizard SV Gel and PCR Clean-Up kit (Promega). 800 ng of each purified PCR product was mixed
and re-annealed to form heteroduplexes, which were subsequently treated with SURVEYOR nuclease and
SURVEYOR enhancer S (Transgenomics) following the manufacturer's recommended protocol. The final product
was separated on a 3% TAE agarose gel and stained with ethidium bromide.
Statistical Analysis. The R environment (version 3.1.3) was used for statistical analyses35. A two-sided
P value < 0.05 was regarded as statistically significant. Categorical variables were analyzed using the
Chi-square test. Independent two-sample t-tests and the Kolmogorov-Smirnov test were used to compare
means between groups. Logistic regression was used to determine factors independently correlated with cleavage
efficiency. To adjust for multiple testing, we further calculated permutation P values based on 10,000
randomizations. In each cycle of the permutation test, 129 and 89 sgRNAs were randomly assigned as positive and
negative sequences, respectively; a standard Chi-square test was then performed, and the smallest P value among all 21 positions was
recorded to construct an empirical distribution of the smallest P values. After 10,000 repeats of this
procedure, the permutation P value was determined by comparing the original P value from the real data with the
empirical P value distribution. We used the annotatePeaks.pl program from the HOMER ChIP-Seq software to
annotate the genomic context of each sgRNA target36 based on the following categories: 3′ UTR, Promoter-TSS, TTS
(transcription termination site), 5′ UTR, intron, exon and intergenic region. To evaluate the performance of the
logistic regression model, we performed receiver operating characteristic (ROC) analysis in two settings. In the
first setting, we trained the model using all samples and then examined how well the model could predict the
cleavage results of the input samples. To prevent over-fitting, in the second setting, we repeated the modeling based on
a 20-fold cross-validation (CV) and calculated the mean AUC value from the 20 iterations.
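The permutation adjustment described above can be sketched in a few lines. This is an illustrative Python version, not the authors' R code, and all function names are hypothetical. One deliberate simplification: instead of the smallest P value it tracks the largest Chi-square statistic across positions, which selects the same position whenever every position has the same degrees of freedom. The (hits + 1) / (n_perm + 1) estimator is also an assumption.

```python
import random
from collections import Counter


def chi_square_stat(labels, bases):
    """Pearson chi-square statistic for a 2 x k table (cleavage result x nucleotide)."""
    n = len(labels)
    n_pos = sum(labels)          # labels are 1 (cleaved) or 0 (not cleaved)
    n_neg = n - n_pos
    totals = Counter(bases)
    pos_counts = Counter(b for b, y in zip(bases, labels) if y)
    stat = 0.0
    for base, total in totals.items():
        cells = ((n_pos, pos_counts[base]), (n_neg, total - pos_counts[base]))
        for group_size, observed in cells:
            expected = group_size * total / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def max_stat(labels, guides):
    """Largest chi-square statistic over all positions of the guide."""
    length = len(guides[0])
    return max(chi_square_stat(labels, [g[i] for g in guides]) for i in range(length))


def permutation_p_value(labels, guides, position, n_perm=10000, seed=0):
    """Multiple-testing-adjusted P value for one position via label shuffling."""
    rng = random.Random(seed)
    observed = chi_square_stat(labels, [g[position] for g in guides])
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        # Shuffling keeps the numbers of positive and negative sgRNAs fixed.
        rng.shuffle(shuffled)
        if max_stat(shuffled, guides) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

With the study's data, `labels` would hold 129 ones and 89 zeros and `guides` the 21-nt sequences; shuffling the labels reproduces the random reassignment of positive and negative sgRNAs in each cycle.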
Secondary structure analysis of sgRNAs. The MFE of each sgRNA was predicted using RNAfold with
the default parameters37. RNAplfold can compute local pair probabilities and has been used to model RNA
co-transcriptional folding by estimating the relative stabilities of all local structures based on a sliding window
approach38. As such, we used RNAplfold to assess the probability that the entire seed sequence is unpaired (i.e.
has no folding structure) by scanning the seed region using a sliding window and averaging the probability over
all windows that contain the seed region. We set the window size W = 21, which is the length of the guide
sequence plus the additional G used for the U6 promoter (GN20), and U = 12, which is the length of the seed
sequence. Finally, we also estimated the effect of the guide sequence on the tracrRNA structure using the dot plot
of the base-pairing matrix predicted by RNAfold. In brief, for each nucleotide on the tracrRNA, we calculated
its maximum and average base-pairing probability with nucleotides on the guide sequence from the base-pairing
matrix. We then averaged each individual probability over all nucleotides on the tracrRNA and calculated the overall
probability that the tracrRNA structure interacts with the guide sequence.
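The final averaging step can be written compactly. The sketch below assumes the base-pairing matrix has already been parsed from the RNAfold dot plot into a symmetric list of lists; the parsing itself, the function name, and the index conventions are illustrative assumptions rather than the authors' code.

```python
def guide_tracr_interaction(bpp, guide_idx, tracr_idx):
    """Summarize pairing between guide and tracrRNA nucleotides.

    bpp[i][j] is the predicted probability that nucleotides i and j pair.
    Returns (mean of per-nucleotide maxima, mean of per-nucleotide averages),
    both taken over the tracrRNA positions.
    """
    max_probs = []
    mean_probs = []
    for t in tracr_idx:
        # Pairing probabilities of this tracrRNA nucleotide with every guide nucleotide.
        probs = [bpp[t][g] for g in guide_idx]
        max_probs.append(max(probs))
        mean_probs.append(sum(probs) / len(probs))
    n = len(tracr_idx)
    return sum(max_probs) / n, sum(mean_probs) / n


if __name__ == "__main__":
    # Toy 4-nt molecule: positions 0-1 play the guide, 2-3 the tracrRNA.
    toy = [
        [0.0, 0.0, 0.8, 0.0],
        [0.0, 0.0, 0.2, 0.4],
        [0.8, 0.2, 0.0, 0.0],
        [0.0, 0.4, 0.0, 0.0],
    ]
    print(guide_tracr_interaction(toy, guide_idx=[0, 1], tracr_idx=[2, 3]))
```

Higher values suggest that the guide sequence is more likely to disturb the tracrRNA fold.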
References
1. Doudna, J. A. & Charpentier, E. Genome editing. The new frontier of genome engineering with CRISPR-Cas9. Science 346, 1258096, doi: 10.1126/science.1258096 (2014).
2. Shalem, O., Sanjana, N. E. & Zhang, F. High-throughput functional genomics using CRISPR-Cas9. Nat Rev Genet 16, 299–311, doi: 10.1038/nrg3899 (2015).
3. Sternberg, S. H. & Doudna, J. A. Expanding the Biologist's Toolkit with CRISPR-Cas9. Mol Cell 58, 568–574, doi: 10.1016/j.molcel.2015.02.032 (2015).
4. Konermann, S. et al. Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex. Nature 517, 583–588, doi: 10.1038/nature14136 (2015).
5. Gilbert, L. A. et al. CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell 154, 442–451, doi: 10.1016/j.cell.2013.06.044 (2013).
6. Rusk, N. CRISPRs and epigenome editing. Nat Methods 11, 28 (2014).
7. Hilton, I. B. et al. Epigenome editing by a CRISPR-Cas9-based acetyltransferase activates genes from promoters and enhancers. Nat Biotechnol 33, 510–517, doi: 10.1038/nbt.3199 (2015).
8. O'Connell, M. R. et al. Programmable RNA recognition and cleavage by CRISPR/Cas9. Nature 516, 263–266, doi: 10.1038/nature13769 (2014).
9. Makarova, K. S. et al. Evolution and classification of the CRISPR-Cas systems. Nat Rev Microbiol 9, 467–477, doi: 10.1038/nrmicro2577 (2011).
10. Hsu, P. D., Lander, E. S. & Zhang, F. Development and applications of CRISPR-Cas9 for genome engineering. Cell 157, 1262–1278, doi: 10.1016/j.cell.2014.05.010 (2014).
11. Chylinski, K., Makarova, K. S., Charpentier, E. & Koonin, E. V. Classification and evolution of type II CRISPR-Cas systems. Nucleic Acids Res 42, 6091–6105, doi: 10.1093/nar/gku241 (2014).
12. Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816–821, doi: 10.1126/science.1225829 (2012).
13. Kabadi, A. M., Ousterout, D. G., Hilton, I. B. & Gersbach, C. A. Multiplex CRISPR/Cas9-based genome engineering from a single lentiviral vector. Nucleic Acids Res 42, e147, doi: 10.1093/nar/gku749 (2014).
14. Cong, L. et al. Multiplex Genome Engineering Using CRISPR/Cas Systems. Science 339, 819–823, doi: 10.1126/science.1231143 (2013).
15. Gasiunas, G., Barrangou, R., Horvath, P. & Siksnys, V. Cas9-crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria. Proc Natl Acad Sci USA 109, E2579–2586, doi: 10.1073/pnas.1208507109 (2012).
16. Pattanayak, V. et al. High-throughput profiling of off-target DNA cleavage reveals RNA-programmed Cas9 nuclease specificity. Nat Biotechnol 31, 839–843, doi: 10.1038/nbt.2673 (2013).
17. Nishimasu, H. et al. Crystal structure of Cas9 in complex with guide RNA and target DNA. Cell 156, 935–949, doi: 10.1016/j.cell.2014.02.001 (2014).
18. Hsu, P. D. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat Biotechnol 31, 827–832, doi: 10.1038/nbt.2647 (2013).
19. Mandal, P. K. et al. Efficient ablation of genes in human hematopoietic stem and effector cells using CRISPR/Cas9. Cell Stem Cell 15, 643–652, doi: 10.1016/j.stem.2014.10.004 (2014).
20. Wang, T., Wei, J. J., Sabatini, D. M. & Lander, E. S. Genetic screens in human cells using the CRISPR-Cas9 system. Science 343, 80–84, doi: 10.1126/science.1246981 (2014).
21. Doench, J. G. et al. Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation. Nat Biotechnol 32, 1262–1267, doi: 10.1038/nbt.3026 (2014).
22. Xu, H. et al. Sequence determinants of improved CRISPR sgRNA design. Genome Res 25, 1147–1157, doi: 10.1101/gr.191452.115 (2015).
23. Moreno-Mateos, M. A. et al. CRISPRscan: designing highly efficient sgRNAs for CRISPR-Cas9 targeting in vivo. Nat Methods 12, 982–988, doi: 10.1038/nmeth.3543 (2015).
24. Chari, R., Mali, P., Moosburner, M. & Church, G. M. Unraveling CRISPR-Cas9 genome engineering parameters via a library-on-library approach. Nat Methods 12, 823–826, doi: 10.1038/nmeth.3473 (2015).
25. Ren, X. et al. Enhanced specificity and efficiency of the CRISPR/Cas9 system with optimized sgRNA parameters in Drosophila. Cell Rep 9, 1151–1162, doi: 10.1016/j.celrep.2014.09.044 (2014).
26. Gagnon, J. A. et al. Efficient mutagenesis by Cas9 protein-mediated oligonucleotide insertion and large-scale assessment of single-guide RNAs. PLoS One 9, e98186, doi: 10.1371/journal.pone.0098186 (2014).
27. Anders, C., Niewoehner, O., Duerst, A. & Jinek, M. Structural basis of PAM-dependent target DNA recognition by the Cas9 endonuclease. Nature 513, 569–573, doi: 10.1038/nature13579 (2014).
28. Singh, R., Kuscu, C., Quinlan, A., Qi, Y. & Adli, M. Cas9-chromatin binding information enables more accurate CRISPR off-target prediction. Nucleic Acids Res 43, e118, doi: 10.1093/nar/gkv575 (2015).
29. Wu, X. et al. Genome-wide binding of the CRISPR endonuclease Cas9 in mammalian cells. Nat Biotechnol 32, 670–676, doi: 10.1038/nbt.2889 (2014).
30. Jinek, M. et al. Structures of Cas9 endonucleases reveal RNA-mediated conformational activation. Science 343, 1247997, doi: 10.1126/science.1247997 (2014).
31. Mouse, E. C. et al. An encyclopedia of mouse DNA elements (Mouse ENCODE). Genome Biol 13, 418, doi: 10.1186/gb-2012-13-8-418 (2012).
32. Thurman, R. E. et al. The accessible chromatin landscape of the human genome. Nature 489, 75–82, doi: 10.1038/nature11232 (2012).
33. Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823, doi: 10.1126/science.1231143 (2013).
34. Abrahams, B. S. et al. SFARI Gene 2.0: a community-driven knowledgebase for the autism spectrum disorders (ASDs). Mol Autism 4, 36, doi: 10.1186/2040-2392-4-36 (2013).
35. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/ (2013).
36. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38, 576–589, doi: 10.1016/j.molcel.2010.05.004 (2010).
37. Lorenz, R. et al. ViennaRNA Package 2.0. Algorithms Mol Biol 6, 26, doi: 10.1186/1748-7188-6-26 (2011).
38. Li, X., Quon, G., Lipshitz, H. D. & Morris, Q. Predicting in vivo binding sites of RNA-binding proteins using mRNA secondary structure. RNA 16, 1096–1107, doi: 10.1261/rna.2017210 (2010).
Acknowledgements
We thank the staff of the Takumi laboratory for their excellent technical support. We thank Dr. Tomomi Aida
for critical reading of the manuscript and insightful suggestions. This work was funded in part by KAKENHI, Japan
Society for the Promotion of Science and Ministry of Education, Culture, Sports, Science, and Technology, Strategic
International Cooperative Program (SICP) and CREST, Japan Science and Technology Agency, Intramural
Research Grant for Neurological and Psychiatric Disorders of NCNP, and Takeda Pharmaceutical Co. Ltd.
Author Contributions
X.L. and T.T. conceived and designed the study. A.H. conducted the experiments. X.L., J.S., S.Y. and J.O.
performed the analyses. X.L. and J.S. drafted the manuscript. All authors participated in the revision of the initial
manuscript and approved the final manuscript.
Additional Information
Supplementary information accompanies this paper at http://www.nature.com/srep
Competing financial interests: The authors declare no competing financial interests.
How to cite this article: Liu, X. et al. Sequence features associated with the cleavage efficiency of CRISPR/Cas9
system. Sci. Rep. 6, 19675; doi: 10.1038/srep19675 (2016).
This work is licensed under a Creative Commons Attribution 4.0 International License. The images
or other third party material in this article are included in the article's Creative Commons license,
unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license,
users will need to obtain permission from the license holder to reproduce the material. To view a copy of this
license, visit http://creativecommons.org/licenses/by/4.0/

Supplementary resource (1)

... Similarly, genomic frameworks of the targeted DNA, secondary structure of sgRNA, and GC content (or guanine-cytosine content) are largely involved in determining cleavage efficiency. Thereby, there are some principal factors to design appropriate sgRNAs with high on-target activities (Liu et al., 2016). Once selecting a suitable sgRNA sequence, G is toughly favored and conversely, and C is intensely unfavorable as the first base is closely nearby the PAM. ...
... In terms of efficacy, studies have delivered robust evidence that sgRNAs with very high or low GC percentages are unflavored. The widespread logistic regression examinations have demonstrated that GC percentages within the range of 40%-60% are preferred for efficient on-target cleavage (Liu et al., 2016). Another study revealed that altering the sgRNA structure by spreading the duplex length (approximately 5 bp) and changing the fourth T of the continuous sequence of thymines to C or G considerably restored the knockout efficiency of the CRISPR/Cas9 system in TZM-bl and Jurkat cells (Dang et al., 2015). ...
... In fact, the heating and slow cooling of some gRNAs can lead to improved cleavage activity, providing further proof that the sgRNA secondary structure can modify its activity and suggesting that inactive sgRNAs can be restored by refolding them prior to transcription . Moreover, it was found that loading of the sgRNA into Cas9 is a crucial phase in adapting Cas9 into an active form to finally elicit its nuclease function (Liu et al., 2016). ...
Article
Full-text available
During recent years, clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein 9 (Cas9) technologies have been noticed as a rapidly evolving tool to deliver a possibility for modifying target sequence expression and function. The CRISPR/Cas9 tool is currently being used to treat a myriad of human disorders, ranging from genetic diseases and infections to cancers. Preliminary reports have shown that CRISPR technology could result in valued consequences for the treatment of Duchenne muscular dystrophy (DMD), cystic fibrosis (CF), β-thalassemia, Huntington’s diseases (HD), etc. Nonetheless, high rates of off-target effects may hinder its application in clinics. Thereby, recent studies have focused on the finding of the novel strategies to ameliorate these off-target effects and thereby lead to a high rate of fidelity and accuracy in human, animals, prokaryotes, and also plants. Meanwhile, there is clear evidence indicating that the design of the specific sgRNA with high efficiency is of paramount importance. Correspondingly, elucidation of the principal parameters that contributed to determining the sgRNA efficiencies is a prerequisite. Herein, we will deliver an overview regarding the therapeutic application of CRISPR technology to treat human disorders. More importantly, we will discuss the potent influential parameters (e.g., sgRNA structure and feature) implicated in affecting the sgRNA efficacy in CRISPR/Cas9 technology, with special concentration on human and animal studies.
... Therefore, the GC bonds are more stable and have heat resistance than the AT bonds. The frequency of GC is preferable for RNA/DNA hybridization and is also important for initiation of the R-loop (Liu et al. 2016). ...
... In the CRISPR-Cas9 system, the GC contents of the PAM-proximal and PAM-distal regions affect the cleavage efficiency (Figure 2(a)). If these GC contents are exceedingly high or low, the Cas nuclease shows low cleavage effects, and most studies show higher efficiencies when the GC content is 40-60% (Liu et al. 2016;Malik et al. 2021). In addition, the presence of more than 56% GC content in the PAM-proximal seed region (1-12nt) further reduces the cleavage efficiency compared to the PAM-distal region (13-20nt) (Malik et al. 2021). ...
... The type of nucleotide according to the sgRNA spacer sequence position affects the cleavage efficiency, and this factor must be adequately considered to avoid low gene-editing efficiency. sgRNA activates the endonuclease activity of the Cas nuclease and can be programmed to guide Cas nucleases to the desired targets (Liu et al. 2016). For such programming, the sgRNA has a part called the spacer, which allows targeting and binding to specific sequences of genes close to the PAM (Briner et al. 2014). ...
Article
Full-text available
The CRISPR-Cas system stands out as a promising genome editing tool due to its cost-effectiveness and time efficiency compared to other methods. This system has tremendous potential for treating various diseases, including genetic disorders and cancer, and promotes therapeutic research for a wide range of genetic diseases. Additionally, the CRISPR-Cas system simplifies the generation of animal models, offering a more accessible alternative to traditional methods. The CRISPR-Cas9 system can be used to cleave target DNA strands that need to be corrected, causing double-strand breaks (DSBs). DNA with DSBs can then be recovered by the DNA repair pathway that the CRISPR-Cas9 system uses to edit target gene sequences. High cleavage efficiency of the CRISPR-Cas9 system is thus imperative for effective gene editing. Herein, we explore several factors affecting the cleavage efficiency of the CRISPR-Cas9 system. These factors include the GC content of the protospacer-adjacent motif (PAM) proximal and distal regions, single-guide RNA (sgRNA) properties, and chromatin state. These considerations contribute to the efficiency of genome editing.
... Further studies have revealed that the secondary structure of sgRNAs affects Cas9 targeting efficiency [25][26][27]. Hence, the design of sgRNAs is crucial to improve GT efficiency and should reflect the chromosomal environment of the target site as well as its nucleotide sequence [28][29][30]. ...
... It is hypothesized that the amount of sgRNA expression may cause a change in editing efficiency only after a certain threshold is reached [22,44]. Another possibility may be due to the ability of some sgRNAs to alter the chromosomal microenvironment or to compete effectively with active sgRNAs [29,[45][46][47]. The double sgRNAs construct of EMB-GFP-sg510 achieved a higher GT, while other combinations including EMB-GFP-sg10 showed lower editing efficiency. ...
Article
Full-text available
Background Precise gene targeting (GT) is a powerful tool for heritable precision genome engineering, enabling knock-in or replacement of the endogenous sequence via homologous recombination. We recently established a CRISPR/Cas9-mediated approach for heritable GT in Arabidopsis thaliana (Arabidopsis) and rice and reported that the double-strand breaks (DSBs) frequency of Cas9 influences the GT efficiency. However, the relationship between DSBs and GT at the same locus was not examined. Furthermore, it has never been investigated whether an increase in the number of copies of sgRNAs or the use of multiple sgRNAs would improve the efficiency of GT. Results Here, we achieved precise GT at endogenous loci Embryo Defective 2410 (EMB2410) and Repressor of Silencing 1 (ROS1) using the sequential transformation strategy and the combination of sgRNAs. We show that increasing of sgRNAs copy number elevates both DSBs and GT efficiency. On the other hand, application of multiple sgRNAs does not always enhance GT efficiency. Our results also suggested that some inefficient sgRNAs would play a role as a helper to facilitate other sgRNAs DSBs activity. Conclusions The results of this study clearly show that DSB efficiency, rather than mutation pattern, is one of the most important key factors determining GT efficiency. This study provides new insights into the relationship between sgRNAs, DSBs, and GTs and the molecular mechanisms of CRISPR/Cas9-mediated GTs in plants.
... Additionally, affinity of "seed sequence", which is the 10-12 nucleotides adjacent to PAM, strongly affects base pairing of gRNA to the target sequence that is a key role for CRISPR array designation [42]. As presented in target sequence in the bla CTX-M promoter, borderline G+C content at 25% might retard the binding affinity between the gRNA and the promoter region that causes incomplete pairing and loss of double strand break activity [43]. However, few spacer sequences were selected for CRISPR construction in this study which was a limitation for comparison of efficacy and specificity. ...
Article
Full-text available
Cluster regularly interspaced short palindromic repeats and CRISPR associated protein 9 (CRISPR-Cas9) is a promising tool for antimicrobial re-sensitization by inactivating antimicrobial resistance (AMR) genes of bacteria. Here, we programmed CRISPR-Cas9 with common spacers to target predominant blaCTX-M variants in group 1 and group 9 and their promoter in an Escherichia coli model. The CRISPR-Cas9 was delivered by non-replicative phagemid particles from a two-step process, including insertion of spacer in CRISPR and construction of phagemid vector. Spacers targeting blaCTX-M promoters and internal sequences of blaCTX-M group 1 (blaCTX-M-15 and -55) and group 9 (blaCTX-M-14, -27, -65, and -90) were cloned into pCRISPR and phagemid pRC319 for spacer evaluation and phagemid particle production. Re-sensitization and plasmid clearance were mediated by the spacers targeting internal sequences of each group, resulting in 3 log10 to 4 log10 reduction of the ratio of resistant cells, but not by those targeting the promoters. The CRISPR-Cas9 delivered by modified ΦRC319 particles were capable of re-sensitizing E. coli K-12 carrying either blaCTX-M group 1 or group 9 in a dose-dependent manner from 0.1 to 100 multiplicity of infection (MOI). In conclusion, CRISPR-Cas9 system programmed with well-designed spacers targeting multiple variants of AMR gene along with a phage-based delivery system could eliminate the widespread blaCTX-M genes for efficacy restoration of available third-generation cephalosporins by reversal of resistance in bacteria.
... By filtering, 19 candidate sgRNA sequences or RGENs are decreased (Table 2), finally reduced to 16 sgRNA sequences by employing additional evaluations on Mismatch "1" values in Mismatch 0, which confirms the target locus within the target region and proceeds by adding filter "0" in Mismatch 1 and 2 number of targets found outside or inside the target region. The guanine (G) and cytosine (C) content of sgRNA was based on Liu et al. 38 , who mentions that sgRNA efficacy is positively influenced by increasing GC content, but at the same time, increasing GC content decreases cleavage activity significantly. We established that the GC content of gRNA for this work should present a 40-60% range. ...
Article
Full-text available
Citrus fruits are the most nutritious foods widely used in flavoring, beverages, and medicines due to their outstanding curative effects. Sour orange (Citrus aurantium L.) is the predominant rootstock in most citrus growing areas due to its good agronomic attributes such as high quality, yield and tolerance to various pathogens. However, the citrus tristeza virus (CTV) is the leading epidemic agent of sour and sweet orange. This study aimed to design in silico guide RNA (sgRNA) for CRISPR/Cas9-mediated inactivation of the Nonexpression of Pathogenesis-Related genes 3 (NPR3) in sour orange (CaNPR3). The protein sequence of the CaNPR3 gene is 584 amino acid residues long. The amino acid sequence of the CaNPR3 gene was compared with the homologous sequences of other nearby vegetative species, showing a close similarity with Citrus sinensis and Citrus Clementina with 100% and 97.27%, respectively. CRISPR RGEN Tools provided 61 results for exon two of the CaNPR3 gene, filtering to 19 sequences and selecting four sgRNA sequences for genetic editing, which were: sgRNA 1 (5'-CATCAGGAAAAGACTTGAGT-3'), sgRNA 2 (5'-AGAACCTCAGACAACACACCTT-3'), sgRNA 3 (5'-CATCAGATTTGACCCTGGAT-3') and sgR-NA 4 (5'- TTCTGGAGGGAGGGAGAGAAATGAGGAGG -3'). The predicted secondary structures of the four selected sgRNAs present efficient structures for gene editing of the target gene, allowing it to recognize, interact with Cas9 protein and edit the target region. Keywords: Gene editing, guide RNA, CaNPR3, in silico.
... To avoid undesired genomic modifications in hiPSCs described to arise after gene editing [24][25][26][27], within this study, the gRNA was designed according to the established literature standards [28][29][30] and showed high quality when analyzed using the in silico tool Cas-OFFinder (Supplementary Table S2) [31,32]. Another source of genomic variability is the long-term cultivation of stem cells [33][34][35][36][37][38]. ...
Article
Full-text available
Genome editing, notably CRISPR (cluster regularly interspaced short palindromic repeats)/Cas9 (CRISPR-associated protein 9), has revolutionized genetic engineering allowing for precise targeted modifications. This technique’s combination with human induced pluripotent stem cells (hiPSCs) is a particularly valuable tool in cerebral organoid (CO) research. In this study, CRISPR/Cas9-generated fluorescently labeled hiPSCs exhibited no significant morphological or growth rate differences compared with unedited controls. However, genomic aberrations during gene editing necessitate efficient genome integrity assessment methods. Optical genome mapping, a high-resolution genome-wide technique, revealed genomic alterations, including chromosomal copy number gain and losses affecting numerous genes. Despite these genomic alterations, hiPSCs retain their pluripotency and capacity to generate COs without major phenotypic changes but one edited cell line showed potential neuroectodermal differentiation impairment. Thus, this study highlights optical genome mapping in assessing genome integrity in CRISPR/Cas9-edited hiPSCs emphasizing the need for comprehensive integration of genomic and morphological analysis to ensure the robustness of hiPSC-based models in cerebral organoid research.
... However, compared to the individual gRNA, the duplex gRNA driven by AtU6-26 promoter demonstrated lower efficiency (6.0 to 7.7%) in inducing indel mutations (Gao et al. 2015). Moreover, it is important to note that not all target sequences perform equally, as they produce different mutation efficiencies due to secondary structure factors and their respective GC content (Liu et al. 2016). Hence, the identification of native NtU6 promoters remains crucial to extend the CRISPR toolbox for tobacco breeding. ...
... Additionally, we found the mutation efficiency of target site sgRNA-1 and sgRNA-3 was low, different targets producing different mutation efficiencies perhaps due to G/C content and the locations of the designed sgRNAs. 46 Previous studies have shown that 35S and the ubiquitin promoters are typically used for the control of Cas9 expression, endogenic ubiquitin promoters can better drive Cas9 expression, 7,47 which is consistent with our results. We also found that CRISPR vectors constructed by Cas9-1 and Cas9-2 have different editing efficiencies, perhaps due to differences in codon preference between different species. ...
Article
Full-text available
The optimization of the CRISPR-Cas9 system for enhancing editing efficiency holds significant value in scientific research. In this study, we optimized single guide RNA and Cas9 promoters of the CRISPR-Cas9 vector and established an efficient protoplast isolation and transient transformation system in Eustoma grandiflorum, and we successfully applied the modified CRISPR-Cas9 system to detect editing efficiency of the EgPDS gene. The activity of the EgU6-2 promoter in E. grandiflorum protoplasts was approximately three times higher than that of the GmU6 promoter. This promoter, along with the EgUBQ10 promoter, was applied in the CRISPR-Cas9 cassette, the modified CRISPR-Cas9 vectors that pEgU6-2::sgRNA-2/pEgUBQ10::Cas9-2 editing efficiency was 37.7%, which was 30.3% higher than that of the control, and the types of mutation are base substitutions, small fragment deletions and insertions. Finally we obtained an efficient gene editing vector for E. grandiflorum. This project provides an important technical platform for the study of gene function in E. grandiflorum.
Article
Full-text available
CRISPR-Cas9 technology provides a powerful system for genome engineering. However, variable activity across different single guide RNAs (sgRNAs) remains a significant limitation. We analyzed the molecular features that influence sgRNA stability, activity and loading into Cas9 in vivo. We observed that guanine enrichment and adenine depletion increased sgRNA stability and activity, whereas differential sgRNA loading, nucleosome positioning and Cas9 off-target binding were not major determinants. We also identified sgRNAs truncated by one or two nucleotides and containing 5' mismatches as efficient alternatives to canonical sgRNAs. On the basis of these results, we created a predictive sgRNA-scoring algorithm, CRISPRscan, that effectively captures the sequence features affecting the activity of CRISPR-Cas9 in vivo. Finally, we show that targeting Cas9 to the germ line using a Cas9-nanos 3' UTR led to the generation of maternal-zygotic mutants, as well as increased viability and decreased somatic mutations. These results identify determinants that influence Cas9 activity and provide a framework for the design of highly efficient sgRNAs for genome targeting in vivo.
Article
Full-text available
We developed an in vivo library-on-library methodology to simultaneously assess single guide RNA (sgRNA) activity across ∼1,400 genomic loci. Assaying across multiple human cell types and end-processing enzymes as well as two Cas9 orthologs, we unraveled underlying nucleotide sequence and epigenetic parameters. Our results and software (http://crispr.med.harvard.edu/sgRNAScorer) enable improved design of reagents, shed light on mechanisms of genome targeting, and provide a generalizable framework to study nucleic acid-nucleic acid interactions and biochemistry in high throughput.
Article
Full-text available
The CRISPR/CAS9 system has revolutionized mammalian somatic cell genetics. Genome-wide functional screens employing CRISPR/Cas9-mediated knockout or dCas9 fusion-mediated inhibition/activation (CRISPRi/a) are powerful techniques for discovering phenotype-associated gene function. We systematically assessed the DNA sequence features that contribute to single guide RNA (sgRNA) efficiency in CRISPR-based screens. Leveraging the information from multiple designs, we derived a new sequence model for predicting sgRNA efficiency in CRISPR/Cas9 knockout experiments. Our model confirmed known features, and suggested new features including a preference for cytosine at the cleavage site. The model was experimentally validated for sgRNA-mediated mutation rate and protein knockout efficiency. Tested on independent datasets, the model achieved significant results in both positive and negative selection conditions, and outperformed existing models. We also found that the sequence preference for CRISPRi/a is substantially different from that for CRISPR/Cas9 knockout and propose a new model for predicting sgRNA efficiency in CRISPRi/a experiments. These results facilitate the genome-wide design of improved sgRNA for both knockout and CRISPRi/a studies. Published by Cold Spring Harbor Laboratory Press.
Article
Full-text available
The CRISPR system has become a powerful biological tool with a wide range of applications. However, improving targeting specificity and accurately predicting potential off-targets remains a significant goal. Here, we introduce a web-based CR: ISPR/Cas9 O: ff-target P: rediction and I: dentification T: ool (CROP-IT) that performs improved off-target binding and cleavage site predictions. Unlike existing prediction programs that solely use DNA sequence information; CROP-IT integrates whole genome level biological information from existing Cas9 binding and cleavage data sets. Utilizing whole-genome chromatin state information from 125 human cell types further enhances its computational prediction power. Comparative analyses on experimentally validated datasets show that CROP-IT outperforms existing computational algorithms in predicting both Cas9 binding as well as cleavage sites. With a user-friendly web-interface, CROP-IT outputs scored and ranked list of potential off-targets that enables improved guide RNA design and more accurate prediction of Cas9 binding or cleavage sites. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Article
Technologies that enable targeted manipulation of epigenetic marks could be used to precisely control cell phenotype or interrogate the relationship between the epigenome and transcriptional control. Here we describe a programmable, CRISPR-Cas9-based acetyltransferase consisting of the nuclease-null dCas9 protein fused to the catalytic core of the human acetyltransferase p300. The fusion protein catalyzes acetylation of histone H3 lysine 27 at its target sites, leading to robust transcriptional activation of target genes from promoters and both proximal and distal enhancers. Gene activation by the targeted acetyltransferase was highly specific across the genome. In contrast to previous dCas9-based activators, the acetyltransferase activates genes from enhancer regions and with an individual guide RNA. We also show that the core p300 domain can be fused to other programmable DNA-binding proteins. These results support targeted acetylation as a causal mechanism of transactivation and provide a robust tool for manipulating gene regulation.
Chapter
The CRISPR-Cas modules are adaptive antivirus immunity systems that are present in most archaea and many bacteria. These systems function by incorporating fragments of alien genomes into specific genomic loci, transcribing the inserts and using the transcripts as guide RNAs to destroy the genome of the cognate virus or plasmid. This RNA interference-like immune response is mediated by numerous, highly diverse Cas (CRISPR-associated) proteins, several of which form the Cascade complex involved in the processing of CRISPR loci transcripts and cleavage of the target DNA. Comparative analysis of the CRISPR-Cas modules led to the classification of the CRISPR-Cas systems into three types (I, II and III) that are characterized by distinct sets of cas genes. Classification of Cas proteins into families and superfamilies is a non-trivial task because of the fast evolution of many cas genes. Exhaustive sequence comparison aided by analysis of the available crystal structures led to the delineation of approximately 30 protein families that can be further classified into several superfamilies. By far the most common domain in Cas proteins is the RNA Recognition Motif (RRM). The RRM domains show remarkable diversity within the CRISPR-Cas systems and in particular comprise the scaffold of the Cascade complex. In addition to the numerous RRM domains, including a distinct polymerase-cyclase domain, the Cas proteins contain a distinct Superfamily II helicase domain and several diverse nuclease domains. Detailed comparative analysis of the sequences and structures of Cas proteins shed light on the deep relationships between Type I and Type III systems and allowed us to propose a simple evolutionary scenario for the origin of the CRISPR-Cas system. Moreover, the combination of experimental structural studies and comparative analysis provides detailed models of the structures of the Cascade complexes from different CRISPR-Cas types, revealing remarkable architectural uniformity.
Article
Few discoveries transform a discipline overnight, but biologists today can manipulate cells in ways never possible before, thanks to a peculiar form of prokaryotic adaptive immunity mediated by clustered regularly interspaced short palindromic repeats (CRISPR). From elegant studies that deciphered how these immune systems function in bacteria, researchers quickly uncovered the technological potential of Cas9, an RNA-guided DNA cleaving enzyme, for genome engineering. Here we highlight the recent explosion in visionary applications of CRISPR-Cas9 that promises to usher in a new era of biological understanding and control. Copyright © 2015 Elsevier Inc. All rights reserved.
Article
Forward genetic screens are powerful tools for the discovery and functional annotation of genetic elements. Recently, the RNA-guided CRISPR (clustered regularly interspaced short palindromic repeat)-associated Cas9 nuclease has been combined with genome-scale guide RNA libraries for unbiased, phenotypic screening. In this Review, we describe recent advances using Cas9 for genome-scale screens, including knockout approaches that inactivate genomic loci and strategies that modulate transcriptional activity. We discuss practical aspects of screen design, provide comparisons with RNA interference (RNAi) screening, and outline future applications and challenges.