
Sequence features associated with the cleavage efficiency of CRISPR/Cas9 system

January 2016
Scientific Reports 6(1):19675


DOI:10.1038/srep19675

License
CC BY 4.0


The CRISPR-Cas9 system has recently emerged as a versatile tool for biological and medical research. In this system, a single guide RNA (sgRNA) directs the endonuclease Cas9 to a targeted DNA sequence for site-specific manipulation. In addition to this targeting function, the sgRNA has also been shown to play a role in activating the endonuclease activity of Cas9. This dual function of the sgRNA likely underlies observations that different sgRNAs have varying on-target activities. Currently, our understanding of the relationship between sequence features of sgRNAs and their on-target cleavage efficiencies remains limited, largely due to difficulties in assessing the cleavage capacity of a large number of sgRNAs. In this study, we evaluated the cleavage activities of 218 sgRNAs using in vitro Surveyor assays. We found that nucleotides at both PAM-distal and PAM-proximal regions of the sgRNA are significantly correlated with on-target efficiency. Furthermore, we also demonstrated that the genomic context of the targeted DNA, the GC percentage, and the secondary structure of sgRNA are critical factors contributing to cleavage efficiency. In summary, our study reveals important parameters for the design of sgRNAs with high on-target efficiencies, especially in the context of high throughput applications.


Scientific Reports | 6:19675 | DOI: 10.1038/srep19675

www.nature.com/scientificreports


Xiaoxi Liu1, Ayaka Homma1, Jamasb Sayadi1,2, Shu Yang3, Jun Ohashi4 & Toru Takumi1,5


The clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein (Cas) system, an adaptive immune system found in many archaea and bacteria, has recently emerged as an efficient and precise tool for genome engineering1–3. The system has been further repurposed to regulate gene expression by transcriptional activation or repression4,5, modify the local chromatin epigenetic status of various loci6,7, and even target single stranded RNA8. Rapid advances in CRISPR-Cas based technology such as these are transforming biological research and hold tremendous potential for future therapeutic applications.

To date, three CRISPR-Cas subtypes have been classified in prokaryotes9. Among them, the type II CRISPR-Cas system derived from Streptococcus pyogenes is the most commonly used based on its relative simplicity10. In particular, the type II CRISPR system utilizes a single endonuclease protein, Cas9, to induce DNA cleavage, while multiple proteins are required in other subtypes11. When coupled with Cas9, two non-coding RNAs, the CRISPR associated RNA (crRNA), required for DNA targeting, and the trans-activating RNA (tracrRNA), necessary for nuclease activity, are sufficient to induce DNA cleavage. These two RNAs can be fused as a chimeric single guide RNA (sgRNA) and further cloned with Cas9 into an expression vector, allowing convenient and efficient delivery of the whole system12,13.

To direct the Cas9 complex to a desired locus for genetic manipulation, a 20-nucleotide guide sequence found within the sgRNA must be complementary to the target DNA14. In addition, a protospacer-adjacent motif (PAM) sequence (the 3 nucleotides NGG for SpCas9) must be present in the targeted genomic locus. Once bound to the target DNA, two nuclease domains in Cas9, HNH and RuvC, cleave the DNA strands complementary and non-complementary to the guide sequence, respectively, leaving a blunt-ended DNA double strand break (DSB)15. Thus, in theory, any specific 20 nt genomic sequence followed by a PAM can be targeted. The flexibility of this RNA-guided system enables researchers to perform genome editing for virtually any locus of interest in an easy and quick manner by simply changing the sgRNA in the expression vector.
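The targeting rule described above (a 20 nt guide immediately followed by an NGG PAM) can be illustrated with a short sketch. The function name `find_target_sites` and the example sequence are hypothetical, and the sketch scans only the strand it is given; a real design tool would also scan the reverse complement and score off-targets.

```python
import re

def find_target_sites(seq):
    """Return (start, guide, pam) for each 20-nt guide followed by an NGG PAM.

    Only the given strand is scanned; start is a 0-based index into seq.
    """
    seq = seq.upper()
    sites = []
    # A lookahead is used so that overlapping candidate sites are all reported.
    for match in re.finditer(r"(?=([ACGT]{20})([ACGT]GG))", seq):
        sites.append((match.start(), match.group(1), match.group(2)))
    return sites

# Hypothetical 27-nt sequence containing one candidate site.
example = "ATGACCTGAAGCTGACCATGCATGGCA"
for start, guide, pam in find_target_sites(example):
    print(start, guide, pam)
```

In this numbering, the last base of the returned guide corresponds to position 20, the base immediately upstream of the PAM.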

1RIKEN Brain Science Institute, Wako, Saitama, Japan. 2Harvard College, Cambridge, Massachusetts 02138, United States. 3Department of Computer Science, University of British Columbia, Vancouver, Canada. 4Department of Biological Sciences, Graduate School of Science, University of Tokyo, Bunkyo, Tokyo, Japan. 5Core Research for Evolutional Science and Technology (CREST), Japan Science and Technology Agency (JST), 7 Gobancho, Chiyoda-ku, Tokyo 102-0076, Japan. Correspondence and requests for materials should be addressed to T.T. (email: toru.takumi@riken.jp)

Received: 28 September 2015

Accepted: 16 December 2015

Published: 27 January 2016


Given the relatively short length of the guide sequence in the sgRNA (20 nt), targeting specificity has become one major concern in using CRISPR-Cas9, and the off-target effects of the system have been extensively investigated16. It has been proposed that the 8–12 PAM-proximal bases, known as the seed sequence, determine targeting specificity by making contact with the arginine-rich bridge helix (BH) within the recognition (REC) lobe of the Cas9 protein17; therefore, selecting sites predicted to have the most specific seed regions with the fewest possible off-target mismatches may be crucial to improving on-target efficiency. In contrast, the PAM-distal sequence has been suggested to be less important for specificity, and mismatches in this region are more likely to be tolerated.

Despite extensive research on off-target effects, only a limited number of studies have focused on analyzing the on-target cleavage efficiency of the sgRNA/Cas9 complex. It has been observed that the mutagenesis rate of the CRISPR/Cas9 system varies greatly18. Further studies have indicated that the on-target efficiency of site-directed mutation is highly dependent on the sgRNA, given that sgRNAs targeting the same genomic locus show different activities19. Moreover, several recent studies have attempted to identify sgRNA sequence determinants that may underlie sgRNA cleavage activity20–24. Doench et al. evaluated the efficiencies of a total of 1,841 sgRNAs in inducing complete loss of a protein and demonstrated that the nucleotide composition at specific positions, especially the one adjacent to the protospacer-adjacent motif (PAM), contributes to the activity of the sgRNA21. Based on previously published datasets, Hu et al. analyzed the effects of sequence context on sgRNA efficiency and generated models that achieved reasonable predictive power, with Area Under Curve (AUC) scores greater than 0.7 in Receiver Operating Characteristic (ROC) analysis22. Despite such advances, a large fraction of inefficient sgRNAs are still not predictable with current models, which emphasizes the need to further optimize the design principles of sgRNAs. In addition, many sequence features that are highly likely to be relevant to sgRNA activity, such as the genomic context of the targeted region and the stability of the secondary structure of the sgRNA, have not yet been explored and incorporated into a statistical model. In this study, we comprehensively analyzed the sequence features of sgRNAs and their effects on cleavage activity based on the Surveyor assay system.

Among the host of in vitro systems available to evaluate sgRNA performance, the Surveyor nuclease assay is the most commonly used and reliable method. This assay utilizes an enzyme mismatch cleavage system in which heteroduplex DNA with mismatches and indels is cleaved. However, despite its high reliability, the procedure is tedious and time-consuming: it usually takes 10 days from the design of the sgRNA to obtaining the final assay results. Additionally, it is difficult to multiplex the procedure since cell culture, transfection, and genomic PCR are required for each individual sgRNA assay. Currently, systematic evaluations of sgRNA on-target efficiency based on the Surveyor assay are still limited, especially in mammalian cell lines. In this study, we report the evaluation of the on-target activity of 218 sgRNAs in the mouse Neuro2A cell line.

Figure 1. (a) Outline of the procedure of the current study. (b) Distribution of sgRNAs in various genomic contexts represented by different colors. (c) Distribution of sgRNAs across chromosomes. (d) The average GC percentages of sgRNAs. The error bars indicate the 95% confidence intervals.


Results

We designed and successfully cloned 218 sgRNAs into expression vectors. The experimental design and procedure are briefly outlined in Fig. 1a. The insertion of all guide sequences into the expression vectors was confirmed using Sanger sequencing. Together, these 218 sgRNAs target 153 distinct genomic loci across 18 chromosomes in the mouse genome. The sequences of all sgRNAs as well as detailed annotations including targeted genomic locations, guanine-cytosine (GC) percentages, and genomic contexts are provided in Supplementary Table 1 and illustrated in Fig. 1(b–d). We then performed Surveyor assays using the Neuro2A cell line to evaluate the on-target efficiency of these sgRNAs. Representative gel images of the Surveyor assays for seven sgRNA samples are shown in Fig. 2. For each assay, we included one negative control in which a pMax-GFP vector was used for transfection. By comparing PCR bands amplified from the negative controls and sgRNA-transfected samples, we classified the guide sequences in the sgRNAs as Surveyor positive sequences if their cleavage pattern was clearly visible in the sgRNA-transfected samples. Through this analysis, a total of 129 sgRNAs (59%) were determined to be Surveyor positive. Meanwhile, no cleavage was observed for 89 sgRNAs (41%).

Nucleotide preferences of highly efficient sgRNAs. We next set out to explore whether the nucleotide composition of the sgRNAs affected the cleavage results. First, we separated the sgRNA sequences into two groups: Surveyor positive and Surveyor negative. For each group, the occurrences and frequencies of nucleotides (A, C, G, T) at each position were calculated (Table 1). We then compared the nucleotide frequencies in the Surveyor positive sequences with those in the Surveyor negative sequences. A heatmap was subsequently generated to visualize the frequency change between the two groups (Fig. 3). In Surveyor positive sequences, we observed an elevated frequency of thymine (T) at positions 3 and 6, an increased frequency of cytosine (C) and decreased frequency of adenine (A) at position 20, and a host of other nucleotide frequency changes compared to Surveyor negative sequences.
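The per-position comparison described above can be sketched as follows. This is a minimal illustration on hypothetical toy sequences, not the authors' code; the function names are assumptions. Each cell of the resulting matrix is the frequency in the positive group minus the frequency in the negative group, which is the quantity shown in the heatmap.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"

def position_frequencies(guides):
    """Per-position nucleotide frequencies for equal-length guide sequences."""
    freqs = []
    for i in range(len(guides[0])):
        counts = Counter(guide[i] for guide in guides)
        freqs.append({n: counts[n] / len(guides) for n in NUCLEOTIDES})
    return freqs

def frequency_difference(positive, negative):
    """Frequency in Surveyor positive minus frequency in Surveyor negative."""
    pos_f = position_frequencies(positive)
    neg_f = position_frequencies(negative)
    return [{n: p[n] - q[n] for n in NUCLEOTIDES} for p, q in zip(pos_f, neg_f)]

# Hypothetical 3-nt toy "guides", for illustration only.
positive = ["ACT", "ATT", "GCT", "ACC"]
negative = ["ACG", "CCG"]
for position, row in enumerate(frequency_difference(positive, negative), start=1):
    print(position, row)
```

Applied to the counts in Table 1, the same subtraction gives roughly +0.16 for C at position 20 (63/129 versus 29/89).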

We next asked whether these frequency changes are statistically meaningful or merely represent chance observations. Chi-square analysis was performed for each position of the guide sequence to test whether the overall nucleotide composition differs between Surveyor positive and negative sequences. Statistically significant changes were observed at positions 3 and 20, with P values of 0.031 and 0.022, respectively (Table 1). Position 3 is located in the PAM-distal region, while position 20 is the base immediately upstream of the PAM sequence. We further calculated the permutation-adjusted P value for each position based on 10,000 randomizations of the sample labels. The associations at positions 3 and 20 were not significant after correction by the permutation test (permutation P value = 0.4762 and 0.371, respectively).
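A minimal sketch of the two tests described above, using only the Python standard library. The chi-square P value uses the closed-form tail of a chi-square distribution with 3 degrees of freedom (a 2 × 4 table). The text does not spell out how the permutation adjustment was computed, so the max-statistic adjustment across positions used here is an assumption, as are all function names.

```python
import math
import random

NUCLEOTIDES = "ACGT"

def chi_square_2xk(row_a, row_b):
    """Pearson chi-square statistic for a 2 x k table of counts."""
    total = sum(row_a) + sum(row_b)
    stat = 0.0
    for a, b in zip(row_a, row_b):
        column = a + b
        if column == 0:
            continue  # a nucleotide never seen at this position adds nothing
        for observed, row_total in ((a, sum(row_a)), (b, sum(row_b))):
            expected = row_total * column / total
            stat += (observed - expected) ** 2 / expected
    return stat

def chi_square_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

def position_stats(guides, labels):
    """Chi-square statistic at every position for positive vs negative guides."""
    pos = [g for g, y in zip(guides, labels) if y]
    neg = [g for g, y in zip(guides, labels) if not y]
    stats = []
    for i in range(len(guides[0])):
        row_pos = [sum(g[i] == n for g in pos) for n in NUCLEOTIDES]
        row_neg = [sum(g[i] == n for g in neg) for n in NUCLEOTIDES]
        stats.append(chi_square_2xk(row_pos, row_neg))
    return stats

def permutation_adjusted_p(guides, labels, n_perm=10000, seed=0):
    """Assumed adjustment: share of label shuffles whose largest statistic
    across all positions reaches the observed statistic at each position."""
    rng = random.Random(seed)
    observed = position_stats(guides, labels)
    exceed = [0] * len(observed)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_stat = max(position_stats(guides, shuffled))
        for i, stat in enumerate(observed):
            if max_stat >= stat:
                exceed[i] += 1
    return [count / n_perm for count in exceed]

# Position 3 counts (A, C, G, T) taken from Table 1.
stat = chi_square_2xk([33, 21, 24, 51], [22, 20, 27, 20])
print(round(stat, 2), round(chi_square_sf_df3(stat), 3))
```

With the position 3 counts from Table 1, the unadjusted P value comes out at about 0.031, in line with the reported value. Because the adjustment compares each position against the best position in every shuffle, it is much more conservative than the per-position test, which is consistent with the loss of significance reported above.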

Evaluation of GC percentage difference. Since the GC percentage of sgRNAs, particularly that of the 6 PAM-proximal bases, has been previously reported to be positively correlated with on-target efficiency20,25,26, we further examined the potential association between GC content and cleavage outcome. We calculated the overall GC percentage for the whole guide sequence, as well as the GC percentages for positions 1–6, 7–14, and 15–20, in a sliding window manner. Finally, we conducted a Welch two-sample t test, a non-parametric Kolmogorov-Smirnov test, and logistic regression analysis but did not observe any significant associations (Supplementary Table 2).
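The GC features described above can be computed as in the following sketch. It assumes the three windows are the fixed ranges named in the text (positions 1–6, 7–14, and 15–20, with position 20 next to the PAM); the function names and the example guide are hypothetical.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)

# 1-based position ranges from the text, stored as 0-based slice bounds.
WINDOWS = {"1-6": (0, 6), "7-14": (6, 14), "15-20": (14, 20)}

def gc_features(guide):
    """Overall GC percentage plus the GC percentage of each window."""
    features = {"overall": gc_percent(guide)}
    for name, (start, end) in WINDOWS.items():
        features[name] = gc_percent(guide[start:end])
    return features

# Hypothetical 20-nt guide.
print(gc_features("GACCTGAAGCTGACCATGCA"))
```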

Figure 2. The representative Surveyor assay results of 7 independent sgRNAs. The ID indicates the sgRNA ID shown in Supplementary Table 1. The minus sign indicates the negative control sample that was transfected with pMaxGFP. The plus sign indicates the sample that was transfected with the PX330 vector containing the sgRNA. Among the 7 sgRNAs, 5 (No. 133, 149, 166, 174, 175) showed positive cleavage activity and 2 (No. 135, 163) showed no cleavage activity.

Logistic regression analysis. As a follow-up to the Chi-square test, we evaluated the effect of each nucleotide on cleavage efficiency through logistic regression analysis. The nucleotides in each position were coded as dummy variables, and the nucleotide that showed the lowest frequency change at each given position was set as the baseline level. Similarly, the genomic contexts of the target sequence were also included in the regression analysis, with the intergenic region set as the baseline level. In addition to the sequence features of the sgRNAs, we also evaluated the impact of several structural features on cleavage efficiency. We assessed the overall secondary structure of each sgRNA, measured as the minimum free energy (MFE). Additionally, we analyzed the local secondary structure of the seed region and the effects of the guide sequence on tracrRNA structure. We further speculated that the relationship between GC percentage and cleavage efficiency is likely to be non-linear, with sgRNAs whose GC percentages are too high or too low being unfavorable. Thus, we labeled sgRNAs with GC percentages within the range of 40%–60% as “GC normal” and those with GC percentages below 40% or above 60% as “GC abnormal.” These variables were incorporated into the logistic regression model.
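The coding scheme described above can be sketched as follows. The feature names follow the `Pos_<position>_<nucleotide>` pattern used in Table 2; the function `encode_guide` and the `baselines` argument (one baseline nucleotide per position, chosen by the analyst) are hypothetical, and genomic context and structural variables are left out for brevity.

```python
NUCLEOTIDES = "ACGT"

def encode_guide(guide, baselines):
    """Dummy-code a guide sequence and add the binary GC-normal indicator.

    baselines[i] is the baseline nucleotide at position i + 1; it gets no
    column of its own, so each position contributes three 0/1 variables.
    """
    features = {}
    for position, base in enumerate(guide, start=1):
        for nucleotide in NUCLEOTIDES:
            if nucleotide == baselines[position - 1]:
                continue
            features[f"Pos_{position}_{nucleotide}"] = 1 if base == nucleotide else 0
    gc = 100.0 * sum(b in "GC" for b in guide) / len(guide)
    # "GC normal" covers 40%-60% inclusive; anything outside is "GC abnormal".
    features["GC_normal"] = 1 if 40.0 <= gc <= 60.0 else 0
    return features

# Hypothetical 4-nt example with A as the baseline at every position.
print(encode_guide("ACGT", "AAAA"))
```

Dropping one level per position avoids perfect collinearity among the dummy variables, so each coefficient is read as a contrast against that position's baseline nucleotide.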

Aer logistic regression analysis, a total of 14 variables were found to be signicantly correlated with the

cleavage results, including the nucleotide present at 10 distinct positions, the condition of being targeted to a

promoter-transcription start site (TSS), having a normal range GC, as well as several features of the second-

ary structures of sgRNA. e position-dependent nucleotide P values are illustrated in Fig.4. e results of

signicantly associated variables are shown in Table2, and the complete logistic analysis results are listed in

Supplementary Table 3.

| Position | Positive A | Positive C | Positive G | Positive T | Negative A | Negative C | Negative G | Negative T | P value | P value (perm) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 35 (27.1%) | 25 (19.4%) | 43 (33.3%) | 26 (20.2%) | 26 (29.2%) | 19 (21.3%) | 23 (25.8%) | 21 (23.6%) | 0.694 | 1 |
| 2 | 33 (25.6%) | 32 (24.8%) | 33 (25.6%) | 31 (24%) | 24 (27%) | 16 (18%) | 21 (23.6%) | 28 (31.5%) | 0.51 | 1 |
| 3 | 33 (25.6%) | 21 (16.3%) | 24 (18.6%) | 51 (39.5%) | 22 (24.7%) | 20 (22.5%) | 27 (30.3%) | 20 (22.5%) | 0.031* | 0.4762 |
| 4 | 20 (15.5%) | 41 (31.8%) | 45 (34.9%) | 23 (17.8%) | 21 (23.6%) | 24 (27%) | 23 (25.8%) | 21 (23.6%) | 0.213 | 0.9931 |
| 5 | 28 (21.7%) | 33 (25.6%) | 43 (33.3%) | 25 (19.4%) | 24 (27%) | 21 (23.6%) | 18 (20.2%) | 26 (29.2%) | 0.107 | 0.9048 |
| 6 | 32 (24.8%) | 26 (20.2%) | 29 (22.5%) | 42 (32.6%) | 32 (36%) | 18 (20.2%) | 24 (27%) | 15 (16.9%) | 0.054 | 0.6909 |
| 7 | 26 (20.2%) | 32 (24.8%) | 36 (27.9%) | 35 (27.1%) | 20 (22.5%) | 21 (23.6%) | 20 (22.5%) | 28 (31.5%) | 0.774 | 1 |
| 8 | 33 (25.6%) | 28 (21.7%) | 33 (25.6%) | 35 (27.1%) | 22 (24.7%) | 19 (21.3%) | 32 (36%) | 16 (18%) | 0.283 | 0.9991 |
| 9 | 33 (25.6%) | 34 (26.4%) | 37 (28.7%) | 25 (19.4%) | 19 (21.3%) | 31 (34.8%) | 19 (21.3%) | 20 (22.5%) | 0.39 | 0.9999 |
| 10 | 25 (19.4%) | 28 (21.7%) | 41 (31.8%) | 35 (27.1%) | 20 (22.5%) | 27 (30.3%) | 21 (23.6%) | 21 (23.6%) | 0.348 | 0.9996 |
| 11 | 33 (25.6%) | 26 (20.2%) | 35 (27.1%) | 35 (27.1%) | 12 (13.5%) | 20 (22.5%) | 27 (30.3%) | 30 (33.7%) | 0.185 | 0.9869 |
| 12 | 29 (22.5%) | 34 (26.4%) | 40 (31%) | 26 (20.2%) | 17 (19.1%) | 24 (27%) | 24 (27%) | 24 (27%) | 0.648 | 1 |
| 13 | 28 (21.7%) | 37 (28.7%) | 33 (25.6%) | 31 (24%) | 15 (16.9%) | 26 (29.2%) | 26 (29.2%) | 22 (24.7%) | 0.825 | 1 |
| 14 | 31 (24%) | 37 (28.7%) | 39 (30.2%) | 22 (17.1%) | 13 (14.6%) | 36 (40.4%) | 25 (28.1%) | 15 (16.9%) | 0.205 | 0.9923 |
| 15 | 16 (12.4%) | 44 (34.1%) | 39 (30.2%) | 30 (23.3%) | 15 (16.9%) | 27 (30.3%) | 32 (36%) | 15 (16.9%) | 0.468 | 1 |
| 16 | 30 (23.3%) | 41 (31.8%) | 30 (23.3%) | 28 (21.7%) | 16 (18%) | 27 (30.3%) | 24 (27%) | 22 (24.7%) | 0.745 | 1 |
| 17 | 38 (29.5%) | 33 (25.6%) | 33 (25.6%) | 25 (19.4%) | 16 (18%) | 21 (23.6%) | 27 (30.3%) | 25 (28.1%) | 0.167 | 0.9803 |
| 18 | 36 (27.9%) | 37 (28.7%) | 29 (22.5%) | 27 (20.9%) | 17 (19.1%) | 24 (27%) | 27 (30.3%) | 21 (23.6%) | 0.366 | 0.9997 |
| 19 | 38 (29.5%) | 34 (26.4%) | 40 (31%) | 17 (13.2%) | 22 (24.7%) | 19 (21.3%) | 25 (28.1%) | 23 (25.8%) | 0.126 | 0.9394 |
| 20 | 13 (10.1%) | 63 (48.8%) | 26 (20.2%) | 27 (20.9%) | 21 (23.6%) | 29 (32.6%) | 20 (22.5%) | 19 (21.3%) | 0.022* | 0.371 |
| 21 | 44 (34.1%) | 8 (6.2%) | 20 (15.5%) | 57 (44.2%) | 31 (34.8%) | 4 (4.5%) | 20 (22.5%) | 34 (38.2%) | 0.545 | 1 |

Table 1. The occurrences and frequencies of nucleotides at each position in Surveyor positive and negative sequences. P value (perm): P value obtained from a permutation test with 10,000 permutations.

Figure 3. Heatmap plot showing nucleotide frequency change in each position of Surveyor positive sgRNAs compared with that of Surveyor negative sgRNAs. The value of the color scale in each cell of the heatmap indicates the nucleotide frequency difference and is calculated as Frequency_positive − Frequency_negative.

To evaluate the performance of the current model, we first examined how well the cleavage activities of the sgRNAs used in this study can be predicted by previous methods. We calculated “on-target scores” for our sgRNAs using the standalone Python software provided with Doench et al.’s study21. To do so, we extended our sgRNA sequences, as the program requires a 30 nt sequence that includes the flanking sequence of the guide sequence. Using logistic regression, we found a positive correlation between the “on-target score” and the Surveyor cleavage result, though the P value is marginal at 0.07. We then assessed the ROC curve of the model fitted on this score and calculated the area under the ROC curve (AUC) to be 0.57. As a comparison, the AUC of our current logistic model is 0.91 when the fitted model is applied to the total training data and 0.67 under 20-fold cross-validation.
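The two evaluation settings mentioned above, AUC on the training data and AUC under 20-fold cross-validation, can be sketched with the standard library alone. The AUC here is computed as the probability that a randomly chosen positive is scored above a randomly chosen negative; the function names are hypothetical, and the model fitting step inside each fold is omitted.

```python
import random

def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k folds of near-equal size."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    for fold in range(k):
        test = order[fold::k]
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        yield train, test

# Toy scores: three of the four positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))

# 218 sgRNAs split into 20 folds, as in the cross-validation described above.
fold_sizes = [len(test) for _, test in k_fold_indices(218, 20)]
print(fold_sizes)
```

In a cross-validated evaluation, the model would be refit on each `train` split and scored only on the matching `test` split, so the gap between 0.91 and 0.67 reflects how much of the training-set performance does not carry over to held-out sgRNAs.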

Discussion

In this study, we performed Surveyor assays to evaluate the on-target efficiency of 218 sgRNA sequences. We found that 41% of the sgRNAs showed no cleavage effects based on our assays. To understand which sequence features influence the cleavage outcome, we performed comprehensive statistical analyses that revealed the position-dependent nucleotide preferences associated with positive cleavage results. We further revealed that the genomic context of the target DNA as well as the GC percentage and secondary structure of sgRNAs also contribute to sgRNA performance. As such, these factors should be considered when designing guide sequences.

Based on Chi-square analysis, we found that position 3 and position 20, the base adjacent to the PAM, are associated with cleavage efficiency. The significant association at position 20 is in line with previous reports20,21, further supporting the validity of our findings. Studies on the crystal structure of CRISPR/Cas9 reveal that the nucleotide at position 20 induces DNA double strand separation and is responsible for initiating R-loop formation27. Using logistic regression analysis, we further revealed that the presence of an adenine at this position has a negative impact on targeting efficiency. Similarly, a previous study observed that possessing an adenine at position 20 resulted in a nearly 50% decrease in the cut rate26. Furthermore, other positions of the PAM-proximal seed region were also found to be significant variables correlated with on-target efficiency, which supports the importance of the seed region for the proper functioning of the sgRNA/Cas9 complex.

In addition to the PAM-proximal region, we also observed significant correlations between positions in the PAM-distal region and cleavage efficiency. Unlike the proximal region, the PAM-distal region has been considered less important in determining sgRNA specificity. However, our study shows that this region may actually contribute to the on-target efficiencies of sgRNAs. At positions 2 and 3, T and G, respectively, were found to have a negative effect on cleavage efficiency. Additionally, A at position 6 was identified as a nucleotide significantly correlated with the cleavage outcome. It has been shown that the backbones of positions 2 and 4–6 of the sgRNA interact with the REC1 domain of Cas9, which is critical for sgRNA:DNA recognition17. The nucleotides at these positions might influence this recognition process and thereby affect cleavage performance.

Figure 4. P-values of nucleotides from position 1 to 21 assessed by the logistic regression analysis. The y-axis direction indicates whether a given nucleotide is favored or disfavored for cleavage activity.

| Variable | Estimate | Std. Error | Z | P value |
|---|---|---|---|---|
| Unpairing probability of guide sequence | −9.054 | 2.505 | −3.614 | 0.0003 |
| GC normal | 3.143 | 0.950 | 3.311 | 0.0009 |
| Pos_2_T | −2.419 | 0.900 | −2.688 | 0.0072 |
| Pos_3_G | −2.464 | 0.847 | −2.911 | 0.0036 |
| Pos_8_G | −2.110 | 0.804 | −2.625 | 0.0087 |
| Pos_17_G | −2.419 | 0.814 | −2.970 | 0.0030 |
| Context: promoter TSS | 3.862 | 1.256 | 3.075 | 0.0021 |
| Pos_6_A | −2.049 | 0.898 | −2.281 | 0.0225 |
| Pos_11_G | −1.861 | 0.854 | −2.178 | 0.0294 |
| Pos_14_A | 2.535 | 1.007 | 2.518 | 0.0118 |
| Pos_18_G | −1.798 | 0.816 | −2.205 | 0.0275 |
| Pos_19_A | 1.785 | 0.768 | 2.326 | 0.0200 |
| Pos_19_C | 1.827 | 0.891 | 2.051 | 0.0403 |
| Pos_20_A | −2.216 | 0.959 | −2.311 | 0.0208 |

Table 2. Variables significantly associated with on-target efficiency of sgRNA. TSS: transcription start site; Estimate: estimated effect size; Std. Error: standard error; Pos: position.
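The columns of Table 2 are related by simple arithmetic, which the following sketch illustrates for the Pos_20_A row. It assumes the Z column is the usual Wald statistic (estimate divided by standard error) with a two-sided normal P value, and that the estimates are log-odds coefficients from the logistic regression; the function names are hypothetical.

```python
import math

def wald_z(estimate, std_error):
    """Wald statistic: coefficient divided by its standard error."""
    return estimate / std_error

def two_sided_p(z):
    """Two-sided P value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

def odds_ratio(estimate):
    """Odds ratio implied by a log-odds coefficient."""
    return math.exp(estimate)

# Pos_20_A row of Table 2: Estimate = -2.216, Std. Error = 0.959.
z = wald_z(-2.216, 0.959)
print(round(z, 3), round(two_sided_p(z), 4), round(odds_ratio(-2.216), 3))
```

Under these assumptions, an adenine at position 20 corresponds to an odds ratio of about 0.11 for a positive Surveyor result relative to that position's baseline nucleotide, with the other variables held fixed.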


Previous studies have shown that the GC percentage of the overall sgRNA26 and of the 6 PAM-proximal nucleotides25 is positively correlated with efficiency in zebrafish and Drosophila, respectively. However, in our study, which is based on a mammalian cell line, we found that sgRNAs with very high or very low GC content are less effective, in contrast to a simple model in which a greater GC percentage always leads to higher activity. In our logistic regression analysis, we demonstrate that sgRNAs with a GC percentage within the range of 40%–60% are favored for efficient on-target cleavage.

Intriguingly, we found that if a target DNA sequence is located in the promoter-TSS region, the sequence has a greater chance of being successfully cleaved compared with a sequence located in an intergenic region. This result is likely related to local chromatin accessibility in different types of genomic loci. A recent genome-wide Cas9 binding analysis based on ChIP-sequencing demonstrated that chromatin inaccessibility decreases dCas9 binding and that genomic loci commonly accessible in a large number of cell types have a significantly higher probability of binding to the sgRNA/Cas9 complex28,29.

Furthermore, our analysis suggested that the secondary structure of the guide sequence is also an important parameter that should be considered when designing sgRNAs. In particular, the probability that the seed region forms an unfolded structure was identified as the most significant feature. Counterintuitively, our analysis revealed that if the seed sequence is more likely to form secondary structure, the sgRNA has a higher chance of cleaving the target sequence. Loading of the guide RNA into Cas9 has been demonstrated to be a crucial step in converting Cas9 into an active conformation capable of executing its nuclease function30. Thus, the secondary structure of the seed region might have a role in facilitating the loading process and may potentially improve the cleavage activity of Cas9.

Recently, two large-scale studies have been reported that aimed to improve sgRNA design21,24. Our current study differs from these two studies in several aspects and has unique advantages. In the first study, Doench et al. evaluated the efficiency of 1,841 sgRNAs in inducing complete loss of the protein21. These sgRNAs were designed to target six cell surface marker genes. By FACS analysis using antibodies specific to these cell surface proteins, the marker-negative cells were isolated and then sequenced to determine the highly active sgRNAs in these cells. Given that sgRNAs targeting intron or UTR regions are unlikely to affect the coding sequence, only sgRNAs targeting the coding sequences (CDS) were analyzed and used to build the predictive model. However, this design has several potential limitations. For example, an sgRNA that induces an in-frame mutation is unlikely to be labeled as highly effective even though it may have a high cleavage efficiency; additionally, if the frame-shift mutation induced by the sgRNA occurs downstream of the epitope sites, the sgRNA might show less effect in abolishing recognition by the antibody. In our study, we systematically designed sgRNAs targeting various loci with different genomic contexts across the genome and, most importantly, rather than measuring the downstream effects induced by the sgRNA, we directly measured the cleavage efficiency of sgRNAs.

In another study, 133 high-activity sgRNAs and 146 low-activity sgRNAs for Cas9Sp, together with 82 and 69 sgRNAs for Cas9St1, were determined and used to build the predictive model24. Since a support vector machine (SVM) model was adopted in that study, it is difficult to compare its parameters with those of the current study. Despite differences in methodology and study design, there was a striking similarity: the most dramatic nucleotide frequency changes were observed at position 20 in all three studies. At this position, either C or G was found with an elevated frequency. The G/C may be preferred to allow RNA/DNA hybridization and might be important for the initiation of the R-loop. Furthermore, in the second study, a strong correlation was observed between the DNase I values of the targeting sites and sgRNA efficiency, supporting the view that locus accessibility is a critical determinant of sgRNA activity.

Since DNase I data were not available for the Neuro2A cell line, we instead retrieved the DNase I hypersensitivity sites (DHSs) of whole mouse brain available from the ENCODE project31. We merged DHSs from an adult (week 8) and an embryonic (day 14.5) mouse and used this collection to represent DHSs specific to the brain. We then examined how many sgRNA targets overlap with the DHSs and found that a total of 47 sgRNAs out of 218 overlapped with the brain-specific DHSs. Among them, 32 sgRNAs were located in the promoter/TSS region. Statistical analysis revealed a significant positive correlation between being located at a promoter/TSS and being located in a DHS (P = 2.7 × 10^−11). This observation confirms that the promoter/TSS regions have a higher level of chromatin accessibility. This link was further supported by a genome-wide survey of chromatin accessibility of the human genome using 125 diverse cell and tissue types, in which it was found that promoters typically exhibit high accessibility across various cell types32.

In our study, we revealed that the secondary structure of the guide sequence of the sgRNA is associated with on-target efficiency and that the inclusion of secondary structure variables greatly improves the predictive power of the model. We showed that our logistic regression model performs reasonably well. The detailed parameters of the model are provided and may prove valuable for future studies. The full dataset is also available and can be used as a source for meta-analysis in future studies.
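The DHS analysis described above, merging sites from two developmental stages and counting sgRNA targets that overlap them, can be sketched as follows. The coordinates are hypothetical, intervals are treated as half-open `(chrom, start, end)` tuples, and the function names are assumptions rather than the tools the authors used.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]

def overlaps_any(target, intervals):
    """True if the half-open target interval overlaps any listed interval."""
    chrom, start, end = target
    return any(c == chrom and start < e and s < end for c, s, e in intervals)

# Hypothetical DHS calls from two stages, combined into one collection.
adult_dhs = [("chr1", 100, 200), ("chr2", 50, 80)]
embryo_dhs = [("chr1", 150, 300), ("chr1", 400, 500)]
brain_dhs = merge_intervals(adult_dhs + embryo_dhs)

# Hypothetical 23-nt sgRNA target sites (guide plus PAM).
targets = [("chr1", 290, 313), ("chr1", 300, 323), ("chr2", 10, 33)]
in_dhs = [t for t in targets if overlaps_any(t, brain_dhs)]
print(brain_dhs, len(in_dhs))
```

The resulting per-target overlap flags, together with the promoter/TSS annotation, form the 2 × 2 table on which an association test such as the one reported above can be run.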

Although our study offers key insights into sgRNA design, care should be taken in interpreting the results. First, we used cleavage outcome data, which are binary in nature, for our statistical analysis. Although binary responses are easy to understand and interpret, and by this criterion we can clearly separate the sgRNAs into 2 distinct groups, the efficiencies of individual guide sequences might differ within the group of sgRNAs that showed positive cleavage results. Thus, quantitative outcomes such as the cleavage percentage and the number of mutations induced by each sgRNA are needed to provide further insight into sgRNA optimization. Secondly, we used 800 ng of plasmid for each transfection, an amount commonly used for 24-well plates33. Based on an in vivo mutagenesis study of CRISPR/Cas9 in Drosophila, the protein level of Cas9 is unlikely to be a critical factor for mutagenesis efficiency, while the amount of sgRNA has a more profound impact25. Thus, the sgRNA amount may need to be optimized depending on the specific experimental conditions and cell type.

Conclusion

Here we report a systematic evaluation of the on-target performance of 218 sgRNAs based on the in vitro Surveyor assay. We found that 41% of sgRNAs in our study showed negative results for cleavage, further emphasizing the need to improve the design of sgRNAs. Through statistical analysis, we found that nucleotide preferences at positions both adjacent and distal to the PAM sequence are significantly correlated with on-target performance. Furthermore, we showed that the genomic context of the target region, the optimal GC percentage, and the secondary structure of the sgRNA are important factors contributing to cleavage efficiency. Taken together, our study reveals crucial parameters for the design of sgRNAs to achieve high on-target efficiency, particularly in the context of high throughput applications. Future studies are warranted to replicate our study and improve the state-of-the-art CRISPR/Cas9 technology.

Methods

Design and cloning of sgRNA. The sgRNAs were designed to target the flanking sites of various loci that harbor copy number variations (CNVs) associated with autism spectrum disorder (ASD). The 100 most frequently occurring ASD CNVs were retrieved from the SFARI CNV database34. We then used the Ensembl Compara API to determine the syntenic regions in the mouse genome. The sgRNAs were designed at the flanking sites of these mouse loci regardless of their genomic contexts. The DNA sequences of the selected regions were obtained from the Ensembl database (GRCm38.p3) and subsequently used as inputs for the CRISPR design tool (http://crispr.mit.edu). Candidate sgRNAs with the highest scores (generally indicating the fewest potential off-targets) were then selected and synthesized. Two complementary oligonucleotides for each sgRNA were annealed, phosphorylated, and cloned into the BbsI sites of the pX330 CRISPR/Cas9 vector (Addgene plasmid ID 42230).
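As a rough illustration of the oligonucleotide step, the sketch below builds a complementary oligo pair for one 20-nt guide. The CACC and AAAC overhangs are the ones commonly used with BbsI-digested pX330 and are an assumption here, since the text does not list the overhang sequences. The function names and the example guide are hypothetical, and always prepending a 5′ G (giving the GN20 form) is likewise an assumption about how the guides were built.

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_bbsi_oligos(guide: str, prepend_g: bool = True) -> Tuple[str, str]:
    """Build top and bottom oligos (both written 5'->3') for a 20-nt guide.

    Assumed overhangs: CACC on the top oligo, AAAC on the bottom oligo.
    """
    guide = guide.upper()
    if len(guide) != 20 or set(guide) - set("ACGT"):
        raise ValueError("guide must be exactly 20 nt of A/C/G/T")
    # Optional extra G for transcription from the U6 promoter (GN20).
    insert = "G" + guide if prepend_g else guide
    top = "CACC" + insert
    bottom = "AAAC" + reverse_complement(insert)
    return top, bottom


# Hypothetical guide sequence, for illustration only.
top_oligo, bottom_oligo = design_bbsi_oligos("GATTACAGATTACAGATTAC")
print(top_oligo)
print(bottom_oligo)
```

After annealing, the two oligos pair along the GN20 stretch and leave the two 4-nt overhangs free for ligation into the digested vector.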

Cell culture and transfection. Neuro2A (N2A) cells were cultured in Dulbecco’s modified Eagle’s Medium (DMEM) supplemented with 10% fetal bovine serum (Life Technologies), 100 units penicillin, and 100 μg streptomycin (Nacalai) and incubated at 37 °C with 5% CO2. The cells were seeded into 24-well plates (FALCON) to reach 1 × 10^5 cells per well. Plasmids (800 ng) were transfected using Lipofectamine 3000 (Lipo3000) reagent. N2A cells were harvested 48 hours post-transfection.

Surveyor assay. N2A cells transfected with empty or sgRNA-containing pX330 vectors were treated with buffer containing proteinase K, and genomic DNA was then extracted by ethanol precipitation. Genomic PCR was conducted to amplify a 400–700 bp region containing the sgRNA target. PCR products were gel purified with the Wizard SV Gel and PCR Clean-Up System (Promega). 800 ng of each purified PCR product was mixed and re-annealed to form heteroduplexes, which were subsequently treated with SURVEYOR nuclease and SURVEYOR enhancer S (Transgenomic) following the manufacturer’s recommended protocol. The final product was separated on a 3% TAE agarose gel and stained with ethidium bromide.
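When reading such a gel, it can help to know roughly which band sizes a positive result should give. The sketch below is an approximation under stated assumptions and is not part of the authors' protocol. It assumes that Cas9 cuts about 3 bp upstream of the PAM and that the nuclease cleaves the heteroduplex near that site. The function name and the coordinate convention (0-based start of the 20-nt protospacer on the amplicon's top strand) are hypothetical.

```python
from typing import Tuple


def expected_surveyor_fragments(amplicon_len: int,
                                protospacer_start: int,
                                strand: str = "+") -> Tuple[int, int]:
    """Approximate sizes (bp) of the two cleavage products.

    protospacer_start is the 0-based index, on the amplicon's top strand,
    of the leftmost base of the 20-nt protospacer.
    """
    if strand == "+":
        # PAM lies to the right; the cut falls 3 bp before the protospacer end.
        cut = protospacer_start + 17
    elif strand == "-":
        # PAM lies to the left; the cut falls 3 bp after the protospacer start.
        cut = protospacer_start + 3
    else:
        raise ValueError("strand must be '+' or '-'")
    if not 0 < cut < amplicon_len:
        raise ValueError("cut site falls outside the amplicon")
    return cut, amplicon_len - cut


# Hypothetical 600-bp amplicon with the protospacer starting at index 200.
print(expected_surveyor_fragments(600, 200))
```

Indels shift the mismatch by a few base pairs, so observed bands are only expected to be close to these sizes.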

Statistical Analysis. The R environment (version 3.1.3) was used for statistical analyses35. A two-sided P value < 0.05 was regarded as statistically significant. Categorical variables were analyzed using the Chi-square test. Independent two-sample t-tests and the Kolmogorov-Smirnov test were used for comparisons between groups. Logistic regression was used to determine factors independently correlated with cleavage efficiency. To adjust for multiple testing, we further calculated permutation P values based on 10,000 randomizations. In each cycle of the permutation test, 129 and 89 sgRNAs were randomly assigned as positive and negative sequences, respectively; a standard Chi-square test was then performed at each position, and the smallest P value among all 21 positions was recorded to construct an empirical distribution of the smallest P values. After 10,000 repeats of this procedure, the permuted P value was determined by comparing the original P value from the real data with this empirical distribution. We used the annotatePeaks.pl program from the HOMER ChIP-Seq software to annotate the genomic context of each sgRNA target36 based on the following categories: 3′ UTR, promoter-TSS, TTS (transcription termination site), 5′ UTR, intron, exon, and intergenic region. To evaluate the performance of the logistic regression model, we performed receiver operating characteristic (ROC) analysis in two settings. In the first setting, we trained the model using all samples and then examined how well the model predicted the cleavage results of the same samples. In the second setting, to guard against over-fitting, we repeated the modeling with 20-fold cross-validation (CV) and calculated the mean AUC value over the 20 iterations.
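The permutation adjustment described above can be sketched as follows. The authors worked in R; this is a standard-library Python stand-in, not their code, and all function names are hypothetical. It assumes a plain Pearson Chi-square test without continuity correction on a nucleotide-by-outcome table at each position, and it treats positions with a single observed nucleotide as untestable (P = 1).

```python
import math
import random
from collections import Counter
from typing import Sequence


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability of a Chi-square statistic (closed forms, df <= 3)."""
    if df <= 0:
        return 1.0
    half = stat / 2.0
    if df == 1:
        return math.erfc(math.sqrt(half))
    if df == 2:
        return math.exp(-half)
    if df == 3:
        return (math.erfc(math.sqrt(half))
                + math.sqrt(2.0 * stat / math.pi) * math.exp(-half))
    raise ValueError("only df <= 3 is supported (A/C/G/T against two outcomes)")


def position_p_value(bases: Sequence[str], labels: Sequence[int]) -> float:
    """Chi-square test of nucleotide identity against cleavage outcome (0/1)."""
    n = len(labels)
    n_pos = sum(labels)
    row_totals = Counter(bases)
    if n_pos == 0 or n_pos == n or len(row_totals) < 2:
        return 1.0  # degenerate table, nothing to test
    col_totals = {0: n - n_pos, 1: n_pos}
    observed = Counter(zip(bases, labels))
    stat = 0.0
    for base, row_total in row_totals.items():
        for outcome, col_total in col_totals.items():
            expected = row_total * col_total / n
            stat += (observed[(base, outcome)] - expected) ** 2 / expected
    return chi2_sf(stat, len(row_totals) - 1)


def permutation_adjusted_p(guides: Sequence[str], labels: Sequence[int],
                           observed_p: float, n_perm: int = 10_000,
                           seed: int = 0) -> float:
    """Fraction of label shuffles whose smallest per-position P is <= observed_p."""
    columns = [[g[i] for g in guides] for i in range(len(guides[0]))]
    rng = random.Random(seed)
    shuffled = list(labels)  # shuffling keeps the positive/negative counts fixed
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if min(position_p_value(col, shuffled) for col in columns) <= observed_p:
            hits += 1
    return hits / n_perm
```

Recording only the smallest P value per shuffle is what makes the adjusted value account for all positions being tested at once.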

Secondary structure analysis of sgRNAs. The minimum free energy (MFE) of each sgRNA was predicted using RNAfold with the default parameters37. RNAplfold computes local pair probabilities and has been used to model RNA co-transcriptional folding by estimating the relative stabilities of all local structures with a sliding window approach38. We therefore used RNAplfold to assess the probability that the entire seed sequence is unpaired (i.e., has no folding structure) by scanning the seed region with a sliding window and averaging the probability over all windows that contain the seed region. We set the window size W = 21, which is the length of the guide sequence with an additional G prepended for the U6 promoter (GN20), and U = 12, which is the length of the seed sequence. Finally, we also estimated the effect of the guide sequence on the tracrRNA structure using the dot plot of the base-pairing matrix predicted by RNAfold. In brief, for each nucleotide of the tracrRNA, we calculated its maximum and average base-pairing probability with the nucleotides of the guide sequence from the base-pairing matrix. We then averaged each of these probabilities over all tracrRNA nucleotides to obtain the overall probability that the tracrRNA structure interacts with the guide sequence.
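The last step, summarizing how strongly the guide is predicted to pair with the tracrRNA, can be sketched as below. This is a minimal illustration rather than the authors' script. It assumes the base-pairing matrix has already been parsed into a dictionary of pair probabilities keyed by 1-based positions (i, j) with i < j, and the function name and the position ranges in the example are hypothetical.

```python
from typing import Dict, Iterable, Tuple

PairProbs = Dict[Tuple[int, int], float]


def guide_tracr_interaction(pair_probs: PairProbs,
                            guide_positions: Iterable[int],
                            tracr_positions: Iterable[int]) -> Tuple[float, float]:
    """Return (mean of per-nucleotide maxima, mean of per-nucleotide averages).

    For every tracrRNA nucleotide, its pairing probabilities with all guide
    nucleotides are collected, reduced to a maximum and an average, and both
    summaries are then averaged over the tracrRNA nucleotides.
    """
    guide = list(guide_positions)
    tracr = list(tracr_positions)
    if not guide or not tracr:
        raise ValueError("guide and tracrRNA position lists must be non-empty")
    max_sum = 0.0
    mean_sum = 0.0
    for t in tracr:
        # Pairs absent from the matrix are treated as probability 0.
        probs = [pair_probs.get((min(g, t), max(g, t)), 0.0) for g in guide]
        max_sum += max(probs)
        mean_sum += sum(probs) / len(probs)
    return max_sum / len(tracr), mean_sum / len(tracr)


# Toy matrix: guide at positions 1-2, tracrRNA at positions 4-6.
toy = {(1, 4): 0.8, (2, 4): 0.2, (2, 5): 0.4}
print(guide_tracr_interaction(toy, range(1, 3), range(4, 7)))
```

Higher values suggest that the guide is more likely to disturb the tracrRNA fold. Values read from a dot plot file may need converting to probabilities before they are passed in.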

References

1. Doudna, J. A. & Charpentier, E. Genome editing. The new frontier of genome engineering with CRISPR-Cas9. Science 346, 1258096, doi: 10.1126/science.1258096 (2014).

2. Shalem, O., Sanjana, N. E. & Zhang, F. High-throughput functional genomics using CRISPR-Cas9. Nat Rev Genet 16, 299–311, doi: 10.1038/nrg3899 (2015).

3. Sternberg, S. H. & Doudna, J. A. Expanding the Biologist’s Toolkit with CRISPR-Cas9. Mol Cell 58, 568–574, doi: 10.1016/j.molcel.2015.02.032 (2015).

4. Konermann, S. et al. Genome-scale transcriptional activation by an engineered CRISPR-Cas9 complex. Nature 517, 583–588, doi: 10.1038/nature14136 (2015).

5. Gilbert, L. A. et al. CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell 154, 442–451, doi: 10.1016/j.cell.2013.06.044 (2013).

6. Rusk, N. CRISPRs and epigenome editing. Nat Methods 11, 28 (2014).

7. Hilton, I. B. et al. Epigenome editing by a CRISPR-Cas9-based acetyltransferase activates genes from promoters and enhancers. Nat Biotechnol 33, 510–517, doi: 10.1038/nbt.3199 (2015).

8. O’Connell, M. R. et al. Programmable RNA recognition and cleavage by CRISPR/Cas9. Nature 516, 263–266, doi: 10.1038/nature13769 (2014).

9. Makarova, K. S. et al. Evolution and classification of the CRISPR-Cas systems. Nat Rev Microbiol 9, 467–477, doi: 10.1038/nrmicro2577 (2011).

10. Hsu, P. D., Lander, E. S. & Zhang, F. Development and applications of CRISPR-Cas9 for genome engineering. Cell 157, 1262–1278, doi: 10.1016/j.cell.2014.05.010 (2014).

11. Chylinski, K., Makarova, K. S., Charpentier, E. & Koonin, E. V. Classification and evolution of type II CRISPR-Cas systems. Nucleic Acids Res 42, 6091–6105, doi: 10.1093/nar/gku241 (2014).

12. Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816–821, doi: 10.1126/science.1225829 (2012).

13. Kabadi, A. M., Ousterout, D. G., Hilton, I. B. & Gersbach, C. A. Multiplex CRISPR/Cas9-based genome engineering from a single lentiviral vector. Nucleic Acids Res 42, e147, doi: 10.1093/nar/gku749 (2014).

14. Cong, L. et al. Multiplex Genome Engineering Using CRISPR/Cas Systems. Science 339, 819–823, doi: 10.1126/science.1231143 (2013).

15. Gasiunas, G., Barrangou, R., Horvath, P. & Siksnys, V. Cas9-crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria. Proc Natl Acad Sci USA 109, E2579–2586, doi: 10.1073/pnas.1208507109 (2012).

16. Pattanayak, V. et al. High-throughput profiling of off-target DNA cleavage reveals RNA-programmed Cas9 nuclease specificity. Nat Biotechnol 31, 839–843, doi: 10.1038/nbt.2673 (2013).

17. Nishimasu, H. et al. Crystal structure of Cas9 in complex with guide RNA and target DNA. Cell 156, 935–949, doi: 10.1016/j.cell.2014.02.001 (2014).

18. Hsu, P. D. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat Biotechnol 31, 827–832, doi: 10.1038/nbt.2647 (2013).

19. Mandal, P. K. et al. Efficient ablation of genes in human hematopoietic stem and effector cells using CRISPR/Cas9. Cell Stem Cell 15, 643–652, doi: 10.1016/j.stem.2014.10.004 (2014).

20. Wang, T., Wei, J. J., Sabatini, D. M. & Lander, E. S. Genetic screens in human cells using the CRISPR-Cas9 system. Science 343, 80–84, doi: 10.1126/science.1246981 (2014).

21. Doench, J. G. et al. Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation. Nat Biotechnol 32, 1262–1267, doi: 10.1038/nbt.3026 (2014).

22. Xu, H. et al. Sequence determinants of improved CRISPR sgRNA design. Genome Res 25, 1147–1157, doi: 10.1101/gr.191452.115 (2015).

23. Moreno-Mateos, M. A. et al. CRISPRscan: designing highly efficient sgRNAs for CRISPR-Cas9 targeting in vivo. Nat Methods 12, 982–988, doi: 10.1038/nmeth.3543 (2015).

24. Chari, R., Mali, P., Moosburner, M. & Church, G. M. Unraveling CRISPR-Cas9 genome engineering parameters via a library-on-library approach. Nat Methods 12, 823–826, doi: 10.1038/nmeth.3473 (2015).

25. Ren, X. et al. Enhanced specificity and efficiency of the CRISPR/Cas9 system with optimized sgRNA parameters in Drosophila. Cell Rep 9, 1151–1162, doi: 10.1016/j.celrep.2014.09.044 (2014).

26. Gagnon, J. A. et al. Efficient mutagenesis by Cas9 protein-mediated oligonucleotide insertion and large-scale assessment of single-guide RNAs. PLoS One 9, e98186, doi: 10.1371/journal.pone.0098186 (2014).

27. Anders, C., Niewoehner, O., Duerst, A. & Jinek, M. Structural basis of PAM-dependent target DNA recognition by the Cas9 endonuclease. Nature 513, 569–573, doi: 10.1038/nature13579 (2014).

28. Singh, R., Kuscu, C., Quinlan, A., Qi, Y. & Adli, M. Cas9-chromatin binding information enables more accurate CRISPR off-target prediction. Nucleic Acids Res 43, e118, doi: 10.1093/nar/gkv575 (2015).

29. Wu, X. et al. Genome-wide binding of the CRISPR endonuclease Cas9 in mammalian cells. Nat Biotechnol 32, 670–676, doi: 10.1038/nbt.2889 (2014).

30. Jinek, M. et al. Structures of Cas9 endonucleases reveal RNA-mediated conformational activation. Science 343, 1247997, doi: 10.1126/science.1247997 (2014).

31. Mouse, E. C. et al. An encyclopedia of mouse DNA elements (Mouse ENCODE). Genome Biol 13, 418, doi: 10.1186/gb-2012-13-8-418 (2012).

32. Thurman, R. E. et al. The accessible chromatin landscape of the human genome. Nature 489, 75–82, doi: 10.1038/nature11232 (2012).

33. Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823, doi: 10.1126/science.1231143 (2013).

34. Abrahams, B. S. et al. SFARI Gene 2.0: a community-driven knowledgebase for the autism spectrum disorders (ASDs). Mol Autism 4, 36, doi: 10.1186/2040-2392-4-36 (2013).

35. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/ (2013).

36. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38, 576–589, doi: 10.1016/j.molcel.2010.05.004 (2010).

37. Lorenz, R. et al. ViennaRNA Package 2.0. Algorithms Mol Biol 6, 26, doi: 10.1186/1748-7188-6-26 (2011).

38. Li, X., Quon, G., Lipshitz, H. D. & Morris, Q. Predicting in vivo binding sites of RNA-binding proteins using mRNA secondary structure. RNA 16, 1096–1107, doi: 10.1261/rna.2017210 (2010).

Acknowledgements

We thank the staff of the Takumi laboratory for excellent technical support. We are grateful to Dr. Tomomi Aida for critical reading of the manuscript and insightful suggestions. This work was funded in part by KAKENHI, Japan Society for the Promotion of Science and Ministry of Education, Culture, Sports, Science, and Technology, Strategic International Cooperative Program (SICP) and CREST, Japan Science and Technology Agency, Intramural Research Grant for Neurological and Psychiatric Disorders of NCNP, and Takeda Pharmaceutical Co. Ltd.

Author Contributions

X.L. and T.T. conceived and designed the study. A.H. conducted the experiments. X.L., J.S., S.Y. and J.O. performed the analyses. X.L. and J.S. drafted the manuscript. All authors participated in the revision of the initial manuscript and approved the final manuscript.

Additional Information

Supplementary information accompanies this paper at http://www.nature.com/srep

Competing financial interests: The authors declare no competing financial interests.

How to cite this article: Liu, X. et al. Sequence features associated with the cleavage efficiency of CRISPR/Cas9 system. Sci. Rep. 6, 19675; doi: 10.1038/srep19675 (2016).

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

Supplementary Information

Data

January 2016

Xiaoxi Liu · Ayaka Homma · Jamasb Sayadi · Shu Yang · Toru Takumi

Optimizing sgRNA to Improve CRISPR/Cas9 Knockout Efficiency: Special Focus on Human and Animal Cell

Article

Full-text available

Nov 2021

During recent years, clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein 9 (Cas9) technologies have been noticed as a rapidly evolving tool to deliver a possibility for modifying target sequence expression and function. The CRISPR/Cas9 tool is currently being used to treat a myriad of human disorders, ranging from genetic diseases and infections to cancers. Preliminary reports have shown that CRISPR technology could result in valued consequences for the treatment of Duchenne muscular dystrophy (DMD), cystic fibrosis (CF), β-thalassemia, Huntington’s diseases (HD), etc. Nonetheless, high rates of off-target effects may hinder its application in clinics. Thereby, recent studies have focused on the finding of the novel strategies to ameliorate these off-target effects and thereby lead to a high rate of fidelity and accuracy in human, animals, prokaryotes, and also plants. Meanwhile, there is clear evidence indicating that the design of the specific sgRNA with high efficiency is of paramount importance. Correspondingly, elucidation of the principal parameters that contributed to determining the sgRNA efficiencies is a prerequisite. Herein, we will deliver an overview regarding the therapeutic application of CRISPR technology to treat human disorders. More importantly, we will discuss the potent influential parameters (e.g., sgRNA structure and feature) implicated in affecting the sgRNA efficacy in CRISPR/Cas9 technology, with special concentration on human and animal studies.

Factors affecting the cleavage efficiency of the CRISPR-Cas9 system

Article

Full-text available

Mar 2024

The CRISPR-Cas system stands out as a promising genome editing tool due to its cost-effectiveness and time efficiency compared to other methods. This system has tremendous potential for treating various diseases, including genetic disorders and cancer, and promotes therapeutic research for a wide range of genetic diseases. Additionally, the CRISPR-Cas system simplifies the generation of animal models, offering a more accessible alternative to traditional methods. The CRISPR-Cas9 system can be used to cleave target DNA strands that need to be corrected, causing double-strand breaks (DSBs). DNA with DSBs can then be recovered by the DNA repair pathway that the CRISPR-Cas9 system uses to edit target gene sequences. High cleavage efficiency of the CRISPR-Cas9 system is thus imperative for effective gene editing. Herein, we explore several factors affecting the cleavage efficiency of the CRISPR-Cas9 system. These factors include the GC content of the protospacer-adjacent motif (PAM) proximal and distal regions, single-guide RNA (sgRNA) properties, and chromatin state. These considerations contribute to the efficiency of genome editing.

Application of multiple sgRNAs boosts efficiency of CRISPR/Cas9-mediated gene targeting in Arabidopsis

Article

Full-text available

Jan 2024
BMC BIOL

Background Precise gene targeting (GT) is a powerful tool for heritable precision genome engineering, enabling knock-in or replacement of the endogenous sequence via homologous recombination. We recently established a CRISPR/Cas9-mediated approach for heritable GT in Arabidopsis thaliana (Arabidopsis) and rice and reported that the double-strand breaks (DSBs) frequency of Cas9 influences the GT efficiency. However, the relationship between DSBs and GT at the same locus was not examined. Furthermore, it has never been investigated whether an increase in the number of copies of sgRNAs or the use of multiple sgRNAs would improve the efficiency of GT. Results Here, we achieved precise GT at endogenous loci Embryo Defective 2410 (EMB2410) and Repressor of Silencing 1 (ROS1) using the sequential transformation strategy and the combination of sgRNAs. We show that increasing of sgRNAs copy number elevates both DSBs and GT efficiency. On the other hand, application of multiple sgRNAs does not always enhance GT efficiency. Our results also suggested that some inefficient sgRNAs would play a role as a helper to facilitate other sgRNAs DSBs activity. Conclusions The results of this study clearly show that DSB efficiency, rather than mutation pattern, is one of the most important key factors determining GT efficiency. This study provides new insights into the relationship between sgRNAs, DSBs, and GTs and the molecular mechanisms of CRISPR/Cas9-mediated GTs in plants.

Non-replicative phage particles delivering CRISPR-Cas9 to target major blaCTX-M variants

Article

Full-text available

May 2024
PLOS ONE

Cluster regularly interspaced short palindromic repeats and CRISPR associated protein 9 (CRISPR-Cas9) is a promising tool for antimicrobial re-sensitization by inactivating antimicrobial resistance (AMR) genes of bacteria. Here, we programmed CRISPR-Cas9 with common spacers to target predominant blaCTX-M variants in group 1 and group 9 and their promoter in an Escherichia coli model. The CRISPR-Cas9 was delivered by non-replicative phagemid particles from a two-step process, including insertion of spacer in CRISPR and construction of phagemid vector. Spacers targeting blaCTX-M promoters and internal sequences of blaCTX-M group 1 (blaCTX-M-15 and -55) and group 9 (blaCTX-M-14, -27, -65, and -90) were cloned into pCRISPR and phagemid pRC319 for spacer evaluation and phagemid particle production. Re-sensitization and plasmid clearance were mediated by the spacers targeting internal sequences of each group, resulting in 3 log10 to 4 log10 reduction of the ratio of resistant cells, but not by those targeting the promoters. The CRISPR-Cas9 delivered by modified ΦRC319 particles were capable of re-sensitizing E. coli K-12 carrying either blaCTX-M group 1 or group 9 in a dose-dependent manner from 0.1 to 100 multiplicity of infection (MOI). In conclusion, CRISPR-Cas9 system programmed with well-designed spacers targeting multiple variants of AMR gene along with a phage-based delivery system could eliminate the widespread blaCTX-M genes for efficacy restoration of available third-generation cephalosporins by reversal of resistance in bacteria.

Tools and computational resources for the design of CRISPR/Cas9 sgRNA for NPR3 gene knockout in sour orange (Citrus aurantium L.)

Article

Full-text available

Mar 2024

Citrus fruits are the most nutritious foods widely used in flavoring, beverages, and medicines due to their outstanding curative effects. Sour orange (Citrus aurantium L.) is the predominant rootstock in most citrus growing areas due to its good agronomic attributes such as high quality, yield and tolerance to various pathogens. However, the citrus tristeza virus (CTV) is the leading epidemic agent of sour and sweet orange. This study aimed to design in silico guide RNA (sgRNA) for CRISPR/Cas9-mediated inactivation of the Nonexpression of Pathogenesis-Related genes 3 (NPR3) in sour orange (CaNPR3). The protein sequence of the CaNPR3 gene is 584 amino acid residues long. The amino acid sequence of the CaNPR3 gene was compared with the homologous sequences of other nearby vegetative species, showing a close similarity with Citrus sinensis and Citrus Clementina with 100% and 97.27%, respectively. CRISPR RGEN Tools provided 61 results for exon two of the CaNPR3 gene, filtering to 19 sequences and selecting four sgRNA sequences for genetic editing, which were: sgRNA 1 (5'-CATCAGGAAAAGACTTGAGT-3'), sgRNA 2 (5'-AGAACCTCAGACAACACACCTT-3'), sgRNA 3 (5'-CATCAGATTTGACCCTGGAT-3') and sgR-NA 4 (5'- TTCTGGAGGGAGGGAGAGAAATGAGGAGG -3'). The predicted secondary structures of the four selected sgRNAs present efficient structures for gene editing of the target gene, allowing it to recognize, interact with Cas9 protein and edit the target region. Keywords: Gene editing, guide RNA, CaNPR3, in silico.

Optical Genome Mapping Reveals Genomic Alterations upon Gene Editing in hiPSCs: Implications for Neural Tissue Differentiation and Brain Organoid Research

Article

Full-text available

Mar 2024

Genome editing, notably CRISPR (cluster regularly interspaced short palindromic repeats)/Cas9 (CRISPR-associated protein 9), has revolutionized genetic engineering allowing for precise targeted modifications. This technique’s combination with human induced pluripotent stem cells (hiPSCs) is a particularly valuable tool in cerebral organoid (CO) research. In this study, CRISPR/Cas9-generated fluorescently labeled hiPSCs exhibited no significant morphological or growth rate differences compared with unedited controls. However, genomic aberrations during gene editing necessitate efficient genome integrity assessment methods. Optical genome mapping, a high-resolution genome-wide technique, revealed genomic alterations, including chromosomal copy number gain and losses affecting numerous genes. Despite these genomic alterations, hiPSCs retain their pluripotency and capacity to generate COs without major phenotypic changes but one edited cell line showed potential neuroectodermal differentiation impairment. Thus, this study highlights optical genome mapping in assessing genome integrity in CRISPR/Cas9-edited hiPSCs emphasizing the need for comprehensive integration of genomic and morphological analysis to ensure the robustness of hiPSC-based models in cerebral organoid research.

Validation of endogenous U6 promoters for expanding the CRISPR toolbox in Nicotiana tabacum

Article

Full-text available

Mar 2024

Optimization of CRISPR/Cas9 system in Eustoma grandiflorum

Article

Full-text available

Feb 2024

The optimization of the CRISPR-Cas9 system for enhancing editing efficiency holds significant value in scientific research. In this study, we optimized single guide RNA and Cas9 promoters of the CRISPR-Cas9 vector and established an efficient protoplast isolation and transient transformation system in Eustoma grandiflorum, and we successfully applied the modified CRISPR-Cas9 system to detect editing efficiency of the EgPDS gene. The activity of the EgU6-2 promoter in E. grandiflorum protoplasts was approximately three times higher than that of the GmU6 promoter. This promoter, along with the EgUBQ10 promoter, was applied in the CRISPR-Cas9 cassette, the modified CRISPR-Cas9 vectors that pEgU6-2::sgRNA-2/pEgUBQ10::Cas9-2 editing efficiency was 37.7%, which was 30.3% higher than that of the control, and the types of mutation are base substitutions, small fragment deletions and insertions. Finally we obtained an efficient gene editing vector for E. grandiflorum. This project provides an important technical platform for the study of gene function in E. grandiflorum.

DNA shape features improve prediction of CRISPR/Cas9 activity

Article

Apr 2024
METHODS

Editing of banana, apple, and grapevine genomes using the CRISPR-Cas9 system

Chapter

Jan 2024

CRISPRscan: designing highly efficient sgRNAs for CRISPR-Cas9 targeting in vivo

Article

Full-text available

Aug 2015
Br J Pharmacol

CRISPR-Cas9 technology provides a powerful system for genome engineering. However, variable activity across different single guide RNAs (sgRNAs) remains a significant limitation. We analyzed the molecular features that influence sgRNA stability, activity and loading into Cas9 in vivo. We observed that guanine enrichment and adenine depletion increased sgRNA stability and activity, whereas differential sgRNA loading, nucleosome positioning and Cas9 off-target binding were not major determinants. We also identified sgRNAs truncated by one or two nucleotides and containing 5' mismatches as efficient alternatives to canonical sgRNAs. On the basis of these results, we created a predictive sgRNA-scoring algorithm, CRISPRscan, that effectively captures the sequence features affecting the activity of CRISPR-Cas9 in vivo. Finally, we show that targeting Cas9 to the germ line using a Cas9-nanos 3' UTR led to the generation of maternal-zygotic mutants, as well as increased viability and decreased somatic mutations. These results identify determinants that influence Cas9 activity and provide a framework for the design of highly efficient sgRNAs for genome targeting in vivo.

Unraveling CRISPR-Cas9 genome engineering parameters via a library-on-library approach

Article

Full-text available

Jul 2015
Br J Pharmacol

We developed an in vivo library-on-library methodology to simultaneously assess single guide RNA (sgRNA) activity across ∼1,400 genomic loci. Assaying across multiple human cell types and end-processing enzymes as well as two Cas9 orthologs, we unraveled underlying nucleotide sequence and epigenetic parameters. Our results and software (http://crispr.med.harvard.edu/sgRNAScorer) enable improved design of reagents, shed light on mechanisms of genome targeting, and provide a generalizable framework to study nucleic acid-nucleic acid interactions and biochemistry in high throughput.

Sequence determinants of improved CRISPR sgRNA design

Article

Full-text available

Jun 2015

The CRISPR/CAS9 system has revolutionized mammalian somatic cell genetics. Genome-wide functional screens employing CRISPR/Cas9-mediated knockout or dCas9 fusion-mediated inhibition/activation (CRISPRi/a) are powerful techniques for discovering phenotype-associated gene function. We systematically assessed the DNA sequence features that contribute to single guide RNA (sgRNA) efficiency in CRISPR-based screens. Leveraging the information from multiple designs, we derived a new sequence model for predicting sgRNA efficiency in CRISPR/Cas9 knockout experiments. Our model confirmed known features, and suggested new features including a preference for cytosine at the cleavage site. The model was experimentally validated for sgRNA-mediated mutation rate and protein knockout efficiency. Tested on independent datasets, the model achieved significant results in both positive and negative selection conditions, and outperformed existing models. We also found that the sequence preference for CRISPRi/a is substantially different from that for CRISPR/Cas9 knockout and propose a new model for predicting sgRNA efficiency in CRISPRi/a experiments. These results facilitate the genome-wide design of improved sgRNA for both knockout and CRISPRi/a studies. Published by Cold Spring Harbor Laboratory Press.

Cas9-chromatin binding information enables more accurate CRISPR off-target prediction

Article

Full-text available

Jun 2015
NUCLEIC ACIDS RES

The CRISPR system has become a powerful biological tool with a wide range of applications. However, improving targeting specificity and accurately predicting potential off-targets remains a significant goal. Here, we introduce a web-based CR: ISPR/Cas9 O: ff-target P: rediction and I: dentification T: ool (CROP-IT) that performs improved off-target binding and cleavage site predictions. Unlike existing prediction programs that solely use DNA sequence information; CROP-IT integrates whole genome level biological information from existing Cas9 binding and cleavage data sets. Utilizing whole-genome chromatin state information from 125 human cell types further enhances its computational prediction power. Comparative analyses on experimentally validated datasets show that CROP-IT outperforms existing computational algorithms in predicting both Cas9 binding as well as cleavage sites. With a user-friendly web-interface, CROP-IT outputs scored and ranked list of potential off-targets that enables improved guide RNA design and more accurate prediction of Cas9 binding or cleavage sites. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

Epigenome editing by a CRISPR-Cas9-based acetyltransferase activates genes from promoters and enhancers

Article

Full-text available

Apr 2015
NAT BIOTECHNOL

Technologies that enable targeted manipulation of epigenetic marks could be used to precisely control cell phenotype or interrogate the relationship between the epigenome and transcriptional control. Here we describe a programmable, CRISPR-Cas9-based acetyltransferase consisting of the nuclease-null dCas9 protein fused to the catalytic core of the human acetyltransferase p300. The fusion protein catalyzes acetylation of histone H3 lysine 27 at its target sites, leading to robust transcriptional activation of target genes from promoters and both proximal and distal enhancers. Gene activation by the targeted acetyltransferase was highly specific across the genome. In contrast to previous dCas9-based activators, the acetyltransferase activates genes from enhancer regions and with an individual guide RNA. We also show that the core p300 domain can be fused to other programmable DNA-binding proteins. These results support targeted acetylation as a causal mechanism of transactivation and provide a robust tool for manipulating gene regulation.

R: A Language and Environment for Statistical Computing

Book

Jan 2015

Core R Team

R: A Language and Environment for Statistical Computing

Book

Jan 2015

Core R Team

Evolution and Classification of CRISPR-Cas Systems and Cas Protein Families

Chapter

Jan 2013

The CRISPR-Cas modules are adaptive antivirus immunity systems that are present in most archaea and many bacteria. These systems function by incorporating fragments of alien genomes into specific genomic loci, transcribing the inserts and using the transcripts as guide RNAs to destroy the genome of the cognate virus or plasmid. This RNA interference-like immune response is mediated by numerous, highly diverse Cas (CRISPR-associated) proteins, several of which form the Cascade complex involved in the processing of CRISPR loci transcripts and cleavage of the target DNA. Comparative analysis of the CRISPR-Cas modules led to the classification of the CRISPR-Cas systems into three types (I, II and III) that are characterized by distinct sets of cas genes. Classification of Cas proteins into families and superfamilies is a non-trivial task because of the fast evolution of many cas genes. Exhaustive sequence comparison aided by analysis of the available crystal structures led to the delineation of approximately 30 protein families that can be further classified into several superfamilies. By far the most common domain in Cas proteins is the RNA Recognition Motif (RRM). The RRM domains show remarkable diversity within the CRISPR-Cas systems and in particular comprise the scaffold of the Cascade complex. In addition to the numerous RRM domains, including a distinct polymerase-cyclase domain, the Cas proteins contain a distinct Superfamily II helicase domain, and several diverse nuclease domains. Detailed comparative analysis of the sequences and structures of Cas proteins structures shed light on the deep relationships between Type I and Type III systems and allowed us to propose a simple evolutionary scenario for the origin of CRISPR-Cas system. Moreover, combination of experimental structural studies and comparative analysis provides for detailed models of the structures of the Cascade complexes from different CRISPR-Cas types revealing remarkable architectural uniformity.

Expanding the biologist's toolkit with CRISPR-Cas9

Article

May 2015

Few discoveries transform a discipline overnight, but biologists today can manipulate cells in ways never possible before, thanks to a peculiar form of prokaryotic adaptive immunity mediated by clustered regularly interspaced short palindromic repeats (CRISPR). From elegant studies that deciphered how these immune systems function in bacteria, researchers quickly uncovered the technological potential of Cas9, an RNA-guided DNA cleaving enzyme, for genome engineering. Here we highlight the recent explosion in visionary applications of CRISPR-Cas9 that promises to usher in a new era of biological understanding and control.

High-throughput functional genomics using CRISPR-Cas9

Article

Apr 2015
NAT REV GENET

Forward genetic screens are powerful tools for the discovery and functional annotation of genetic elements. Recently, the RNA-guided CRISPR (clustered regularly interspaced short palindromic repeat)-associated Cas9 nuclease has been combined with genome-scale guide RNA libraries for unbiased, phenotypic screening. In this Review, we describe recent advances using Cas9 for genome-scale screens, including knockout approaches that inactivate genomic loci and strategies that modulate transcriptional activity. We discuss practical aspects of screen design, provide comparisons with RNA interference (RNAi) screening, and outline future applications and challenges.
