Kunin, V., Sorek, R. & Hugenholtz, P. Evolutionary conservation of sequence and secondary structures in CRISPR repeats. Genome Biol. 8, R61 (2007).

DOE Joint Genome Institute, Walnut Creek, CA 94598, USA.
Genome Biology (Impact Factor: 10.81). 02/2007; 8(4):R61. DOI: 10.1186/gb-2007-8-4-r61
Source: PubMed


Clustered regularly interspaced short palindromic repeats (CRISPRs) are a novel class of direct repeats, separated by unique spacer sequences of similar length, that are present in approximately 40% of bacterial and most archaeal genomes analyzed to date. More than 40 gene families, called CRISPR-associated sequences (CASs), appear in conjunction with these repeats and are thought to be involved in the propagation and functioning of CRISPRs. It has been recently shown that CRISPR provides acquired resistance against viruses in prokaryotes.
Here we analyze CRISPR repeats identified in 195 microbial genomes and show that they can be organized into multiple clusters based on sequence similarity. Some of the clusters present stable, highly conserved RNA secondary structures, while others lack detectable structures. Stable secondary structures exhibit multiple compensatory base changes in the stem region, indicating evolutionary and functional conservation.
We show that the repeat-based classification corresponds to, and expands upon, a previously reported CAS gene-based classification, including specific relationships between CRISPR and CAS subtypes.
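
The grouping of repeats into clusters by sequence similarity can be illustrated with a minimal sketch. This is not the procedure used in the paper: the similarity measure (the `difflib` ratio), the single-linkage rule, the 0.8 threshold, and the function names `similarity` and `cluster_repeats` are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; a stand-in for an alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_repeats(repeats, threshold=0.8):
    """Single-linkage clustering: two repeats share a cluster if a chain of
    pairwise similarities >= threshold connects them (threshold is assumed)."""
    parent = list(range(len(repeats)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(repeats)):
        for j in range(i + 1, len(repeats)):
            if similarity(repeats[i], repeats[j]) >= threshold:
                parent[find(j)] = find(i)

    clusters = {}
    for i, seq in enumerate(repeats):
        clusters.setdefault(find(i), []).append(seq)
    return list(clusters.values())
```

Called on a list of repeat strings, `cluster_repeats` returns a list of lists, one per cluster. A real analysis would likely replace `similarity` with a proper pairwise alignment score and would also compare reverse complements, since repeat orientation is not always known.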

Available from: Victor Kunin, Oct 09, 2015
  • Source
    • "The spacer sequences generally originate from phage or plasmid DNA (Gasiunas et al., 2014) and they represent a 'memory of past genetic aggressions' (Stern et al., 2010). The repeat sequences within a CRISPR locus are conserved, but in different CRISPR loci can vary in both sequence and length although there are partially conserved sequences such as a GTTTg/c motif at the 5′ end and a GAAAC motif at the 3′ end (Bhaya et al., 2011; Kunin et al., 2007). In addition, the number of repeat–spacer units in a CRISPR locus varies widely among organisms (Wiedenheft et al., 2012). "
    ABSTRACT: Approximately all sequenced archaeal and half of eubacterial genomes have some sort of adaptive immune system, which enables them to target and cleave invading foreign genetic elements by an RNAi-like pathway. CRISPR–Cas (clustered regularly interspaced short palindromic repeats–CRISPR-associated proteins) systems consist of the CRISPR loci with multiple copies of a short repeat sequence separated by variable sequences with similar size that are derived from invaders and cas genes encode proteins involved in RNA binding, endo- and exo-nucleases, helicases, and polymerases activities. There are three main types (I, II and III) of CRISPR/Cas systems. All systems function in three distinct stages: (1) adaptation, (2) crRNA biogenesis, and (3) interference. This review focuses on the features and mechanisms of the CRISPR-Cas systems and current finding about them.
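
The terminal motifs mentioned in the excerpt above (GTTTg/c at the 5′ end, GAAAC at the 3′ end) can be checked with a short sketch. It assumes that "g/c" means either G or C at that position and that the repeat is given 5′ to 3′ on the strand where the motifs are expected; the function name `terminal_motifs` is hypothetical.

```python
import re

# Assumption: "GTTTg/c" is read as GTTT followed by G or C.
FIVE_PRIME = re.compile(r"^GTTT[GC]")
THREE_PRIME = re.compile(r"GAAAC$")


def terminal_motifs(repeat: str) -> dict:
    """Report whether a repeat starts with GTTT[G/C] and ends with GAAAC."""
    seq = repeat.upper().replace("U", "T")  # accept RNA or lowercase input
    return {
        "five_prime": bool(FIVE_PRIME.search(seq)),
        "three_prime": bool(THREE_PRIME.search(seq)),
    }
```

Because the motifs are only partially conserved, a repeat that fails one or both checks is not necessarily misannotated; it may belong to a family without these motifs or be reported on the opposite strand.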
  • Source
    • "This later domain forms most of the recognition interactions with the RNA (Fig. 1A). The RNA adopts a stem-loop structure [20] with five base pairs in A-form helical stem capped by GUAUA pentaloop containing a sheared G11-A15 base pair and a bulged nucleotide U14. The pentaloop structurally belongs to family of GNRA tetraloops with insertion, i.e., GNR(N)A family (see Fig. 1B) [21]. "
    ABSTRACT: Many prokaryotic genomes comprise Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) offering defense against foreign nucleic acids. These immune systems are conditioned by the production of small CRISPR-derived RNAs matured from long RNA precursors. This often requires a Csy4 endoribonuclease cleaving the RNA 3'-end. We report extended explicit solvent molecular dynamic (MD) simulations of Csy4/RNA complex in precursor and product states, based on X-ray structures of product and inactivated precursor (55 simulations; ~3.7 μs in total). The simulations identify double-protonated His29 and deprotonated terminal phosphate as the likely dominant protonation states consistent with the product structure. We revealed potential substates consistent with Ser148 and His29 acting as the general base and acid, respectively. The Ser148 could be straightforwardly deprotonated through solvent and could without further structural rearrangements deprotonate the nucleophile, contrasting similar studies investigating the general base role of nucleobases in ribozymes. We could not locate geometries consistent with His29 acting as general base. However, we caution that the X-ray structures do not always capture the catalytically active geometries and then the reactive structures may be unreachable by the simulation technique. We identified potential catalytic arrangement of the Csy4/RNA complex but we also report limitations of the simulation technique. Even for the dominant protonation state we could not achieve full agreement between the simulations and the structural data. Potential catalytic arrangement of the Csy4/RNA complex is found. Further, we provide unique insights into limitations of simulations of protein/RNA complexes, namely, the influence of the starting experimental structures and force field limitations. This article is part of a Special Issue entitled Recent developments of molecular dynamics. Copyright © 2013 Elsevier B.V. All rights reserved.
    Biochimica et Biophysica Acta (BBA) - General Subjects 10/2014; 1850(5). DOI:10.1016/j.bbagen.2014.10.021 · 4.38 Impact Factor
  • Source
    • "One of the main signals that we used to define the number and size of the blocks is the mutation rate, defined as the fraction of mutations per nucleotide in each block. In Supplementary Figure S3, we report the mutation rate for the CRISPR locus partitioning with k = 8 and P = 2 on a dataset of 897 CRISPR arrays (Kunin et al., 2007; Shah and Garrett, 2011): each repeat is split into five adjacent regions, with terminal blocks spanning exactly 4 nucleotides and a central block spanning 12 nucleotides on average. In these settings, we observed a highly significant 4-fold and a 16-fold increase in the mutation rate in the initial 8 nucleotides and in the terminal block, respectively, as compared to the middle block. "
    ABSTRACT: Motivation: The discovery of CRISPR-Cas systems almost 20 years ago rapidly changed our perception of the bacterial and archaeal immune systems. CRISPR loci consist of several repetitive DNA sequences called repeats, inter-spaced by stretches of variable length sequences called spacers. This CRISPR array is transcribed and processed into multiple mature RNA species (crRNAs). A single crRNA is integrated into an interference complex, together with CRISPR-associated (Cas) proteins, to bind and degrade invading nucleic acids. Although existing bioinformatics tools can recognize CRISPR loci by their characteristic repeat-spacer architecture, they generally output CRISPR arrays of ambiguous orientation and thus do not determine the strand from which crRNAs are processed. Knowledge of the correct orientation is crucial for many tasks, including the classification of CRISPR conservation, the detection of leader regions, the identification of target sites (protospacers) on invading genetic elements and the characterization of protospacer-adjacent motifs. Results: We present a fast and accurate tool to determine the crRNA-encoding strand at CRISPR loci by predicting the correct orientation of repeats based on an advanced machine learning approach. Both the repeat sequence and mutation information were encoded and processed by an efficient graph kernel to learn higher-order correlations. The model was trained and tested on curated data comprising >4500 CRISPRs and yielded a remarkable performance of 0.95 AUC ROC (area under the curve of the receiver operator characteristic). In addition, we show that accurate orientation information greatly improved detection of conserved repeat sequence families and structure motifs. We integrated CRISPRstrand predictions into our CRISPRmap web server of CRISPR conservation and updated the latter to version 2.0. 
Availability: CRISPRmap and CRISPRstrand are available at Supplementary information: Supplementary data are available at Bioinformatics online.
    Bioinformatics 09/2014; 30(17):i489-i496. DOI:10.1093/bioinformatics/btu459 · 4.98 Impact Factor
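
The mutation rate defined in the excerpt above, the fraction of mutations per nucleotide in each block, can be sketched as follows. The sketch assumes that every repeat copy has the same length as a consensus repeat and that block boundaries are supplied as half-open `(start, end)` index pairs; it does not reproduce the k = 8, P = 2 partitioning of the cited work, and the function name `block_mutation_rates` is hypothetical.

```python
def block_mutation_rates(consensus, copies, blocks):
    """Return, for each (start, end) block, mismatches / (copies * block length)."""
    if not copies:
        raise ValueError("at least one repeat copy is required")
    for copy in copies:
        if len(copy) != len(consensus):
            raise ValueError("all copies must match the consensus length")

    rates = []
    for start, end in blocks:
        # Count positions in this block where a copy differs from the consensus.
        mismatches = sum(
            1
            for copy in copies
            for pos in range(start, end)
            if copy[pos] != consensus[pos]
        )
        rates.append(mismatches / (len(copies) * (end - start)))
    return rates
```

Comparing the returned rates between terminal and central blocks gives the kind of fold-change reported in the excerpt, although real repeat copies with insertions or deletions would first need to be aligned to the consensus.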