Evolutionary conservation of sequence and secondary structures in CRISPR repeats.

DOE Joint Genome Institute, Walnut Creek, CA 94598, USA.
Genome Biology (Impact Factor: 10.47). 02/2007; 8(4):R61. DOI: 10.1186/gb-2007-8-4-r61
Source: PubMed

ABSTRACT Clustered regularly interspaced short palindromic repeats (CRISPRs) are a novel class of direct repeats, separated by unique spacer sequences of similar length, that are present in approximately 40% of bacterial and most archaeal genomes analyzed to date. More than 40 gene families, called CRISPR-associated sequences (CASs), appear in conjunction with these repeats and are thought to be involved in the propagation and functioning of CRISPRs. It has been recently shown that CRISPR provides acquired resistance against viruses in prokaryotes.
Here we analyze CRISPR repeats identified in 195 microbial genomes and show that they can be organized into multiple clusters based on sequence similarity. Some of the clusters present stable, highly conserved RNA secondary structures, while others lack detectable structures. Stable secondary structures exhibit multiple compensatory base changes in the stem region, indicating evolutionary and functional conservation.
We show that the repeat-based classification corresponds to, and expands upon, a previously reported CAS gene-based classification, including specific relationships between CRISPR and CAS subtypes.
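The idea of a compensatory base change can be made concrete with a small sketch. The Python below is only an illustration, not the authors' pipeline: the function names, the toy sequences, and the choice to accept G-U wobble pairs are all assumptions. Given two aligned repeat sequences (written as RNA) and a shared stem-loop in dot-bracket notation, it reports the paired positions where both bases differ between the two repeats and both repeats can still form a base pair.

```python
# Hypothetical illustration; names and toy data are assumptions, not from the paper.

# Pairs treated as valid: Watson-Crick plus G-U wobble (an assumption of this sketch).
VALID_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def paired_positions(dot_bracket):
    """Return sorted (i, j) index pairs for a balanced dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return sorted(pairs)


def compensatory_changes(seq_a, seq_b, dot_bracket):
    """List stem pairs where both bases changed but pairing is kept in both repeats."""
    if not (len(seq_a) == len(seq_b) == len(dot_bracket)):
        raise ValueError("sequences and structure must have equal length")
    changes = []
    for i, j in paired_positions(dot_bracket):
        pair_a = (seq_a[i], seq_a[j])
        pair_b = (seq_b[i], seq_b[j])
        both_pair = pair_a in VALID_PAIRS and pair_b in VALID_PAIRS
        both_changed = pair_a[0] != pair_b[0] and pair_a[1] != pair_b[1]
        if both_pair and both_changed:
            changes.append((i, j))
    return changes


# Toy example: the C-G pair at positions (1, 8) is replaced by A-U.
structure = "(((....)))"
repeat_a = "GCGAAAACGC"
repeat_b = "GAGAAAACUC"
print(compensatory_changes(repeat_a, repeat_b, structure))
```

A stem in which several such pairs appear across related repeats keeps its shape while its sequence drifts, which is the kind of pattern the abstract reads as evidence of structural conservation.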

Available from: Victor Kunin, Jul 03, 2015
  • Source
    ABSTRACT: Nearly all sequenced archaeal genomes and about half of eubacterial genomes carry an adaptive immune system that enables them to target and cleave invading foreign genetic elements through an RNAi-like pathway. CRISPR–Cas (clustered regularly interspaced short palindromic repeats–CRISPR-associated proteins) systems consist of CRISPR loci, which contain multiple copies of a short repeat sequence separated by variable sequences of similar size derived from invaders, and cas genes, which encode proteins with RNA-binding, endo- and exonuclease, helicase, and polymerase activities. There are three main types (I, II and III) of CRISPR–Cas systems. All systems function in three distinct stages: (1) adaptation, (2) crRNA biogenesis, and (3) interference. This review focuses on the features and mechanisms of CRISPR–Cas systems and current findings about them.
  • Source
    ABSTRACT: Many prokaryotic genomes comprise Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) offering defense against foreign nucleic acids. These immune systems are conditioned by the production of small CRISPR-derived RNAs matured from long RNA precursors. This often requires a Csy4 endoribonuclease cleaving the RNA 3'-end. We report extended explicit solvent molecular dynamics (MD) simulations of the Csy4/RNA complex in precursor and product states, based on X-ray structures of product and inactivated precursor (55 simulations; ~3.7 μs in total). The simulations identify double-protonated His29 and deprotonated terminal phosphate as the likely dominant protonation states consistent with the product structure. We revealed potential substates consistent with Ser148 and His29 acting as the general base and acid, respectively. The Ser148 could be straightforwardly deprotonated through solvent and could, without further structural rearrangements, deprotonate the nucleophile, in contrast to similar studies investigating the general base role of nucleobases in ribozymes. We could not locate geometries consistent with His29 acting as general base. However, we caution that X-ray structures do not always capture the catalytically active geometries, in which case the reactive structures may be unreachable by the simulation technique. We identified a potential catalytic arrangement of the Csy4/RNA complex but we also report limitations of the simulation technique. Even for the dominant protonation state we could not achieve full agreement between the simulations and the structural data. A potential catalytic arrangement of the Csy4/RNA complex is found. Further, we provide unique insights into limitations of simulations of protein/RNA complexes, namely, the influence of the starting experimental structures and force field limitations. This article is part of a Special Issue entitled Recent developments of molecular dynamics. Copyright © 2013 Elsevier B.V. All rights reserved.
    Biochimica et Biophysica Acta (BBA) - General Subjects 10/2014; 1850(5). DOI:10.1016/j.bbagen.2014.10.021 · 3.83 Impact Factor
  • Source
    ABSTRACT: Motivation: The discovery of CRISPR-Cas systems almost 20 years ago rapidly changed our perception of the bacterial and archaeal immune systems. CRISPR loci consist of several repetitive DNA sequences called repeats, inter-spaced by stretches of variable length sequences called spacers. This CRISPR array is transcribed and processed into multiple mature RNA species (crRNAs). A single crRNA is integrated into an interference complex, together with CRISPR-associated (Cas) proteins, to bind and degrade invading nucleic acids. Although existing bioinformatics tools can recognize CRISPR loci by their characteristic repeat-spacer architecture, they generally output CRISPR arrays of ambiguous orientation and thus do not determine the strand from which crRNAs are processed. Knowledge of the correct orientation is crucial for many tasks, including the classification of CRISPR conservation, the detection of leader regions, the identification of target sites (protospacers) on invading genetic elements and the characterization of protospacer-adjacent motifs. Results: We present a fast and accurate tool to determine the crRNA-encoding strand at CRISPR loci by predicting the correct orientation of repeats based on an advanced machine learning approach. Both the repeat sequence and mutation information were encoded and processed by an efficient graph kernel to learn higher-order correlations. The model was trained and tested on curated data comprising >4500 CRISPRs and yielded a remarkable performance of 0.95 AUC ROC (area under the curve of the receiver operator characteristic). In addition, we show that accurate orientation information greatly improved detection of conserved repeat sequence families and structure motifs. We integrated CRISPRstrand predictions into our CRISPRmap web server of CRISPR conservation and updated the latter to version 2.0. Availability: CRISPRmap and CRISPRstrand are available at backofen@informatik.uni-freiburg.de. Supplementary information: Supplementary data are available at Bioinformatics online.
    Bioinformatics 09/2014; 30(17):i489-i496. DOI:10.1093/bioinformatics/btu459 · 4.62 Impact Factor