A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes. PLoS Comput Biol 1:e60

The Institute for Genomic Research, Rockville, Maryland, USA.
PLoS Computational Biology (Impact Factor: 4.62). 12/2005; 1(6):e60. DOI: 10.1371/journal.pcbi.0010060

Clustered regularly interspaced short palindromic repeats (CRISPRs) are a family of DNA direct repeats found in many prokaryotic genomes. Repeats of 21-37 bp typically show weak dyad symmetry and are separated by regularly sized, nonrepetitive spacer sequences. Four CRISPR-associated (Cas) protein families, designated Cas1 to Cas4, are strictly associated with CRISPR elements and always occur near a repeat cluster. Some spacers originate from mobile genetic elements and are thought to confer "immunity" against the elements that harbor these sequences. In the present study, we have systematically investigated uncharacterized proteins encoded in the vicinity of these CRISPRs and found many additional protein families that are strictly associated with CRISPR loci across multiple prokaryotic species. Multiple sequence alignments and hidden Markov models have been built for 45 Cas protein families. These models identify family members with high sensitivity and selectivity and classify key regulators of development, DevR and DevS, in Myxococcus xanthus as Cas proteins. These identifications show that CRISPR/cas gene regions can be quite large, with up to 20 different, tandem-arranged cas genes next to a repeat cluster or filling the region between two repeat clusters. Distinctive subsets of the collection of Cas proteins recur in phylogenetically distant species and correlate with characteristic repeat periodicity. The analyses presented here support initial proposals of mobility of these units, along with the likelihood that loci of different subtypes interact with one another as well as with host cell defensive, replicative, and regulatory systems. It is evident from this analysis that CRISPR/cas loci are larger, more complex, and more heterogeneous than previously appreciated.
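
As a rough illustration of the repeat-spacer architecture described above (direct repeats with weak dyad symmetry, separated by regularly sized spacers), the following Python sketch measures spacer lengths for a known repeat and scores how closely the repeat matches its own reverse complement. It is not the method used in the study, which relied on multiple sequence alignments and hidden Markov models. The function names, the tolerance parameter, and the toy sequences are hypothetical and chosen only for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def dyad_symmetry(repeat: str) -> float:
    """Fraction of positions at which the repeat equals its reverse complement.

    1.0 means a perfect palindrome; CRISPR repeats typically score lower
    ("weak" dyad symmetry).
    """
    rc = reverse_complement(repeat)
    return sum(a == b for a, b in zip(repeat, rc)) / len(repeat)


def spacer_lengths(sequence: str, repeat: str) -> list[int]:
    """Lengths of the stretches between consecutive exact copies of `repeat`.

    Assumption: repeats are exact copies; real arrays tolerate mismatches.
    """
    starts = []
    pos = sequence.find(repeat)
    while pos != -1:
        starts.append(pos)
        pos = sequence.find(repeat, pos + len(repeat))
    return [nxt - (cur + len(repeat)) for cur, nxt in zip(starts, starts[1:])]


def regularly_spaced(lengths: list[int], tolerance: int = 2) -> bool:
    """True if there is at least one spacer and all lengths lie within `tolerance` bp."""
    return bool(lengths) and max(lengths) - min(lengths) <= tolerance


if __name__ == "__main__":
    # Toy data: a hypothetical 21-bp repeat and two AT-only spacers.
    repeat = "GAGTTCCCCGCGCCAGCGGGG"
    toy = "TTTT" + repeat + "AT" * 15 + repeat + "TA" * 15 + "A" + repeat + "AAAA"
    lengths = spacer_lengths(toy, repeat)
    print(lengths, regularly_spaced(lengths), round(dyad_symmetry(repeat), 2))
```

Exact string matching keeps the sketch short; a realistic detector would allow mismatches between repeat copies and would not require the repeat to be known in advance.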

    • "Two partially independent subsystems of Cas proteins can be distinguished (Makarova et al., 2011b; Richter et al., 2012). The first group is found across multiple types or subtypes, consists of an information processing module and requires the universally present core proteins, Cas1 and Cas2, which are involved in new spacer acquisition (Makarova et al., 2011b; Pougach et al., 2010; Makarova et al., 2006; Haft et al., 2005). The second, or executive, subsystem is required for processing of primary CRISPR transcripts (crRNA) and recognition and degradation of invading foreign nucleic acid, and is quite diverse. "
    ABSTRACT: Nearly all sequenced archaeal genomes and about half of eubacterial genomes encode an adaptive immune system that enables them to target and cleave invading foreign genetic elements through an RNAi-like pathway. CRISPR–Cas (clustered regularly interspaced short palindromic repeats–CRISPR-associated proteins) systems consist of CRISPR loci, which contain multiple copies of a short repeat sequence separated by similarly sized variable sequences derived from invaders, and cas genes, which encode proteins with RNA-binding, endo- and exonuclease, helicase, and polymerase activities. There are three main types (I, II, and III) of CRISPR/Cas systems. All systems function in three distinct stages: (1) adaptation, (2) crRNA biogenesis, and (3) interference. This review focuses on the features and mechanisms of CRISPR-Cas systems and current findings about them.
    • "The arsenal of prokaryotic defense mechanisms against mobile genetic elements (MGEs), such as bacteriophages and (conjugative) plasmids, includes adaptive immunity that serves as a sequence-specific memory of prior infections (Barrangou and Marraffini, 2014; Gasiunas et al., 2014; Reeks et al., 2013; Terns and Terns, 2014; van der Oost et al., 2014). These systems are made up of arrays of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and CRISPR-associated (cas) genes that are present in approximately half of sequenced bacteria and most archaea (Grissa et al., 2007; Haft et al., 2005; Makarova et al., 2011). CRISPR-Cas systems are categorized into three major types (Types I, II, and III) on the basis of their specific Cas proteins (Koonin and Makarova, 2013; Makarova et al., 2011). "
    ABSTRACT: CRISPR-Cas is a prokaryotic adaptive immune system that provides sequence-specific defense against foreign nucleic acids. Here we report the structure and function of the effector complex of the Type III-A CRISPR-Cas system of Thermus thermophilus: the Csm complex (TtCsm). TtCsm is composed of five different protein subunits (Csm1–Csm5) with an uneven stoichiometry and a single crRNA of variable size (35–53 nt). The TtCsm crRNA content is similar to the Type III-B Cmr complex, indicating that crRNAs are shared among different subtypes. A negative stain EM structure of the TtCsm complex exhibits the characteristic architecture of Type I and Type III CRISPR-associated ribonucleoprotein complexes. crRNA-protein crosslinking studies show extensive contacts between the Csm3 backbone and the bound crRNA. We show that, like TtCmr, TtCsm cleaves complementary target RNAs at multiple sites. Unlike Type I complexes, interference by TtCsm does not proceed via initial base pairing by a seed sequence.
    Molecular Cell 11/2014; 56(4). DOI:10.1016/j.molcel.2014.10.005 · 14.02 Impact Factor
    • "We extracted all genes from all available bacterial genomes. We then searched for all cas genes using a recent version of TIGRFAM models from Haft et al. (2005, 2013) in combination with HMMER (Eddy, 2011). A cas gene was annotated when one of its respective models was found with an E-value "
    ABSTRACT: Motivation: The discovery of CRISPR-Cas systems almost 20 years ago rapidly changed our perception of the bacterial and archaeal immune systems. CRISPR loci consist of several repetitive DNA sequences called repeats, inter-spaced by stretches of variable length sequences called spacers. This CRISPR array is transcribed and processed into multiple mature RNA species (crRNAs). A single crRNA is integrated into an interference complex, together with CRISPR-associated (Cas) proteins, to bind and degrade invading nucleic acids. Although existing bioinformatics tools can recognize CRISPR loci by their characteristic repeat-spacer architecture, they generally output CRISPR arrays of ambiguous orientation and thus do not determine the strand from which crRNAs are processed. Knowledge of the correct orientation is crucial for many tasks, including the classification of CRISPR conservation, the detection of leader regions, the identification of target sites (protospacers) on invading genetic elements and the characterization of protospacer-adjacent motifs. Results: We present a fast and accurate tool to determine the crRNA-encoding strand at CRISPR loci by predicting the correct orientation of repeats based on an advanced machine learning approach. Both the repeat sequence and mutation information were encoded and processed by an efficient graph kernel to learn higher-order correlations. The model was trained and tested on curated data comprising >4500 CRISPRs and yielded a remarkable performance of 0.95 AUC ROC (area under the curve of the receiver operator characteristic). In addition, we show that accurate orientation information greatly improved detection of conserved repeat sequence families and structure motifs. We integrated CRISPRstrand predictions into our CRISPRmap web server of CRISPR conservation and updated the latter to version 2.0. Availability: CRISPRmap and CRISPRstrand are available at http://rna.informatik.uni-freiburg.de/CRISPRmap. 
    Supplementary information: Supplementary data are available at Bioinformatics online.
    Bioinformatics 09/2014; 30(17):i489-i496. DOI:10.1093/bioinformatics/btu459 · 4.98 Impact Factor