ABSTRACT: Casposons are a superfamily of putative self-synthesizing transposable elements that are predicted to employ a homolog of
Cas1 protein as a recombinase and could have contributed to the origin of the CRISPR-Cas adaptive immunity systems in archaea
and bacteria. Casposons remain uncharacterized experimentally, except for the recent demonstration of the integrase activity
of the Cas1 homolog, and given their relative rarity in archaea and bacteria, original comparative genomic analysis has not
provided direct indications of their mobility. Here we report evidence of casposon mobility obtained by comparison of the
genomes of 62 strains of the archaeon Methanosarcina mazei. In these genomes, casposons are variably inserted in three distinct sites indicative of multiple, recent gains and losses.
Some casposons are inserted into other mobile genetic elements that might provide vehicles for horizontal transfer of the
casposons. Additionally, many M. mazei genomes contain previously undetected solo terminal inverted repeats that apparently are derived from casposons and could
resemble intermediates in CRISPR evolution. We further demonstrate the sequence specificity of casposon insertion and note
clear parallels with the adaptation mechanism of CRISPR-Cas. Finally, besides identifying additional representatives in each
of the three originally defined families we describe a new, fourth, family of casposons.
Article · Jan 2016 · Genome Biology and Evolution
ABSTRACT: Bacterial genomes encode numerous homologs of Cas9, the effector protein of the type II CRISPR-Cas systems. The homology region
includes the arginine-rich helix and the HNH nuclease domain that is inserted into the RuvC-like nuclease domain. These genes,
however, are not linked to cas genes or CRISPR. Here we show that Cas9 homologs represent a distinct group of non-autonomous transposons which we denote
ISC (Insertion Sequences Cas9-like). We identify many diverse families of full-length ISC transposons and demonstrate that their
terminal sequences (particularly 3′-termini) are similar to those of IS605 superfamily transposons that are mobilized by the Y1 tyrosine transposase encoded by the TnpA gene and often also encode
the TnpB protein containing the RuvC-like endonuclease domain. The terminal regions of the ISC and IS605 transposons contain palindromic structures that are likely recognized by the Y1 transposase. The transposons from these two
groups are inserted either exactly in the middle or upstream of specific 4-bp target sites, without target site duplication.
We also identify autonomous ISC transposons that encode TnpA-like Y1 transposases. Thus, the non-autonomous ISC transposons could be mobilized in trans either by Y1 transposases of other, autonomous ISC transposons or by Y1 transposases of the more abundant IS605 transposons. These findings imply an evolutionary scenario in which the ISC transposons evolved from IS605 family transposons, possibly via insertion of a mobile Group II intron encoding the HNH domain, and Cas9 subsequently evolved
via immobilization of an ISC transposon.
IMPORTANCE: Cas9 endonucleases, the effectors of type II CRISPR-Cas systems, represent the new generation of genome engineering tools.
Here we describe in detail a novel family of transposable elements that encode the likely ancestors of Cas9 and outline the
evolutionary scenario connecting different varieties of these transposons and Cas9.
Article · Dec 2015 · Journal of Bacteriology
ABSTRACT: Only a small fraction of bacteria and archaea that are identifiable by metagenomics can be grown on standard media. Recent efforts on deep metagenomics sequencing, single-cell genomics and the use of specialized culture conditions (culturomics) increasingly yield novel microbes, some of which represent previously uncharacterized phyla and possess unusual biological traits.
We report isolation and genome analysis of Babela massiliensis, an obligate intracellular parasite of Acanthamoeba castellanii. B. massiliensis shows an unusual fission mode of cell multiplication whereby large, polymorphic bodies accumulate in the cytoplasm of infected amoeba and then split into mature bacterial cells. This unique mechanism of cell division is associated with a deep degradation of the cell division machinery and delayed expression of the ftsZ gene. The genome of B. massiliensis consists of a circular chromosome approximately 1.12 megabases in size that encodes 981 predicted proteins, 38 tRNAs and one typical rRNA operon. Phylogenetic analysis shows that B. massiliensis belongs to the putative bacterial phylum TM6 that so far was represented by the draft genome of the JCVI TM6SC1 bacterium obtained by single cell genomics and numerous environmental sequences.
Currently, B. massiliensis is the only cultivated member of the putative TM6 phylum. Phylogenomic analysis shows diverse taxonomic affinities for B. massiliensis genes, suggestive of multiple gene acquisitions via horizontal transfer from other bacteria and eukaryotes. Horizontal gene transfer is likely to be facilitated by the cohabitation of diverse parasites and symbionts inside amoeba. B. massiliensis encompasses many genes encoding proteins implicated in parasite-host interaction including the greatest number of ankyrin repeats among sequenced bacteria and diverse proteins related to the ubiquitin system. Characterization of B. massiliensis, a representative of a distinct bacterial phylum, thanks to its ability to grow in amoeba, reaffirms the critical role of diverse culture approaches in microbiology.
This article was reviewed by Dr. Igor Zhulin, Dr. Jeremy Selengut, and Prof. Martijn Huynen.
ABSTRACT: The infection of Pseudomonas aeruginosa by the giant bacteriophage phiKZ is resistant to host RNA polymerase (RNAP) inhibitor rifampicin. phiKZ encodes two sets
of polypeptides that are distantly related to fragments of the two largest subunits of cellular multisubunit RNAPs. Polypeptides
of one set are encoded by middle phage genes and are found in the phiKZ virions. Polypeptides of the second set are encoded
by early phage genes and are absent from virions. Here, we report isolation of a five-subunit RNAP from phiKZ-infected cells.
Four subunits of this enzyme are the cellular RNAP subunit homologs of the non-virion set; the fifth subunit is a protein of
unknown function. In vitro, this complex initiates transcription from late phiKZ promoters in a rifampicin-resistant manner. Thus, this enzyme is a non-virion
phiKZ RNAP responsible for transcription of late phage genes. The phiKZ RNAP lacks identifiable assembly and promoter specificity
subunits/factors characteristic of eukaryal, archaeal and bacterial RNAPs and thus provides a unique model for comparative
analysis of the mechanism, regulation and evolution of this important class of enzymes.
Article · Oct 2015 · Nucleic Acids Research
ABSTRACT: Microbial CRISPR-Cas systems are divided into Class 1, with multisubunit effector complexes, and Class 2, with single protein effectors. Currently, only two Class 2 effectors, Cas9 and Cpf1, are known. We describe here three distinct Class 2 CRISPR-Cas systems. The effectors of two of the identified systems, C2c1 and C2c3, contain RuvC-like endonuclease domains distantly related to Cpf1. The third system, C2c2, contains an effector with two predicted HEPN RNase domains. Whereas production of mature CRISPR RNA (crRNA) by C2c1 depends on tracrRNA, C2c2 crRNA maturation is tracrRNA independent. We found that C2c1 systems can mediate DNA interference in a 5'-PAM-dependent fashion analogous to Cpf1. However, unlike Cpf1, which is a single-RNA-guided nuclease, C2c1 depends on both crRNA and tracrRNA for DNA cleavage. Finally, comparative analysis indicates that Class 2 CRISPR-Cas systems evolved on multiple occasions through recombination of Class 1 adaptation modules with effector proteins acquired from distinct mobile elements.
ABSTRACT: The microbial adaptive immune system CRISPR mediates defense against foreign genetic elements through two classes of RNA-guided nuclease effectors. Class 1 effectors utilize multi-protein complexes, whereas class 2 effectors rely on single-component effector proteins such as the well-characterized Cas9. Here, we report characterization of Cpf1, a putative class 2 CRISPR effector. We demonstrate that Cpf1 mediates robust DNA interference with features distinct from Cas9. Cpf1 is a single RNA-guided endonuclease lacking tracrRNA, and it utilizes a T-rich protospacer-adjacent motif. Moreover, Cpf1 cleaves DNA via a staggered DNA double-stranded break. Out of 16 Cpf1-family proteins, we identified two candidate enzymes from Acidaminococcus and Lachnospiraceae, with efficient genome-editing activity in human cells. Identifying this mechanism of interference broadens our understanding of CRISPR-Cas systems and advances their genome editing applications.
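As a rough illustration of what PAM-dependent targeting means in practice, the following Python sketch scans one DNA strand for candidate protospacers that lie immediately 3′ of a PAM. It is not taken from the cited work: the function name `find_protospacers`, the default PAM `TTN` (one example of a T-rich motif) and the default spacer length of 20 nt are illustrative assumptions, and the reverse strand is ignored for brevity.

```python
import re

# IUPAC codes used in the PAM pattern; only the handful needed here (assumption).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def find_protospacers(dna, pam="TTN", spacer_len=20):
    """Return (position, pam, protospacer) for every 5'-PAM site on the given strand.

    `pam` and `spacer_len` are hypothetical defaults chosen for illustration.
    """
    pam_regex = "".join(IUPAC[base] for base in pam.upper())
    # A lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(f"(?=({pam_regex})([ACGT]{{{spacer_len}}}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna.upper())]


if __name__ == "__main__":
    for hit in find_protospacers("CCTTAGGGGGCC", pam="TTN", spacer_len=5):
        print(hit)
```

Passing the PAM as a parameter keeps the sketch independent of any particular effector; a real target search would also scan the reverse complement and apply effector-specific rules.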
ABSTRACT: The evolution of CRISPR-cas loci, which encode adaptive immune systems in archaea and bacteria, involves rapid changes, in particular numerous rearrangements of the locus architecture and horizontal transfer of complete loci or individual modules. These dynamics complicate straightforward phylogenetic classification, but here we present an approach combining the analysis of signature protein families and features of the architecture of cas loci that unambiguously partitions most CRISPR-cas loci into distinct classes, types and subtypes. The new classification retains the overall structure of the previous version but is expanded to now encompass two classes, five types and 16 subtypes. The relative stability of the classification suggests that the most prevalent variants of CRISPR-Cas systems are already known. However, the existence of rare, currently unclassifiable variants implies that additional types and subtypes remain to be characterized.
ABSTRACT: Proline plays a crucial role in cell growth and stress responses, and its accumulation is essential for the tolerance of adverse environmental conditions in plants. Two routes are used to biosynthesize proline in plants. The main route uses glutamate as a precursor, while in the other route proline is derived from ornithine. The terminal step of both pathways, the conversion of Δ1-pyrroline-5-carboxylate (P5C) to L-proline, is catalyzed by P5C reductase (P5CR) using NADH or NADPH as a cofactor. Since P5CRs are important housekeeping enzymes, they are conserved across all domains of life and appear to be relatively unaffected throughout evolution. However, global analysis of these enzymes unveiled significant functional diversity in the preference for cofactors (NADPH vs. NADH), variation in metal dependence and differences in the oligomeric state. In our study we investigated evolutionary patterns through phylogenetic and structural analysis of P5CR representatives from all kingdoms of life, with emphasis on the plant species. We also attempted to correlate local sequence/structure variation among the functionally and structurally characterized members of the family.
ABSTRACT: Archaea encode a eukaryotic-type primase comprising a catalytic subunit (PriS) and a noncatalytic subunit (PriL). Here we report the identification of a primase noncatalytic subunit, denoted PriX, from the hyperthermophilic archaeon Sulfolobus solfataricus. Like PriL, PriX is essential for the survival of the organism. The crystallographic analysis complemented by sensitive sequence comparisons shows that PriX is a diverged homologue of the C-terminal domain of PriL but lacks the iron-sulfur cluster. Phylogenomic analysis provides clues on the origin and evolution of PriX. PriX, PriL and PriS form a stable heterotrimer (PriSLX). Both PriSX and PriSLX show far greater affinity for nucleotide substrates and are substantially more active in primer synthesis than the PriSL heterodimer. In addition, PriL, but not PriX, facilitates primer extension by PriS. We propose that the catalytic activity of PriS is modulated through concerted interactions with the two noncatalytic subunits in primer synthesis.
Article · Jun 2015 · Nature Communications
ABSTRACT: The clustered regularly interspaced short palindromic repeats (CRISPR)-Cas (CRISPR-associated proteins) is a prokaryotic adaptive immune system that is represented in most archaea and many bacteria. Among the currently known prokaryotic defense systems, the CRISPR-Cas genomic loci show unprecedented complexity and diversity. Classification of CRISPR-Cas variants that would capture their evolutionary relationships to the maximum possible extent is essential for comparative genomic and functional characterization of this theoretically and practically important system of adaptive immunity. To this end, a multipronged approach has been developed that combines phylogenetic analysis of the conserved Cas proteins with comparison of gene repertoires and arrangements in CRISPR-Cas loci. This approach led to the current classification of CRISPR-Cas systems into three distinct types and ten subtypes for each of which signature genes have been identified. Comparative genomic analysis of the CRISPR-Cas systems in new archaeal and bacterial genomes performed over the 3 years elapsed since the development of this classification makes it clear that new types and subtypes of CRISPR-Cas need to be introduced. Moreover, this classification system captures only part of the complexity of CRISPR-Cas organization and evolution, due to the intrinsic modularity and evolutionary mobility of these immunity systems, resulting in numerous recombinant variants. Moreover, most of the cas genes evolve rapidly, complicating the family assignment for many Cas proteins and the use of family profiles for the recognition of CRISPR-Cas subtype signatures. Further progress in the comparative analysis of CRISPR-Cas systems requires integration of the most sensitive sequence comparison tools, protein structure comparison, and refined approaches for comparison of gene neighborhoods.
Article · May 2015 · Methods in molecular biology (Clifton, N.J.)
ABSTRACT: The RNA-guided endonuclease Cas9 has emerged as a versatile genome-editing platform. However, the size of the commonly used Cas9 from Streptococcus pyogenes (SpCas9) limits its utility for basic research and therapeutic applications that use the highly versatile adeno-associated virus (AAV) delivery vehicle. Here, we characterize six smaller Cas9 orthologues and show that Cas9 from Staphylococcus aureus (SaCas9) can edit the genome with efficiencies similar to those of SpCas9, while being more than 1 kilobase shorter. We packaged SaCas9 and its single guide RNA expression cassette into a single AAV vector and targeted the cholesterol regulatory gene Pcsk9 in the mouse liver. Within one week of injection, we observed >40% gene modification, accompanied by significant reductions in serum Pcsk9 and total cholesterol levels. We further assess the genome-wide targeting specificity of SaCas9 and SpCas9 using BLESS, and demonstrate that SaCas9-mediated in vivo genome editing has the potential to be efficient and specific.
ABSTRACT: With the continuously accelerating genome sequencing from diverse groups of archaea and bacteria, accurate identification of gene orthology and availability of readily expandable clusters of orthologous genes are essential for the functional annotation of new genomes. We report an update of the collection of archaeal Clusters of Orthologous Genes (arCOGs) to cover, on average, 91% of the protein-coding genes in 168 archaeal genomes. The new arCOGs were constructed using refined algorithms for orthology identification combined with extensive manual curation, including incorporation of the results of several completed and ongoing research projects in archaeal genomics. A new level of classification is introduced, superclusters that unite two or more arCOGs and more completely reflect gene family evolution than individual, disconnected arCOGs. Assessment of the current archaeal genome annotation in public databases indicates that consistent use of arCOGs can significantly improve the annotation quality. In addition to their utility for genome annotation, arCOGs also are a platform for phylogenomic analysis. We explore this aspect of arCOGs by performing a phylogenomic study of the Thermococci that are traditionally viewed as the basal branch of the Euryarchaeota. The results of phylogenomic analysis that involved both comparison of multiple phylogenetic trees and a search for putative derived shared characters by using phyletic patterns extracted from the arCOGs reveal a likely evolutionary relationship between the Thermococci, Methanococci, and Methanobacteria. The arCOGs are expected to be instrumental for a comprehensive phylogenomic study of the archaea.
ABSTRACT: A systematic comparative genomic analysis of all archaeal membrane proteins that have been projected to the last archaeal common ancestor gene set led to the identification of several novel components of predicted secretion, membrane remodeling, and protein glycosylation systems. Among other findings, most crenarchaea have been shown to encode highly diverged orthologs of the membrane insertase YidC, which is nearly universal in bacteria, eukaryotes, and euryarchaea. We also identified a vast family of archaeal proteins, including the C-terminal domain of N-glycosylation protein AglD, as membrane flippases homologous to the flippase domain of bacterial multipeptide resistance factor MprF, a bifunctional lysylphosphatidylglycerol synthase and flippase. Additionally, several proteins were predicted to function as membrane transporters. The results of this work, combined with our previous analyses, reveal an unexpected diversity of putative archaeal membrane-associated functional systems that remain to be functionally characterized. A more general conclusion from this work is that the currently available collection of archaeal (and bacterial) genomes could be sufficient to identify (almost) all widespread functional modules and develop experimentally testable predictions of their functions.
ABSTRACT: Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) an orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there was more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods.
A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. The new version of the COGs is expected to become an important tool for microbial genomics.
Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by US Government employees and is in the public domain in the US.
Article · Nov 2014 · Nucleic Acids Research
ABSTRACT: Argonaute proteins are conserved throughout all domains of life. Recently characterized prokaryotic Argonaute proteins (pAgos) participate in host defense by DNA interference, whereas eukaryotic Argonaute proteins (eAgos) control a wide range of processes by RNA interference. Here we review molecular mechanisms of guide and target binding by Argonaute proteins, and describe how the conformational changes induced by target binding lead to target cleavage. On the basis of structural comparisons and phylogenetic analyses of pAgos and eAgos, we reconstruct the evolutionary journey of the Argonaute proteins through the three domains of life and discuss how different structural features of pAgos and eAgos relate to their distinct physiological roles.
ABSTRACT: Microbial genomes encompass a sizable fraction of poorly characterized, narrowly spread fast-evolving genes. Using sensitive methods for sequence comparison and protein structure prediction, we performed a detailed comparative analysis of clusters of such genes, which we denote “dark matter islands”, in archaeal genomes. The dark matter islands comprise up to 20% of archaeal genomes and show remarkable heterogeneity and diversity. Nevertheless, three classes of entities are common in these genomic loci: (a) integrated viral genomes and other mobile elements; (b) defense systems, and (c) secretory and other membrane-associated systems. The dark matter islands in the genomes of thermophiles and mesophiles show similar general trends of gene content, but thermophiles are substantially enriched in predicted membrane proteins whereas mesophiles have a greater proportion of recognizable mobile elements. Based on this analysis, we predict the existence of several novel groups of viruses and mobile elements, previously unnoticed variants of CRISPR-Cas immune systems, and new secretory systems that might be involved in stress response, intermicrobial conflicts and biogenesis of novel, uncharacterized membrane structures.
Electronic supplementary material
The online version of this article (doi:10.1007/s00792-014-0672-7) contains supplementary material, which is available to authorized users.
ABSTRACT: The elaborate eukaryotic DNA replication machinery evolved from the archaeal ancestors that themselves show considerable complexity. Here we discuss the comparative genomic and phylogenetic analysis of the core replication enzymes, the DNA polymerases, in archaea and their relationships with the eukaryotic polymerases. In archaea, there are three groups of family B DNA polymerases, historically known as PolB1, PolB2 and PolB3. All three groups appear to descend from the last common ancestors of the extant archaea but their subsequent evolutionary trajectories seem to have been widely different. Although PolB3 is present in all archaea, with the exception of Thaumarchaeota, and appears to be directly involved in lagging strand replication, the evolution of this gene does not follow the archaeal phylogeny, conceivably due to multiple horizontal transfers and/or dramatic differences in evolutionary rates. In contrast, PolB1 is missing in Euryarchaeota but otherwise seems to have evolved vertically. The third archaeal group of family B polymerases, PolB2, includes primarily proteins in which the catalytic centers of the polymerase and exonuclease domains are disrupted and accordingly the enzymes appear to be inactivated. The members of the PolB2 group are scattered across archaea and might be involved in repair or regulation of replication along with inactivated members of the RadA family ATPases and an additional, uncharacterized protein that are encoded within the same predicted operon. In addition to the family B polymerases, all archaea, with the exception of the Crenarchaeota, encode enzymes of a distinct family D the origin of which is unclear. We examine multiple considerations that appear compatible with the possibility that family D polymerases are highly derived homologs of family B.
The eukaryotic DNA polymerases show a highly complex relationship with their archaeal ancestors including contributions of proteins and domains from both the family B and the fam
Article · Jul 2014 · Frontiers in Microbiology
ABSTRACT: BACKGROUND:
Diverse transposable elements are abundant in genomes of cellular organisms from all three domains of life. Although transposons are often regarded as junk DNA, a growing body of evidence indicates that they are behind some of the major evolutionary innovations. With the growth in the number and diversity of sequenced genomes, previously unnoticed mobile elements continue to be discovered.
We describe a new superfamily of archaeal and bacterial mobile elements which we denote casposons because they encode Cas1 endonuclease, a key enzyme of the CRISPR-Cas adaptive immunity systems of archaea and bacteria. The casposons share several features with self-synthesizing eukaryotic DNA transposons of the Polinton/Maverick class, including terminal inverted repeats and genes for B family DNA polymerases. However, unlike any other known mobile elements, the casposons are predicted to rely on Cas1 for integration and excision, via a mechanism similar to the integration of new spacers into CRISPR loci. We identify three distinct families of casposons that differ in their gene repertoires and evolutionary provenance of the DNA polymerases. Deep branching of the casposon-encoded endonuclease in the Cas1 phylogeny suggests that casposons played a pivotal role in the emergence of CRISPR-Cas immunity.
The casposons are a novel superfamily of mobile elements, the first family of putative self-synthesizing transposons discovered in prokaryotes. The likely contribution of casposons to the evolution of CRISPR-Cas parallels the involvement of the RAG1 transposase in vertebrate immunoglobulin gene rearrangement, suggesting that recruitment of endonucleases from mobile elements as ready-made tools for genome manipulation is a general route of evolution of adaptive immunity.
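Several of the abstracts above refer to terminal inverted repeats (TIRs) as a hallmark of casposons. As a minimal sketch of what that feature means at the sequence level, the Python snippet below measures how many bases at the start of an element are mirrored, as a reverse complement, at its end. The function names are hypothetical, only perfect matches are counted, and this is not the detection procedure used in the cited studies, which is not described here.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(element):
    """Length of the perfect TIR: the longest prefix of `element` whose
    reverse complement is the suffix of `element`.

    Hypothetical helper for illustration; mismatches and gaps are not tolerated.
    """
    seq = element.upper()
    rc = reverse_complement(seq)
    length = 0
    # A prefix that is mirrored at the other end shows up as a shared prefix
    # of the sequence and its reverse complement. Cap at half the element so
    # the two repeat arms cannot overlap.
    for left, right in zip(seq[: len(seq) // 2], rc):
        if left != right:
            break
        length += 1
    return length


if __name__ == "__main__":
    arm = "GATTACAG"
    element = arm + "TTTTT" + reverse_complement(arm)
    print(terminal_inverted_repeat_length(element))
```

Real elements usually carry imperfect repeats, so a practical search would score alignments between the two ends rather than require exact identity.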