Eugene V Koonin

Publications

  • Universal pacemaker of genome evolution

    Sagi Snir, Yuri I. Wolf, Eugene V. Koonin

    04/2012;

    The molecular clock (MC) is a central concept of molecular evolution, according to which each gene evolves at a characteristic, near-constant rate. Numerous evolutionary studies have demonstrated the validity of the MC but have also shown that the MC is substantially overdispersed, i.e., lineage-specific deviations of a given gene's evolutionary rate from the clock greatly exceed the expectation from sampling error. A fundamental observation of comparative genomics that appears to complement the MC is that the distribution of evolutionary rates across orthologous genes in pairs of related genomes remains virtually unchanged throughout the evolution of life, from bacteria to mammals. The conservation of this distribution implies that the relative evolutionary rates of all genes remain nearly constant or, in other words, that the evolutionary rates of different genes are strongly correlated within each evolving genome. We hypothesized that this correlation is not a simple consequence of the MC but could be better explained by a model we dubbed the Universal PaceMaker (UPM) of genome evolution. The UPM model posits that the rate of evolution changes synchronously across genome-wide sets of genes in all evolving lineages. We sought to differentiate between the MC and UPM models by fitting thousands of phylogenetic trees for bacterial and archaeal genes to supertrees that reflect the dominant trend of vertical descent in the evolution of archaea and bacteria and that were constrained according to the two models. The goodness of fit for the UPM model was better than that for the MC model, with overwhelming statistical significance.
These results reveal a universal pacemaker of genome evolution that could have been in operation throughout the history of life.
  • Origin and evolution of spliceosomal introns.
    Impact points: 3.32

    Igor B Rogozin, Liran Carmel, Miklos Csuros, Eugene V Koonin

    Biology direct. 04/2012; 7(1):11.

    Evolution of exon-intron structure of eukaryotic genes has been a matter of long-standing, intensive debate. The introns-early concept, later rebranded 'introns first', held that protein-coding genes were interrupted by numerous introns even at the earliest stages of life's evolution and that introns played a major role in the origin of proteins by facilitating recombination of sequences coding for small protein/peptide modules. The introns-late concept held that introns emerged only in eukaryotes and that new introns have been accumulating continuously throughout eukaryotic evolution. Analysis of orthologous genes from completely sequenced eukaryotic genomes revealed numerous shared intron positions in orthologous genes from animals and plants and even between animals, plants and protists, suggesting that many ancestral introns have persisted since the last eukaryotic common ancestor (LECA). Reconstructions of intron gain and loss using the growing collection of genomes of diverse eukaryotes and increasingly advanced probabilistic models convincingly show that the LECA and the ancestors of each eukaryotic supergroup had intron-rich genes, with intron densities comparable to those in the most intron-rich modern genomes such as those of vertebrates. The subsequent evolution in most lineages of eukaryotes involved primarily loss of introns, with only a few episodes of substantial intron gain that might have accompanied major evolutionary innovations such as the origin of metazoa.
The original invasion of self-splicing Group II introns, presumably originating from the mitochondrial endosymbiont, into the genome of the emerging eukaryote might have been a key factor of eukaryogenesis that, in particular, triggered the origin of endomembranes and the nucleus. Conversely, splicing errors gave rise to alternative splicing, a major contribution to the biological complexity of multicellular eukaryotes. There is no indication that any prokaryote has ever possessed a spliceosome or introns in protein-coding genes, other than relatively rare mobile self-splicing introns. Thus, the introns-first scenario is not supported by any evidence; instead, the exon-intron structure of protein-coding genes appears to have evolved concomitantly with the eukaryotic cell, and introns were a major factor of evolution throughout the history of eukaryotes. This article was reviewed by I. King Jordan, Manuel Irimia (nominated by Anthony Poole), Tobias Mourier (nominated by Anthony Poole), and Fyodor Kondrashov. For the complete reports, see the Reviewers' Reports section.
  • Archaeal origin of tubulin.
    Impact points: 3.32

    Natalya Yutin, Eugene V Koonin

    Biology direct. 03/2012; 7(1):10.

    Tubulins are a family of GTPases that are key components of the cytoskeleton in all eukaryotes and are distantly related to the FtsZ GTPase that is involved in cell division in most bacteria and many archaea. Among prokaryotes, bona fide tubulins have been identified only in bacteria of the genus Prosthecobacter. These bacterial tubulin genes appear to have been horizontally transferred from eukaryotes. Here we describe tubulins encoded in the genomes of thaumarchaeota of the genus Nitrosoarchaeum that we denote artubulins. Phylogenetic analysis results are compatible with the origin of eukaryotic tubulins from artubulins. These findings expand the emerging picture of the origin of key components of eukaryotic functional systems from ancestral forms that are scattered among the extant archaea. This article was reviewed by Gaspar Jekely and J. Peter Gogarten.
  • The CMG (CDC45/RecJ, MCM, GINS) complex is a conserved component of the DNA replication system in all archaea and eukaryotes.
    Impact points: 3.32

    Kira S Makarova, Eugene V Koonin, Zvi Kelman

    Biology direct. 02/2012; 7(1):7.

    In eukaryotes, the CMG (CDC45, MCM, GINS) complex containing the replicative helicase MCM is a key player in DNA replication. Archaeal homologs of the eukaryotic MCM and GINS proteins have been identified but until recently no homolog of the CDC45 protein was known. Two recent developments, namely the discovery of archaeal GINS-associated nuclease (GAN) that belongs to the RecJ family of the DHH hydrolase superfamily and the demonstration of homology between the DHH domains of CDC45 and RecJ, show that at least some Archaea possess a full complement of homologs of the CMG complex subunits. Here we present the results of in-depth phylogenomic analysis of RecJ homologs in archaea. We confirm and extend the recent hypothesis that CDC45 is the eukaryotic ortholog of the bacterial and archaeal RecJ family nucleases. At least one RecJ homolog was identified in all sequenced archaeal genomes, with the single exception of Caldivirga maquilingensis. These proteins include previously unnoticed remote RecJ homologs with inactivated DHH domain in Thermoproteales. Combined with phylogenetic tree reconstruction of diverse eukaryotic, archaeal and bacterial DHH subfamilies, this analysis yields a complex scenario of RecJ family evolution in Archaea which includes independent inactivation of the nuclease domain in Crenarchaeota and Halobacteria, and loss of this domain in Methanococcales. The archaeal complex of a CDC45/RecJ homolog, MCM and GINS is homologous and most likely functionally analogous to the eukaryotic CMG complex, and appears to be a key component of the DNA replication machinery in all Archaea.
It is inferred that the last common archaeo-eukaryotic ancestor encoded a CMG complex that contained an active nuclease of the RecJ family. The inactivated RecJ homologs in several archaeal lineages most likely are dedicated structural components of replication complexes. This article was reviewed by Prof. Patrick Forterre, Dr. Stephen John Aves (nominated by Dr. Purificacion Lopez-Garcia) and Prof. Martijn Huynen. For the full reviews, see the Reviewers' Comments section.
  • Origin of first cells at terrestrial, anoxic geothermal fields.
    Impact points: 9.43

    Armen Y Mulkidjanian, Andrew Yu Bychkov, Daria V Dibrova, Michael Y Galperin, Eugene V Koonin

    Proceedings of the National Academy of Sciences of the United States of America. 02/2012; 109(14):E821-30.

    All cells contain much more potassium, phosphate, and transition metals than modern (or reconstructed primeval) oceans, lakes, or rivers. Cells maintain ion gradients by using sophisticated, energy-dependent membrane enzymes (membrane pumps) that are embedded in elaborate ion-tight membranes. The first cells could possess neither ion-tight membranes nor membrane pumps, so the concentrations of small inorganic molecules and ions within protocells and in their environment would equilibrate. Hence, the ion composition of modern cells might reflect the inorganic ion composition of the habitats of protocells. We attempted to reconstruct the "hatcheries" of the first cells by combining geochemical analysis with phylogenomic scrutiny of the inorganic ion requirements of universal components of modern cells. These ubiquitous, and by inference primordial, proteins and functional systems show affinity to and functional requirement for K(+), Zn(2+), Mn(2+), and phosphate. Thus, protocells must have evolved in habitats with a high K(+)/Na(+) ratio and relatively high concentrations of Zn, Mn, and phosphorous compounds. Geochemical reconstruction shows that the ionic composition conducive to the origin of cells could not have existed in marine settings but is compatible with emissions of vapor-dominated zones of inland geothermal systems. Under the anoxic, CO(2)-dominated primordial atmosphere, the chemistry of basins at geothermal fields would resemble the internal milieu of modern cells.
The precellular stages of evolution might have transpired in shallow ponds of condensed and cooled geothermal vapor that were lined with porous silicate minerals mixed with metal sulfides and enriched in K(+), Zn(2+), and phosphorous compounds.
  • Phylogenomics of prokaryotic ribosomal proteins.
    Impact points: 4.41

    Natalya Yutin, Pere Puigbò, Eugene V Koonin, Yuri I Wolf

    PloS one. 01/2012; 7(5):e36972.

    Archaeal and bacterial ribosomes contain more than 50 proteins, including 34 that are universally conserved in the three domains of cellular life (bacteria, archaea, and eukaryotes). Despite the high sequence conservation, annotation of ribosomal (r-) protein genes is often difficult because of their short lengths and biased sequence composition. We developed an automated computational pipeline for identification of r-protein genes and applied it to 995 completely sequenced bacterial and 87 archaeal genomes available in the RefSeq database. The pipeline employs curated seed alignments of r-proteins to run position-specific scoring matrix (PSSM)-based BLAST searches against six-frame genome translations, mitigating possible gene annotation errors. As a result of this analysis, we performed a census of prokaryotic r-protein complements, enumerated missing and paralogous r-proteins, and analyzed the distributions of ribosomal protein genes among chromosomal partitions. Phyletic patterns of bacterial and archaeal r-protein genes were mapped to phylogenetic trees reconstructed from concatenated alignments of r-proteins to reveal the history of likely multiple independent gains and losses. These alignments, available for download, can be used as search profiles to improve genome annotation of r-proteins and for further comparative genomics studies.
  • Genome-wide comparative analysis of phylogenetic trees: the prokaryotic forest of life.

    Pere Puigbò, Yuri I Wolf, Eugene V Koonin

    Methods in molecular biology (Clifton, N.J.). 01/2012; 856:53-79.

    Genome-wide comparison of phylogenetic trees is becoming an increasingly common approach in evolutionary genomics, and a variety of approaches for such comparison have been developed. In this article, we present several methods for comparative analysis of large numbers of phylogenetic trees. To compare phylogenetic trees taking into account the bootstrap support for each internal branch, the Boot-Split Distance (BSD) method is introduced as an extension of the previously developed Split Distance method for tree comparison. The BSD method implements the straightforward idea that comparison of phylogenetic trees can be made more robust by treating tree splits differentially depending on the bootstrap support. Approaches are also introduced for detecting tree-like and net-like evolutionary trends in the phylogenetic Forest of Life (FOL), i.e., the entirety of the phylogenetic trees for conserved genes of prokaryotes. The principal method employed for this purpose includes mapping quartets of species onto trees to calculate the support of each quartet topology and so to quantify the tree and net contributions to the distances between species. We describe the application of these methods to analyze the FOL and the results obtained with these methods. These results support the concept of the Tree of Life (TOL) as a central evolutionary trend in the FOL as opposed to the traditional view of the TOL as a "species tree."
  • Nature and Intensity of Selection Pressure on CRISPR-Associated Genes.
    Impact points: 3.94

    Nobuto Takeuchi, Yuri I Wolf, Kira S Makarova, Eugene V Koonin

    Journal of bacteriology. 12/2011; 194(5):1216-25.

    The recently discovered CRISPR-Cas adaptive immune system is present in almost all archaea and many bacteria. It consists of cassettes of CRISPR repeats that incorporate spacers homologous to fragments of viral or plasmid genomes that are employed as guide RNAs in the immune response, along with numerous CRISPR-associated (cas) genes that encode proteins possessing diverse, only partially characterized activities required for the action of the system. Here, we investigate the evolution of the cas genes and show that they evolve under purifying selection that is typically much weaker than the median strength of purifying selection affecting genes in the respective genomes. The exceptions are the cas1 and cas2 genes, which typically evolve at levels of purifying selection close to the genomic median. Thus, although these genes are implicated in the acquisition of spacers from alien genomes, they do not appear to be directly involved in an arms race between bacterial and archaeal hosts and infectious agents. These genes might possess functions distinct from and additional to their role in the CRISPR-Cas-mediated immune response. Taken together with evidence of the frequent horizontal transfer of cas genes reported previously and with the widespread microscale recombination within these genes detected in this work, these findings reveal the highly dynamic evolution of cas genes. This conclusion is in line with the involvement of CRISPR-Cas in antiviral immunity that is likely to entail a coevolutionary arms race with rapidly evolving viruses. However, we failed to detect evidence of strong positive selection in any of the cas genes.
  • Predictability of evolutionary trajectories in fitness landscapes.
    Impact points: 5.76

    Alexander E Lobkovsky, Yuri I Wolf, Eugene V Koonin

    PLoS computational biology. 12/2011; 7(12):e1002302.

    Experimental studies on enzyme evolution show that only a small fraction of all possible mutation trajectories are accessible to evolution. However, these experiments deal with individual enzymes and explore a tiny part of the fitness landscape. We report an exhaustive analysis of fitness landscapes constructed with an off-lattice model of protein folding where fitness is equated with robustness to misfolding. This model mimics the essential features of the interactions between amino acids, is consistent with the key paradigms of protein folding and reproduces the universal distribution of evolutionary rates among orthologous proteins. We introduce mean path divergence as a quantitative measure of the degree to which the starting and ending points determine the path of evolution in fitness landscapes. Global measures of landscape roughness are good predictors of path divergence in all studied landscapes: the mean path divergence is greater in smooth landscapes than in rough ones. The model-derived and experimental landscapes are significantly smoother than random landscapes and resemble additive landscapes perturbed with moderate amounts of noise; thus, these landscapes are substantially robust to mutation. The model landscapes show a deficit of suboptimal peaks even compared with noisy additive landscapes with similar overall roughness. We suggest that smoothness and the substantial deficit of peaks in the fitness landscapes of protein evolution are fundamental consequences of the physics of protein folding.
  • Displacement of the canonical single-stranded DNA-binding protein in the Thermoproteales.
    Impact points: 9.43

    Sonia Paytubi, Stephen A McMahon, Shirley Graham, Huanting Liu, Catherine H Botting, Kira S Makarova, Eugene V Koonin, James H Naismith, Malcolm F White

    Proceedings of the National Academy of Sciences of the United States of America. 11/2011; 109(7):E398-405.

    ssDNA-binding proteins (SSBs) based on the oligonucleotide-binding fold are considered ubiquitous in nature and play a central role in many DNA transactions including replication, recombination, and repair. We demonstrate that the Thermoproteales, a clade of hyperthermophilic Crenarchaea, lack a canonical SSB. Instead, they encode a distinct ssDNA-binding protein that we term "ThermoDBP," exemplified by the protein Ttx1576 from Thermoproteus tenax. ThermoDBP binds specifically to ssDNA with low sequence specificity. The crystal structure of Ttx1576 reveals a unique fold and a mechanism for ssDNA binding, consisting of an extended cleft lined with hydrophobic phenylalanine residues and flanked by basic amino acids. Two ssDNA-binding domains are linked by a coiled-coil leucine zipper. ThermoDBP appears to have displaced the canonical SSB during the diversification of the Thermoproteales, a highly unusual example of the loss of a "ubiquitous" protein during evolution.
  • Negative correlation between expression level and evolutionary rate of long intergenic noncoding RNAs.

    David Managadze, Igor B Rogozin, Diana Chernikova, Svetlana A Shabalina, Eugene V Koonin

    Genome biology and evolution. 11/2011; 3:1390-404.

    Mammalian genomes contain numerous genes for long noncoding RNAs (lncRNAs). The functions of the lncRNAs remain largely unknown but their evolution appears to be constrained by purifying selection, albeit relatively weakly. To gain insight into the mode of evolution and the functional range of lncRNAs, they can be compared with the much better characterized protein-coding genes. The evolutionary rate of the protein-coding genes shows a universal negative correlation with expression: highly expressed genes are on average more conserved during evolution than the genes with lower expression levels. This correlation was conceptualized in the misfolding-driven protein evolution hypothesis, according to which misfolding is the principal cost incurred by protein expression. We sought to determine whether long intergenic ncRNAs (lincRNAs) follow the same evolutionary trend and indeed detected a moderate but statistically significant negative correlation between the evolutionary rate and expression level of human and mouse lincRNA genes. The magnitude of the correlation for the lincRNAs is similar to that for equal-sized sets of protein-coding genes with similar levels of sequence conservation. Additionally, the expression level of the lincRNAs is significantly and positively correlated with the predicted extent of lincRNA molecule folding (base-pairing); however, the contributions of evolutionary rates and folding to the expression level are independent.
Thus, the anticorrelation between evolutionary rate and expression level appears to be a general feature of gene evolution that might be caused by similar deleterious effects of protein and RNA misfolding and/or other factors, for example, the number of interacting partners of the gene product.
  • Common Origins and Host-Dependent Diversity of Plant and Animal Viromes.

    Valerian V Dolja, Eugene V Koonin

    Current opinion in virology. 11/2011; 1(5):322-331.

    Many viruses infecting animals and plants share common cores of homologous genes involved in the key processes of viral replication. In contrast, genes that mediate virus-host interactions, including in many cases capsid protein genes, are markedly different. There are three distinct scenarios for the origin of related viruses of plants and animals: i) evolution from a common ancestral virus predating the divergence of plants and animals; ii) horizontal transfer of viruses, for example, through insect vectors; and iii) parallel origin from related genetic elements. We present evidence that each of these scenarios contributed, to a varying extent, to the evolution of different groups of viruses.
  • Genomic and biological analysis of Grapevine leafroll-associated virus 7 reveals a possible new genus within the family Closteroviridae.
    Impact points: 2.56

    Maher Al Rwahnih, Valerian V Dolja, Steve Daubert, Eugene V Koonin, Adib Rowhani

    Virus research. 10/2011; 163(1):302-9.

    Deep sequencing analysis of an asymptomatic grapevine revealed a virome containing five RNA viruses and a viroid. Of these, Grapevine leafroll-associated virus 7 (GLRaV-7), an unassigned closterovirus, was by far the most prominently represented sequence in the analysis. Graft-inoculation of the infection into another grape variety confirmed the lack of leafroll disease symptoms, even though GLRaV-7 could be detected in the inoculated indicator plants. A 16,496-nucleotide-long genomic sequence of this virus was determined from the deep sequencing data. Its genome architecture and the sequences encoding its nine predicted proteins were compared with those of other closteroviruses. The comparison revealed that two other viruses, Little cherry virus-1 and Cordyline virus-1, formed a well-supported phylogenetic cluster with GLRaV-7.
  • Defense islands in bacterial and archaeal genomes and prediction of novel defense systems.
    Impact points: 3.94

    Kira S Makarova, Yuri I Wolf, Sagi Snir, Eugene V Koonin

    Journal of bacteriology. 09/2011; 193(21):6039-56.

    The arms race between cellular life forms and viruses is a major driving force of evolution. A substantial fraction of bacterial and archaeal genomes is dedicated to antivirus defense. We analyzed the distribution of defense genes and typical mobilome components (such as viral and transposon genes) in bacterial and archaeal genomes and demonstrated statistically significant clustering of antivirus defense systems and mobile genes and elements in genomic islands. The defense islands are enriched in putative operons and contain numerous overrepresented gene families. A detailed sequence analysis of the proteins encoded by genes in these families shows that many of them are diverged variants of known defense system components, whereas others show features, such as characteristic operonic organization, that are suggestive of novel defense systems. Thus, genomic islands provide abundant material for the experimental study of bacterial and archaeal antivirus defense. Except for the CRISPR-Cas systems, different classes of defense systems, in particular toxin-antitoxin and restriction-modification systems, show nonrandom clustering in defense islands. It remains unclear to what extent these associations reflect functional cooperation between different defense systems and to what extent the islands are genomic "sinks" that accumulate diverse nonessential genes, particularly those acquired via horizontal gene transfer. The characteristics of defense islands resemble those of mobilome islands.
Defense and mobilome genes are nonrandomly associated in islands, suggesting nonadaptive evolution of the islands via a preferential attachment-like mechanism underpinned by the addictive properties of defense systems such as toxins-antitoxins and an important role of horizontal mobility in the evolution of these islands.
  • A detailed history of intron-rich eukaryotic ancestors inferred from a global survey of 100 complete genomes.
    Impact points: 5.76

    Miklos Csuros, Igor B Rogozin, Eugene V Koonin

    PLoS computational biology. 09/2011; 7(9):e1002150.

    Protein-coding genes in eukaryotes are interrupted by introns, but intron densities widely differ between eukaryotic lineages. Vertebrates, some invertebrates and green plants have intron-rich genes, with 6-7 introns per kilobase of coding sequence, whereas most of the other eukaryotes have intron-poor genes. We reconstructed the history of intron gain and loss using a probabilistic Markov model (Markov Chain Monte Carlo, MCMC) on 245 orthologous genes from 99 genomes representing three of the five supergroups of eukaryotes for which multiple genome sequences are available. Intron-rich ancestors are confidently reconstructed for each major group, with 53 to 74% of the human intron density inferred with 95% confidence for the Last Eukaryotic Common Ancestor (LECA). The results of the MCMC reconstruction are compared with the reconstructions obtained using Maximum Likelihood (ML) and Dollo parsimony methods. An excellent agreement between the MCMC and ML inferences is demonstrated, whereas Dollo parsimony introduces a noticeable bias in the estimations, typically yielding lower ancestral intron densities than MCMC and ML. Evolution of eukaryotic genes was dominated by intron loss, with substantial gain only at the bases of several major branches including plants and animals. The highest intron density, 120 to 130% of the human value, is inferred for the last common ancestor of animals. The reconstruction shows that the entire line of descent from LECA to mammals was intron-rich, a state conducive to the evolution of alternative splicing.
  • Planctomycetes and eukaryotes: a case of analogy not homology.
    Impact points: 5.13

    James O McInerney, William F Martin, Eugene V Koonin, John F Allen, Michael Y Galperin, Nick Lane, John M Archibald, T Martin Embley

    BioEssays : news and reviews in molecular, cellular and developmental biology. 08/2011; 33(11):810-7.

    Planctomycetes, Verrucomicrobia and Chlamydia are prokaryotic phyla, sometimes grouped together as the PVC superphylum of eubacteria. Some PVC species possess interesting attributes, in particular, internal membranes that superficially resemble eukaryotic endomembranes. Some biologists now claim that PVC bacteria are nucleus-bearing prokaryotes that represent evolutionary intermediates in the transition from prokaryote to eukaryote. However, PVC prokaryotes do not possess a nucleus and are not intermediates in the prokaryote-to-eukaryote transition. Here we summarise the evidence that shows why all of the PVC traits that are currently cited as evidence for aspiring eukaryoticity are either analogous (the result of convergent evolution) rather than homologous to eukaryotic traits, or else are the result of horizontal gene transfers.
  • 3.04
    Impact points
    Vaccinia virus F16 protein, a predicted catalytically inactive member of the prokaryotic serine recombinase superfamily, is targeted to nucleoli.

    Tatiana G Senkevich, Eugene V Koonin, Bernard Moss

    Virology. 07/2011; 417(2):334-42.

    The F16L gene of vaccinia virus (VACV) is conserved in all chordopoxviruses except avipoxviruses. The crocodile poxvirus F16 protein ortholog has highly significant similarity to prokaryotic serine recombinases and contains all amino acids that comprise the catalytic site. In contrast, F16 orthologs encoded by other poxviruses show only marginally significant similarity to serine recombinases, lack essential amino acids of the active site and are most likely inactive derivatives of serine recombinases. Nevertheless, the conservation of F16L in non-avian poxviruses suggested an important function. However, a VACV mutant with the F16L gene knocked out replicated normally in dividing and quiescent cells. The F16 protein was synthesized early after infection and detected in virus cores. When expressed in infected or uninfected cells, F16 accumulated in nucleoli depending on the level of expression and confluency of cells. Evidence was obtained that F16 forms multimers, which might regulate concentration-dependent intracellular localization.
  • Viruses with more than 1,000 genes: Mamavirus, a new Acanthamoeba polyphaga mimivirus strain, and reannotation of Mimivirus genes.

    Philippe Colson, Natalya Yutin, Svetlana A Shabalina, Catherine Robert, Ghislain Fournous, Bernard La Scola, Didier Raoult, Eugene V Koonin

    Genome biology and evolution. 06/2011; 3:737-42.

    The genome sequence of the Mamavirus, a new Acanthamoeba polyphaga mimivirus strain, is reported. At 1,191,693 nt in length, with 1,023 predicted protein-coding genes, the Mamavirus has the largest genome among the known viruses. The genomes of the Mamavirus and the previously described Mimivirus are highly similar in both the protein-coding genes and the intergenic regions. However, the Mamavirus contains an extra 5'-terminal segment that encompasses primarily disrupted duplicates of genes present elsewhere in the genome. The Mamavirus also has several genes absent from Mimivirus, including a small regulatory polyA polymerase subunit that is shared with poxviruses. Detailed analysis of the protein sequences of the two Mimiviruses led to a substantial amendment of the functional annotation of the viral genomes.
  • 7.33
    Impact points
    Computational methods for Gene Orthology inference.

    David M Kristensen, Yuri I Wolf, Arcady R Mushegian, Eugene V Koonin

    Briefings in bioinformatics. 06/2011; 12(5):379-91.

    Accurate inference of orthologous genes is a prerequisite for most comparative genomics studies, and is also important for functional annotation of new genomes. Identification of orthologous gene sets typically involves phylogenetic tree analysis, heuristic algorithms based on sequence conservation, synteny analysis, or some combination of these approaches. The most direct tree-based methods typically rely on the comparison of an individual gene tree with a species tree. Once the two trees are accurately constructed, orthologs are straightforwardly identified by the definition of orthology as those homologs that are related by speciation, rather than gene duplication, at their most recent point of origin. Although in principle ideal for orthology identification, phylogenetic trees are computationally expensive to construct for large numbers of genes and genomes, and they often contain errors, especially at large evolutionary distances. Moreover, in many organisms, in particular prokaryotes and viruses, evolution does not appear to have followed a simple 'tree-like' mode, which makes conventional tree reconciliation inapplicable. Other, heuristic methods identify probable orthologs as the closest homologous pairs or groups of genes in a set of organisms. These approaches are faster and easier to automate than tree-based methods, with efficient implementations provided by graph-theoretical algorithms enabling comparisons of thousands of genomes. Comparisons of these two approaches show that, despite conceptual differences, they produce similar sets of orthologs, especially at short evolutionary distances. Synteny can also aid in identification of orthologs. Often, tree-based, sequence similarity-based and synteny-based approaches can be combined into flexible hybrid methods.
  • 17.64
    Impact points
    Evolution and classification of the CRISPR-Cas systems.

    Kira S Makarova, Daniel H Haft, Rodolphe Barrangou, Stan J J Brouns, Emmanuelle Charpentier, Philippe Horvath, Sylvain Moineau, Francisco J M Mojica, Yuri I Wolf, Alexander F Yakunin, John van der Oost, Eugene V Koonin

    Nature reviews. Microbiology. 06/2011; 9(6):467-77.

    The CRISPR-Cas (clustered regularly interspaced short palindromic repeats-CRISPR-associated proteins) modules are adaptive immunity systems that are present in many archaea and bacteria. These defence systems are encoded by operons that have an extraordinarily diverse architecture and a high rate of evolution for both the cas genes and the unique spacer content. Here, we provide an updated analysis of the evolutionary relationships between CRISPR-Cas systems and Cas proteins. Three major types of CRISPR-Cas system are delineated, with a further division into several subtypes and a few chimeric variants. Given the complexity of the genomic architectures and the extremely dynamic evolution of the CRISPR-Cas systems, a unified classification of these systems should be based on multiple criteria. Accordingly, we propose a 'polythetic' classification that integrates the phylogenies of the most common cas genes, the sequence and organization of the CRISPR repeats and the architecture of the CRISPR-cas loci.
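The intron-evolution abstract above contrasts MCMC and ML with Dollo parsimony. As a rough illustration of what a Dollo reconstruction assumes, here is a minimal Python sketch; it is not the authors' code, and the tree encoding (nested tuples of genome names), the example genomes and the helpers `clade` and `dollo_states` are hypothetical. Under Dollo parsimony each intron position is gained at most once, at the last common ancestor (LCA) of all genomes that carry it, and can afterwards only be lost, so an ancestral node is scored as intron-bearing only if it lies below that LCA and still has an intron-bearing descendant.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple, Union

# Hypothetical encoding: a leaf is a genome name, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clade(tree: Tree) -> FrozenSet[str]:
    """Set of leaf (genome) names below a node; used as the node's label."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(clade(child) for child in tree))


def dollo_states(tree: Tree, present: Set[str]) -> Dict[FrozenSet[str], bool]:
    """Dollo parsimony reconstruction of one presence/absence character.

    Returns {clade: True/False} for every node; True means the intron
    position is inferred to be present at that node.
    """
    present_set = frozenset(present)
    lca_clade: Optional[FrozenSet[str]] = None  # stays None if no genome has the intron
    if present_set:
        # Walk down from the root while a single child still holds every
        # intron-bearing genome; where the walk stops is the LCA (the single gain).
        node = tree
        while not isinstance(node, str):
            holders = [child for child in node if present_set <= clade(child)]
            if not holders:
                break
            node = holders[0]
        lca_clade = clade(node)

    states: Dict[FrozenSet[str], bool] = {}

    def visit(node: Tree, below_gain: bool) -> None:
        label = clade(node)
        below_gain = below_gain or label == lca_clade
        # Present only at or below the gain, and only if some descendant kept
        # the intron (once lost, an intron cannot be regained under Dollo).
        states[label] = below_gain and bool(label & present_set)
        if not isinstance(node, str):
            for child in node:
                visit(child, below_gain)

    visit(tree, False)
    return states


if __name__ == "__main__":
    tree = ((("human", "mouse"), "fly"), ("arabidopsis", "yeast"))
    introns = {
        "i1": {"human", "arabidopsis"},  # shared by an animal and a plant
        "i2": {"human", "mouse"},        # found only in the two mammals
        "i3": {"fly"},                   # found in a single genome
    }
    for name, label in [("root", clade(tree)), ("mammals", frozenset({"human", "mouse"}))]:
        count = sum(dollo_states(tree, genomes)[label] for genomes in introns.values())
        print(name, count)  # root 1, then mammals 2
```

In this toy example the intron seen only in fly is never placed deeper than the fly branch, which is one intuitive way a single-gain assumption can pull ancestral intron counts downward; a gain-loss Markov model, by contrast, allows the same intron position to be gained more than once.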
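The tree-based definition in the orthology review above (orthologs are homologs whose most recent common ancestor is a speciation rather than a duplication) can be made concrete with the standard LCA reconciliation of a gene tree against a species tree. The Python sketch below is illustrative only: it assumes both trees are rooted, correct and written as nested tuples, and the gene names, species mapping and function names (`label_gene_tree`, `orthologs`) are hypothetical. Each gene-tree node is mapped to the smallest species-tree clade containing all of its species; a node is called a duplication if it maps to the same species clade as one of its children, and a speciation otherwise.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, List, Tuple, Union

# Rooted trees as nested tuples; leaves are species names (species tree)
# or gene names (gene tree).
Tree = Union[str, Tuple["Tree", ...]]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """Leaf names below a node; used as the node's identifier."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def all_clades(tree: Tree) -> Iterator[FrozenSet[str]]:
    """Yield the leaf set of every node in the tree."""
    yield leaf_set(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from all_clades(child)


def label_gene_tree(gene_tree: Tree, species_tree: Tree,
                    species_of: Dict[str, str]) -> Dict[FrozenSet[str], str]:
    """LCA reconciliation: label each internal gene-tree node as
    'speciation' or 'duplication'."""
    species_clades = list(all_clades(species_tree))

    def lca(species: FrozenSet[str]) -> FrozenSet[str]:
        # Smallest species-tree clade that contains all the given species.
        return min((c for c in species_clades if species <= c), key=len)

    labels: Dict[FrozenSet[str], str] = {}

    def visit(node: Tree) -> FrozenSet[str]:
        """Map a gene-tree node to a species-tree node (given as its clade)."""
        if isinstance(node, str):
            return lca(frozenset([species_of[node]]))
        child_maps = [visit(child) for child in node]
        here = lca(frozenset().union(*child_maps))
        # Same species node as a child: the gene copies split by duplication.
        labels[leaf_set(node)] = "duplication" if here in child_maps else "speciation"
        return here

    visit(gene_tree)
    return labels


def orthologs(gene_tree: Tree, species_tree: Tree,
              species_of: Dict[str, str]) -> List[Tuple[str, str]]:
    """Gene pairs whose most recent common ancestor is a speciation node."""
    labels = label_gene_tree(gene_tree, species_tree, species_of)
    pairs = []
    for g1, g2 in combinations(sorted(leaf_set(gene_tree)), 2):
        mrca = min((c for c in labels if {g1, g2} <= c), key=len)
        if labels[mrca] == "speciation":
            pairs.append((g1, g2))
    return pairs


if __name__ == "__main__":
    species_tree = (("human", "mouse"), "fly")
    # Hypothetical family with one duplication mapped to the human-mouse ancestor.
    gene_tree = ((("hA", "mA"), ("hB", "mB")), "fA")
    species_of = {"hA": "human", "hB": "human", "mA": "mouse", "mB": "mouse", "fA": "fly"}
    print(orthologs(gene_tree, species_tree, species_of))
```

Here hA and mB come out as paralogs because their most recent common ancestor is the duplication node, while each mammalian copy remains orthologous to the single fly gene. As the abstract notes, real gene trees often contain errors, and this exact-match reconciliation inherits any topology mistakes directly.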
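The faster heuristic family described in the same orthology review identifies probable orthologs as the closest homologous pairs between genomes. One common instance of that idea is the reciprocal best hit (RBH) rule, sketched here in Python; this is an illustration rather than the review's own implementation, and the input (a hypothetical dictionary of precomputed similarity scores, such as alignment bit scores) and the function names are assumptions. A pair is kept only when each gene is the other's top-scoring hit; ties are broken in favour of whichever pair is seen first.

```python
from typing import Dict, List, Tuple

# Hypothetical input: similarity scores between genes of genome A (first element
# of each key) and genome B (second element); higher means more similar.
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores, from_a: bool) -> Dict[str, str]:
    """Map each gene of one genome to its top-scoring gene in the other genome."""
    best: Dict[str, Tuple[str, float]] = {}
    for (a, b), score in scores.items():
        query, target = (a, b) if from_a else (b, a)
        # Strict '>' keeps the first pair seen when scores tie.
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(scores: Scores) -> List[Tuple[str, str]]:
    """Pairs (a, b) in which each gene is the other's best hit."""
    a_to_b = best_hits(scores, from_a=True)
    b_to_a = best_hits(scores, from_a=False)
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)


if __name__ == "__main__":
    scores = {
        ("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
        ("a2", "b2"): 180.0, ("a2", "b1"): 120.0,
        ("a3", "b2"): 175.0,
    }
    print(reciprocal_best_hits(scores))  # [('a1', 'b1'), ('a2', 'b2')]
```

In this toy input the best hit of a3 is b2, but b2 prefers a2, so a3 is left out; strict pairwise RBH cannot report one-to-many relationships such as recent in-paralogs, which is one motivation for grouping genes into clusters rather than pairs.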

412 Publications