ABSTRACT: The human bacterial pathogen Listeria monocytogenes is emerging as a model organism to study RNA-mediated regulation in pathogenic bacteria. A class of non-coding RNAs called CRISPRs (clustered regularly interspaced short palindromic repeats) has been described to confer bacterial resistance against invading bacteriophages and conjugative plasmids. CRISPR function relies on the activity of CRISPR associated (cas) genes that encode a large family of proteins with nuclease or helicase activities and DNA and RNA binding domains. Here, we characterized a CRISPR element (RliB) that is expressed and processed in the L. monocytogenes strain EGD-e, which is completely devoid of cas genes. Structural probing revealed that RliB has an unexpected secondary structure comprising basepair interactions between the repeats and the adjacent spacers in place of canonical hairpins formed by the palindromic repeats. Moreover, in contrast to other CRISPR-Cas systems identified in Listeria, RliB-CRISPR is ubiquitously present among Listeria genomes at the same genomic locus and is never associated with the cas genes. We showed that RliB-CRISPR is a substrate for the endogenously encoded polynucleotide phosphorylase (PNPase) enzyme. The spacers of the different Listeria RliB-CRISPRs share many sequences with temperate and virulent phages. Furthermore, we show that a cas-less RliB-CRISPR lowers the acquisition frequency of a plasmid carrying the matching protospacer, provided that trans-encoded cas genes of a second CRISPR-Cas system are present in the genome. Importantly, we show that PNPase is required for RliB-CRISPR mediated DNA interference. Altogether, our data reveal a previously undescribed CRISPR system in which both processing and activity depend on PNPase, highlighting a new and unexpected function for PNPase in "CRISPRology".
ABSTRACT: In prokaryotes, genome size is associated with metabolic versatility, regulatory complexity, effective population size and horizontal transfer rates. We therefore analyzed the co-variation of genome size and operon conservation to assess the evolutionary models of operon formation and maintenance. In agreement with previous results, intra-operonic pairs of essential and of highly expressed genes are more conserved. Interestingly, intra-operonic pairs of genes are also more conserved when they encode proteins at similar cell concentrations, suggesting a role of co-transcription in diminishing the cost of waste and shortfall in gene expression. Larger genomes have fewer and smaller operons that are also less conserved. Importantly, lower conservation in larger genomes was observed for all classes of operons in terms of gene expression, essentiality and balanced protein concentration. We reached very similar conclusions in independent analyses of three major bacterial clades (α- and β-Proteobacteria and Firmicutes). Operon conservation is inversely correlated with the abundance of transcription factors in the genome when controlled for genome size. This suggests a negative association between the complexity of genetic networks and operon conservation. These results show that genome size and/or its proxies are key determinants of the intensity of natural selection for operon organization. Our data fit the evolutionary models based on the advantage of co-regulation better than those based on genetic linkage or stochastic gene expression. We suggest that larger genomes with highly complex genetic networks and many transcription factors endure weaker selection for operons than smaller genomes with fewer alternative tools for genetic regulation.
Genome Biology and Evolution 11/2013; · 4.76 Impact Factor
ABSTRACT: Phages, like many parasites, tend to have small genomes and may encode autonomous functions or manipulate those of their hosts. Recombination functions are essential for phage replication and diversification. They are also nearly ubiquitous in bacteria. The E. coli genome encodes many copies of an octamer (Chi) motif that upon recognition by RecBCD favors repair of double strand breaks by homologous recombination. This might allow discrimination of self from non-self because RecBCD degrades DNA lacking Chi. Bacteriophage Lambda, an E. coli parasite, lacks Chi motifs, but escapes degradation by inhibiting RecBCD and encoding its own autonomous recombination machinery. We found that only half of 275 lambdoid genomes encode recombinases, the remainder relying on the host's machinery. Unexpectedly, we found that some lambdoid phages contain extremely high numbers of Chi motifs concentrated between the phage origin of replication and the packaging site. This suggests a tight association between replication, packaging and RecBCD-mediated recombination in these phages. Indeed, phages lacking recombinases strongly over-represent Chi motifs. Conversely, phages encoding recombinases and inhibiting host recombination machinery select for the absence of Chi motifs. Host and phage recombinases use different mechanisms and the latter are more tolerant to sequence divergence. Accordingly, we show that phages encoding their own recombination machinery have more mosaic genomes resulting from recent recombination events and have more diverse gene repertoires, i.e. larger pan genomes. We discuss the costs and benefits of superseding or manipulating host recombination functions and how this decision shapes phage genome structure and evolvability.
ABSTRACT: Quorum sensing (QS) regulates the onset of bacterial social responses as a function of cell density, with an important impact on virulence. AI-2 (autoinducer 2) is a signal that has the peculiarity of mediating both intra- and interspecies bacterial QS. We analyzed the diversity of all components of AI-2 quorum sensing across 44 complete genomes of E. coli and Shigella strains. We used phylogenetic tools to study its evolution and determined the phenotypes of single deletion mutants to predict phenotypes of natural strains. Our analysis revealed many likely adaptive polymorphisms both in gene content and nucleotide sequence. We show that all natural strains possess the signal emitter (the luxS gene) but many lack a functional signal receptor (complete lsr operon) and the ability to regulate extracellular signal concentrations. This result is in striking contrast with the canonical species-specific QS systems where one often finds orphan receptors, without a cognate synthase, but not orphan emitters. Our analysis indicates that selection actively maintains a balanced polymorphism for the presence/absence of a functional lsr operon, suggesting diversifying selection on the regulation of signal accumulation and recognition. These results can be explained either by niche-specific adaptation, or by selection for a coercive behavior where signal-blind emitters benefit from forcing other individuals in the population to hasten cooperative behaviors.
Genome Biology and Evolution 12/2012; · 4.76 Impact Factor
ABSTRACT: Clustered, regularly interspaced, short palindromic repeats (CRISPRs) are implicated in the defence against foreign DNA in various archaeal and bacterial species. They have also been associated with slower spread of antibiotic resistance. However, experimental and evolutionary studies raise doubts about the role of CRISPRs as a sort of immune system in Escherichia coli. We studied a collection of 263 natural E. coli isolates from human and animal hosts, representative of the phylogenetic and lifestyle diversity of the species and exhibiting various levels of plasmid-encoded antibiotic resistance. We characterised the strains in terms of CRISPRs, performed replicon typing of the plasmids and tested for class 1 integrons to explore the possible association between CRISPRs and the absence of plasmids and mobile antibiotic resistance determinants. We found no meaningful association between the presence/absence of the cas genes, reflecting the activity of the CRISPRs, and the presence of plasmids, integrons or antibiotic resistance. No CRISPR in the collection contained a spacer matching an antibiotic resistance gene or an element involved in antibiotic resistance gene mobilisation, and 79.8% (210/263) of the strains lacked spacers matching sequences in the 2282 plasmid genomes available. Hence, E. coli CRISPRs do not seem to be efficient barriers to the spread of plasmids and antibiotic resistance, consistent with what has been reported for phages, and contrary to reports concerning other species.
ABSTRACT: Genetic exchange by conjugation is responsible for the spread of resistance, virulence, and social traits among prokaryotes. Recent work has unraveled the functioning of the underlying type IV secretion systems (T4SS) and their distribution and recruitment for other biological processes (exaptation), notably pathogenesis. We analyzed the phylogeny of key conjugation proteins to infer the evolutionary history of conjugation and T4SS. We show that single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA) conjugation, while both based on a key AAA(+) ATPase, diverged before the last common ancestor of bacteria. The two key ATPases of ssDNA conjugation are monophyletic, having diverged at an early stage from dsDNA translocases. Our data suggest that ssDNA conjugation arose first in diderm bacteria, possibly Proteobacteria, and then spread to other bacterial phyla, including bacterial monoderms and Archaea. Identifiable T4SS fall within eight monophyletic groups, determined by both taxonomy and structure of the cell envelope. Transfer to monoderms might have occurred only once, but followed diverse adaptive paths. Remarkably, some Firmicutes developed a new conjugation system based on an atypical relaxase and an ATPase derived from a dsDNA translocase. The observed evolutionary rates and patterns of presence/absence of specific T4SS proteins show that conjugation systems are often and independently exapted for other functions. This work brings a natural basis for the classification of all kinds of conjugative systems, thus tackling a problem that is growing as fast as genomic databases. Our analysis provides the first global picture of the evolution of conjugation and shows how a self-transferrable complex multiprotein system has adapted to different taxa and often been recruited by the host. As conjugation systems became specific to certain clades and cell envelopes, they may have biased the rate and direction of gene transfer by conjugation within prokaryotes.
Molecular Biology and Evolution 09/2012; · 10.35 Impact Factor
ABSTRACT: Type 3 secretion systems (T3SSs) are essential components of two complex bacterial machineries: the flagellum, which drives cell motility, and the non-flagellar T3SS (NF-T3SS), which delivers effectors into eukaryotic cells. Yet the origin, specialization, and diversification of these machineries remained unclear. We developed computational tools to identify homologous components of the two systems and to discriminate between them. Our analysis of >1,000 genomes identified 921 T3SSs, including 222 NF-T3SSs. Phylogenomic and comparative analyses of these systems argue that the NF-T3SS arose from an exaptation of the flagellum, i.e. the recruitment of part of the flagellum structure for the evolution of the new protein delivery function. Our reconstructed chronology indicates that the exaptation process proceeded in at least two steps. An intermediate ancestral form of NF-T3SS, whose descendants still exist in Myxococcales, lacked elements that are essential for motility and included a subset of NF-T3SS features. We argue that this ancestral version was involved in protein translocation. A second major step in the evolution of NF-T3SSs occurred via recruitment of secretins to the NF-T3SS, an event that occurred at least three times from different systems. In Rhizobiales, a partial homologous gene replacement of the secretin resulted in two genes of complementary function. Acquisition of a secretin was followed by the rapid adaptation of the resulting NF-T3SSs to multiple, distinct eukaryotic cell envelopes where they became key in parasitic and mutualistic associations between prokaryotes and eukaryotes. Our work elucidates major steps of the evolutionary scenario leading to extant NF-T3SSs. It demonstrates how molecular evolution can convert one complex molecular machine into a second, equally complex machine by successive deletions, innovations, and recruitment from other molecular systems.
ABSTRACT: Despite increasing interest in coagulase-negative staphylococci (CoNS), little information is available about their bacteriophages. We isolated and sequenced three novel temperate Siphoviridae phages (StB12, StB27, and StB20) from the CoNS Staphylococcus hominis and S. capitis species. The genome sizes are around 40 kb, and open reading frames (ORFs) are arranged in functional modules encoding lysogeny, DNA metabolism, morphology, and cell lysis. Bioinformatics analysis allowed us to assign a potential function to half of the predicted proteins. Structural elements were further identified by proteomic analysis of phage particles, and DNA-packaging mechanisms were determined. Interestingly, the three phages show identical integration sites within their host genomes. In addition to this experimental characterization, we propose a novel classification based on the analysis of 85 phage and prophage genomes, including 15 originating from CoNS. Our analysis established 9 distinct clusters and revealed close relationships between S. aureus and CoNS phages. Genes involved in DNA metabolism and lysis and potentially in phage-host interaction appear to be widespread, while structural genes tend to be cluster specific. Our findings support the notion of a possible reciprocal exchange of genes between phages originating from S. aureus and CoNS, which may be of crucial importance for pathogenesis in staphylococci.
Journal of bacteriology 08/2012; 194(21):5829-39. · 3.94 Impact Factor
ABSTRACT: Secretins form large multimeric complexes in the outer membranes of many Gram-negative bacteria, where they function as dedicated gateways that allow proteins to access the extracellular environment. Despite their overall relatedness, different secretins use different specific and general mechanisms for their targeting, assembly, and membrane insertion. We report that all tested secretins from several type II secretion systems and from the filamentous bacteriophage f1 can spontaneously multimerize and insert into liposomes in an in vitro transcription-translation system. Phylogenetic analyses indicate that these secretins form a group distinct from the secretins of the type IV piliation and type III secretion systems, which do not autoassemble in vitro. A mutation causing a proline-to-leucine substitution allowed PilQ secretins from two different type IV piliation systems to assemble in vitro, albeit with very low efficiency, suggesting that autoassembly is an inherent property of all secretins.
Journal of bacteriology 07/2012; 194(18):4951-8. · 3.94 Impact Factor
ABSTRACT: Many of the most virulent bacterial pathogens show low genetic diversity and sexual isolation. Accordingly, Mycobacterium tuberculosis, the deadliest human pathogen, is thought to be clonal and evolve by genetic drift. Yet, its genome shows few of the concomitant signs of genome degradation. We analyzed 24 genomes and found an excess of genetic diversity in regions encoding key adaptive functions including the type VII secretion system and the ancient horizontally transferred virulence-related regions. Four different approaches showed evident signs of recombination in M. tuberculosis. Recombination tracts add a high density of polymorphisms, and many are thus predicted to arise from outside the clade. Some of these tracts match Mycobacterium canettii sequences. Recombination introduced an excess of non-synonymous diversity in general and even more in genes expected to be under positive or diversifying selection, e.g., cell wall component genes. Mutations leading to non-synonymous SNPs are effectively purged in MTBC, which shows dominance of purifying selection. MTBC mutation bias toward AT nucleotides is not compensated by biased gene conversion, suggesting the action of natural selection also on synonymous changes. Together, all of these observations point to a strong imprint of recombination and selection in the genome affecting both non-synonymous and synonymous positions. Hence, contrary to some other pathogens and previous proposals concerning M. tuberculosis, this lineage may have come out of its ancestral bottleneck as a very successful pathogen that is rapidly diversifying by the action of mutation, recombination, and natural selection.
Genome Research 02/2012; 22(4):721-34. · 14.40 Impact Factor
ABSTRACT: Many studies have been devoted to understanding the mechanisms used by pathogenic bacteria to exploit human hosts. These mechanisms are very diverse in the detail, but share commonalities whose quantification should enlighten the evolution of virulence from both a molecular and an ecological perspective. We mined the literature for experimental data on infectious dose of bacterial pathogens in humans (ID50) and also for traits with which ID50 might be associated. These compilations were checked and complemented with genome analyses. We observed that ID50 varies in a continuous way by over 10 orders of magnitude. Low ID50 values are very strongly associated with the capacity of the bacteria to kill professional phagocytes or to survive in the intracellular milieu of these cells. Inversely, high ID50 values are associated with motile and fast-growing bacteria that use quorum-sensing based regulation of virulence factor expression. Infectious dose is not associated with genome size and shows insignificant phylogenetic inertia, in line with frequent virulence shifts associated with the horizontal gene transfer of a small number of virulence factors. Contrary to previous proposals, infectious dose shows little dependence on contact-dependent secretion systems and on the natural route of exposure. When all variables are combined, immune subversion and quorum-sensing are sufficient to explain two thirds of the variance in infectious dose. Our results show the key role of immune subversion in effective human infection by small bacterial populations. They also suggest that cooperative processes might be important for successful infection by bacteria with high ID50. Our results suggest that trade-offs between selection for population growth-related traits and selection for the ability to subvert the immune system shape bacterial infectiousness. Understanding these trade-offs provides guidelines to study the evolution of virulence and in particular the micro-evolutionary paths of emerging pathogens.
ABSTRACT: Proteins secreted to the extracellular environment or to the periphery of the cell envelope, the secretome, play essential roles in foraging, antagonistic and mutualistic interactions. We hypothesize that arms races, genetic conflicts and varying selective pressures should lead to the rapid change of sequences and gene repertoires of the secretome. The analysis of 42 bacterial pan-genomes shows that secreted, and especially extracellular proteins, are predominantly encoded in the accessory genome, i.e. among genes not ubiquitous within the clade. Genes encoding outer membrane proteins might engage more frequently in intra-chromosomal gene conversion because they are more often in multi-genic families. The gene sequences encoding the secretome evolve faster than the rest of the genome and in particular at non-synonymous positions. Cell wall proteins in Firmicutes evolve particularly fast when compared with outer membrane proteins of Proteobacteria. Virulence factors are over-represented in the secretome, notably in outer membrane proteins, but cell localization explains more of the variance in substitution rates and gene repertoires than sequence homology to known virulence factors. Accordingly, the repertoires and sequences of the genes encoding the secretome change fast in the clades of obligatory and facultative pathogens and also in the clades of mutualists and free-living bacteria. Our study shows that cell localization shapes genome evolution. In agreement with our hypothesis, the repertoires and the sequences of genes encoding secreted proteins evolve fast. The particularly rapid change of extracellular proteins suggests that these public goods are key players in bacterial adaptation.
PLoS ONE 01/2012; 7(11):e49403. · 3.73 Impact Factor
ABSTRACT: Proteins evolve at very different rates and, most notably, at rates inversely proportional to the level at which they are produced. The relative frequency of highly expressed proteins in the proteome, and thus their impact on the cell budget, increases steeply with growth rate. The maximal growth rate is a key life-history trait reflecting trade-offs between rapid growth and other fitness components. We show that the maximal growth rate is weakly affected by genetic drift. The negative correlation between protein expression levels and evolutionary rate and the positive correlation between expression levels of highly expressed proteins and growth rates suggest that investment in growth affects the evolutionary rate of proteins, especially the highly expressed ones. Accordingly, analysis of 61 families of orthologs in 74 proteobacteria shows that differences in evolutionary rates between lowly and highly expressed proteins depend on maximal growth rates. Analyses of complexes with key roles in bacterial growth and strikingly different expression levels, the ribosome and the replisome, confirm these patterns and suggest that the growth-related sequence conservation is associated with protein synthesis. Maximal growth rates also shape protein evolution in the other bacterial clades. Long-branch attractions associated with this effect might explain why clades with a persistent history of slow growth are attracted to the root when the tree of prokaryotes is inferred using highly, but not lowly, expressed proteins. These results indicate that reconstruction of deep phylogenies can be strongly affected by maximal growth rates, and highlight the importance of life-history traits and their physiological consequences for protein evolution.
Proceedings of the National Academy of Sciences 11/2011; 108(50):20030-5. · 9.74 Impact Factor
ABSTRACT: Members of the genus Flavobacterium occur in a variety of ecological niches and represent an interesting diversity of lifestyles. Flavobacterium branchiophilum is the main causative agent of bacterial gill disease, a severe condition affecting various cultured freshwater fish species worldwide, in particular salmonids in Canada and Japan. We report here the complete genome sequence of strain FL-15 isolated from a diseased sheatfish (Silurus glanis) in Hungary. The analysis of the F. branchiophilum genome revealed putative mechanisms of pathogenicity strikingly different from those of the other, closely related fish pathogen Flavobacterium psychrophilum, including the first cholera-like toxin in a non-Proteobacteria and a wealth of adhesins. The comparison with available genomes of other Flavobacterium species revealed a small genome size, large differences in chromosome organization, and fewer rRNA and tRNA genes, in line with its more fastidious growth. In addition, horizontal gene transfer shaped the evolution of F. branchiophilum, as evidenced by its virulence factors, genomic islands, and CRISPR (clustered regularly interspaced short palindromic repeats) systems. Further functional analysis should help in the understanding of host-pathogen interactions and in the development of rational diagnostic tools and control strategies in fish farms.
Applied and environmental microbiology 09/2011; 77(21):7656-62. · 3.69 Impact Factor
ABSTRACT: Horizontal gene transfer shapes the genomes of prokaryotes by allowing rapid acquisition of novel adaptive functions. Conjugation allows the broadest range and the highest gene transfer input per transfer event. While conjugative plasmids have been studied for decades, the number and diversity of integrative conjugative elements (ICE) in prokaryotes remained unknown. We defined a large set of protein profiles of the conjugation machinery to scan over 1,000 genomes of prokaryotes. We found 682 putative conjugative systems among all major phylogenetic clades and showed that ICEs are the most abundant conjugative elements in prokaryotes. Nearly half of the genomes contain a type IV secretion system (T4SS), with larger genomes encoding more conjugative systems. Surprisingly, almost half of the chromosomal T4SS lack co-localized relaxases and, consequently, might be devoted to protein transport instead of conjugation. This class of elements is preponderant among small genomes, is less commonly associated with integrases, and is rarer in plasmids. ICEs and conjugative plasmids in proteobacteria have different preferences for each type of T4SS, but all types exist in both chromosomes and plasmids. Mobilizable elements outnumber self-conjugative elements in both ICEs and plasmids, which suggests an extensive use of T4SS in trans. Our evolutionary analysis indicates that switches of plasmids to and from ICEs were frequent and that extant elements began to differentiate only relatively recently. According to the present results, ICEs are the most abundant conjugative elements in practically all prokaryotic clades and might be far more frequently domesticated into non-conjugative protein transport systems than previously thought. While conjugative plasmids and ICEs have different means of genomic stabilization, their mechanisms of mobility by conjugation show strikingly conserved patterns, arguing for a unitary view of conjugation in shaping the genomes of prokaryotes by horizontal gene transfer.
ABSTRACT: To gain further insight into the role of the clustered, regularly interspaced, short palindromic repeats (CRISPRs) in Escherichia coli, we analyzed the CRISPR diversity in a collection of 290 strains, in the phylogenetic framework of the strains represented by multilocus sequence typing (MLST). The set included 263 natural E. coli isolates exposed to various environments and isolated over a 20-year period from humans and animals, as well as 27 fully sequenced strains. Our analyses confirm that there are two largely independent pairs of CRISPR loci (CRISPR1 and -2 and CRISPR3 and -4), each associated with a different type of cas genes (Ecoli and Ypest, respectively), but that each pair of CRISPRs has similar dynamics. Strikingly, the major phylogenetic group B2 is almost devoid of CRISPRs. The majority of genomes analyzed lack Ypest cas genes and contain CRISPR3 with spacers matching Ypest cas genes. The analysis of relatedness between strains in terms of spacer repertoire and the MLST tree shows a pattern where closely related strains (MLST phylogenetic distance of <0.005 corresponding to at least hundreds of thousands of years) often exhibit identical CRISPRs while more distantly related strains (MLST distance of >0.01) exhibit completely different CRISPRs. This suggests rare but radical turnover of spacers in CRISPRs rather than gradual CRISPR change. We found no link between the presence, size, or content of CRISPRs and the lifestyle of the strains. Our data suggest that, within the E. coli species, CRISPRs do not have the expected characteristics of a classical immune system.
Journal of bacteriology 03/2011; 193(10):2460-7. · 3.94 Impact Factor
ABSTRACT: Changes in effective population size impinge on patterns of molecular evolution. Notably, slightly deleterious mutations are more likely to drift to fixation in smaller populations, which should typically also lead to an overall acceleration in the rates of evolution. This prediction has been validated empirically for several endosymbiont and island taxa. Here, we first show that rate accelerations are also evident in bacterial pathogens whose recent shifts in virulence make them prime candidates for reduced effective population size: Bacillus anthracis, Bordetella parapertussis, Mycobacterium leprae, Salmonella enterica typhi, Shigella spp., and Yersinia pestis. Using closely related genomes to analyze substitution rate dynamics across six phylogenetically independent bacterial clades, we demonstrate that relative rates of coding sequence evolution are biased according to gene functional category. Notably, genes that buffer against slightly deleterious mutations, such as chaperones, experience stronger rate accelerations than other functional classes at both nonsynonymous and synonymous sites. Although theory predicts altered evolutionary dynamics for buffer loci in the face of accumulating deleterious mutations, to observe even stronger rate accelerations is surprising. We suggest that buffer loci experience elevated substitution rates because the accumulation of deleterious mutations in the remainder of the genome favors compensatory substitutions in trans. Critically, the hyper-acceleration is evident across phylogenetically independent clades, supporting the hypothesis that reductions in effective population size predictably induce epistatic responses in genes that buffer against slightly deleterious mutations.
Molecular Biology and Evolution 02/2011; 28(8):2339-49. · 10.35 Impact Factor
ABSTRACT: Gene duplication followed by neo- or sub-functionalization deeply impacts the evolution of protein families and is regarded as the main source of adaptive functional novelty in eukaryotes. While there is ample evidence of adaptive gene duplication in prokaryotes, it is not clear whether duplication outweighs the contribution of horizontal gene transfer in the expansion of protein families. We analyzed closely related prokaryote strains or species with small genomes (Helicobacter, Neisseria, Streptococcus, Sulfolobus), average-sized genomes (Bacillus, Enterobacteriaceae), and large genomes (Pseudomonas, Bradyrhizobiaceae) to untangle the effects of duplication and horizontal transfer. After removing the effects of transposable elements and phages, we show that the vast majority of expansions of protein families are due to transfer, even among large genomes. Transferred genes (xenologs) persist longer in prokaryotic lineages, possibly due to a higher/longer adaptive role. On the other hand, duplicated genes (paralogs) are expressed more, and, when persistent, they evolve slower. This suggests that gene transfer and gene duplication have very different roles in shaping the evolution of biological systems: transfer allows the acquisition of new functions and duplication leads to higher gene dosage. Accordingly, we show that paralogs share most protein-protein interactions and genetic regulators, whereas xenologs share very few of them. Prokaryotes invented most of life's biochemical diversity. Therefore, the study of the evolution of biological systems should explicitly account for the predominant role of horizontal gene transfer in the diversification of protein families.