ABSTRACT: Acinetobacter baumannii is a globally important nosocomial pathogen characterized by an increasing incidence of multidrug resistance. Routes of dissemination and gene flow among health care facilities are poorly resolved and are important for understanding the epidemiology of A. baumannii, minimizing disease transmission, and improving patient outcomes. We used whole-genome sequencing to assess diversity and genome dynamics in 49 isolates from one United States hospital system during one year from 2007 to 2008. Core single-nucleotide-variant-based phylogenetic analysis revealed multiple founder strains and multiple independent strains recovered from the same patient yet was insufficient to fully resolve strain relationships, where gene content and insertion sequence patterns added additional discriminatory power. Gene content comparisons illustrated extensive and redundant antibiotic resistance gene carriage and direct evidence of gene transfer, recombination, gene loss, and mutation. Evidence of barriers to gene flow among hospital components was not found, suggesting complex mixing of strains and a large reservoir of A. baumannii strains capable of colonizing patients. IMPORTANCE: Genome sequencing was used to characterize multidrug-resistant Acinetobacter baumannii strains from one United States hospital system during a 1-year period to better understand how A. baumannii strains that cause infection are related to one another. Extensive variation in gene content was found, even among strains that were very closely related phylogenetically and epidemiologically. Several mechanisms contributed to this diversity, including transfer of mobile genetic elements, mobilization of insertion sequences, insertion sequence-mediated deletions, and genome-wide homologous recombination.
Variation in gene content, however, lacked clear spatial or temporal patterns, suggesting a diverse pool of circulating strains with considerable interaction between strains and hospital locations. Widespread genetic variation among strains from the same hospital and even the same patient, particularly involving antibiotic resistance genes, reinforces the need for molecular diagnostic testing and genomic analysis to determine resistance profiles, rather than a reliance primarily on strain typing and antimicrobial resistance phenotypes for epidemiological studies.
ABSTRACT: Toward achieving rapid and large-scale genome modification directly in a target organism, we have developed a new genome engineering strategy that uses a combination of bioinformatics-aided design, large synthetic DNA and site-specific recombinases. Using Cre recombinase we swapped a target 126-kb segment of the Escherichia coli genome with a 72-kb synthetic DNA cassette, thereby effectively eliminating over 54 kb of genomic DNA from three non-contiguous regions in a single recombination event. We observed complete replacement of the native sequence with the modified synthetic sequence through the action of the Cre recombinase and no competition from homologous recombination. Because of the versatility and high efficiency of the Cre-lox system, this method can be used in any organism where this system is functional as well as adapted to use with other highly precise genome engineering systems. Compared to present-day iterative approaches in genome engineering, we anticipate this method will greatly speed up the creation of reduced, modularized and optimized genomes through the integration of deletion analyses data, transcriptomics, synthetic biology and site-specific recombination.
Nucleic Acids Research 06/2014; · 8.81 Impact Factor
ABSTRACT: Leptospirosis is a globally important, neglected zoonotic infection caused by spirochetes of the genus Leptospira. Since genetic transformation remains technically limited for pathogenic Leptospira, a systems biology pathogenomic approach was used to infer leptospiral virulence genes by whole genome comparison of culture-attenuated Leptospira interrogans serovar Lai with its virulent, isogenic parent. Among the 11 pathogen-specific protein-coding genes in which non-synonymous mutations were found, a putative soluble adenylate cyclase with host cell cAMP-elevating activity, and two members of a previously unstudied ∼15 member paralogous gene family of unknown function were identified. This gene family was also uniquely found in the alpha-proteobacteria Bartonella bacilliformis and Bartonella australis that are geographically restricted to the Andes and Australia, respectively. How the pathogenic Leptospira and these two Bartonella species came to share this expanded gene family remains an evolutionary mystery. In vivo expression analyses demonstrated up-regulation of 10/11 Leptospira genes identified in the attenuation screen, and profound in vivo, tissue-specific up-regulation by members of the paralogous gene family, suggesting a direct role in virulence and host-pathogen interactions. The pathogenomic experimental design here is generalizable as a functional systems biology approach to studying bacterial pathogenesis and virulence and should encourage similar experimental studies of other pathogens.
ABSTRACT: Experimental data exists for only a vanishingly small fraction of sequenced microbial genes. This community page discusses the progress made by the COMBREX project to address this important issue using both computational and experimental resources.
ABSTRACT: Cell surfaces are decorated by a variety of proteins that facilitate interactions with their environments and support cell stability. These secreted proteins are anchored to the cell by mechanisms that are diverse, and, in archaea, poorly understood. Recently published in silico data suggest that in some species a subset of secreted euryarchaeal proteins, which includes the S-layer glycoprotein, is processed and covalently linked to the cell membrane by enzymes referred to as archaeosortases. In silico work led to the proposal that an independent, sortase-like system for proteolysis-coupled, carboxy-terminal lipid modification exists in bacteria (exosortase) and archaea (archaeosortase). Here, we provide the first in vivo characterization of an archaeosortase in the haloarchaeal model organism Haloferax volcanii. Deletion of the artA gene (HVO_0915) resulted in multiple biological phenotypes: (a) poor growth, especially under low-salt conditions, (b) alterations in cell shape and the S-layer, (c) impaired motility, suppressors of which still exhibit poor growth, and (d) impaired conjugation. We studied one of the ArtA substrates, the S-layer glycoprotein, using detailed proteomic analysis. While the carboxy-terminal region of S-layer glycoproteins, consisting of a threonine-rich O-glycosylated region followed by a hydrophobic transmembrane helix, has been notoriously resistant to any proteomic peptide identification, we were able to identify two overlapping peptides from the transmembrane domain present in the ΔartA strain but not in the wild-type strain. This clearly shows that ArtA is involved in carboxy-terminal posttranslational processing of the S-layer glycoprotein.
As it is known from previous studies that a lipid is covalently attached to the carboxy-terminal region of the S-layer glycoprotein, our data strongly support the conclusion that archaeosortase functions analogously to sortase, mediating proteolysis-coupled, covalent cell surface attachment.
ABSTRACT: Biological oxidation of methane to methanol by aerobic bacteria is catalysed by two different enzymes, the cytoplasmic or soluble methane monooxygenase (sMMO) and the membrane-bound or particulate methane monooxygenase (pMMO). Expression of MMOs is controlled by a 'copper-switch', i.e. sMMO is only expressed at very low copper : biomass ratios, while pMMO expression increases as this ratio increases. Methanotrophs synthesize a chalkophore, methanobactin, for the binding and import of copper. Previous work suggested that methanobactin was formed from a polypeptide precursor. Here we report that deletion of the gene suspected to encode this precursor, mbnA, in Methylosinus trichosporium OB3b, abolishes methanobactin production. Further, gene expression assays indicate that methanobactin, together with another polypeptide of previously unknown function, MmoD, plays a key role in regulating expression of MMOs. Based on these data, we propose a general model explaining how expression of the MMO operons is regulated by copper, methanobactin and MmoD. The basis of the 'copper-switch' is MmoD, and methanobactin amplifies the magnitude of the switch. Bioinformatic analysis of bacterial genomes indicates that the production of methanobactin-like compounds is not confined to methanotrophs, suggesting that its use as a metal-binding agent and/or role in gene regulation may be widespread in nature.
ABSTRACT: Computational prediction of protein function is frequently error-prone and incomplete. In Mycobacterium tuberculosis (Mtb), ∼25% of all genes have no predicted function and are annotated as hypothetical proteins, severely limiting our understanding of Mtb pathogenicity. Here, we utilize a high-throughput quantitative activity-based protein profiling (ABPP) platform to probe, annotate, and validate ATP-binding proteins in Mtb. We experimentally validate prior in silico predictions of >240 proteins and identify 72 hypothetical proteins as ATP binders. ATP interacts with proteins with diverse and unrelated sequences, providing an expanded view of adenosine nucleotide binding in Mtb. Several hypothetical ATP binders are essential or taxonomically limited, suggesting specialized functions in mycobacterial physiology and pathogenicity.
ABSTRACT: TIGRFAMs, available online at http://www.jcvi.org/tigrfams, is a database of protein family definitions. Each entry features a seed alignment of trusted representative sequences, a hidden Markov model (HMM) built from that alignment, cutoff scores that let automated annotation pipelines decide which proteins are members, and annotations for transfer onto member proteins. Most TIGRFAMs models are designated equivalog, meaning they assign a specific name to proteins conserved in function from a common ancestral sequence. Models describing more functionally heterogeneous families are designated subfamily or domain, and assign less specific but more widely applicable annotations. The Genome Properties database, available at http://www.jcvi.org/genome-properties, specifies how computed evidence, including TIGRFAMs HMM results, should be used to judge whether an enzymatic pathway, a protein complex or another type of molecular subsystem is encoded in a genome. TIGRFAMs and Genome Properties content are developed in concert because subsystems reconstruction for large numbers of genomes guides selection of seed alignment sequences and cutoff values during protein family construction. Both databases specialize heavily in bacterial and archaeal subsystems. At present, 4284 models appear in TIGRFAMs, while 628 systems are described by Genome Properties. Content derives both from subsystem discovery work and from biocuration of the scientific literature.
Nucleic Acids Research 11/2012; · 8.81 Impact Factor
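The cutoff-based membership decision and the subsystem call described in the TIGRFAMs abstract above can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the two-level trusted/noise cutoff scheme, the model names MODEL_A and MODEL_B, the score values, and the YES/PARTIAL/NO states. It is not the databases' actual pipeline or rule set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cutoffs:
    """Per-model score thresholds (assumed two-level scheme)."""
    trusted: float  # scores at or above this are accepted as members
    noise: float    # scores below this are rejected outright


def classify_hit(score: float, cutoffs: Cutoffs) -> str:
    """Decide family membership for one HMM hit from its bit score."""
    if score >= cutoffs.trusted:
        return "member"
    if score < cutoffs.noise:
        return "non-member"
    return "uncertain"  # between the cutoffs: left for manual review


def property_state(required_models, member_models) -> str:
    """Judge whether a subsystem is encoded, given the models with member hits."""
    found = [m for m in required_models if m in member_models]
    if len(found) == len(required_models):
        return "YES"
    return "PARTIAL" if found else "NO"


# Hypothetical models and scores, for illustration only.
CUTOFFS = {"MODEL_A": Cutoffs(100.0, 60.0), "MODEL_B": Cutoffs(250.0, 120.0)}
best_scores = {"MODEL_A": 143.2, "MODEL_B": 180.0}

members = {
    model for model, score in best_scores.items()
    if classify_hit(score, CUTOFFS[model]) == "member"
}
print(property_state(["MODEL_A", "MODEL_B"], members))
```

In this toy genome only one of the two required models scores above its trusted cutoff, so the subsystem is reported as partially supported rather than present.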
ABSTRACT: Covering: 1988 to 2012. This review presents recommended nomenclature for the biosynthesis of ribosomally synthesized and post-translationally modified peptides (RiPPs), a rapidly growing class of natural products. The current knowledge regarding the biosynthesis of the >20 distinct compound classes is also reviewed, and commonalities are discussed.
ABSTRACT: Biofilms are dense microbial communities. Although widely distributed and medically important, how biofilm cells interact with one another is poorly understood. Recently, we described a novel process whereby myxobacterial biofilm cells exchange their outer membrane (OM) lipoproteins. For the first time we report here the identification of two host proteins, TraAB, required for transfer. These proteins are predicted to localize in the cell envelope; and TraA encodes a distant PA14 lectin-like domain, a cysteine-rich tandem repeat region, and a putative C-terminal protein sorting tag named MYXO-CTERM, while TraB encodes an OmpA-like domain. Importantly, TraAB are required in donors and recipients, suggesting bidirectional transfer. By use of a lipophilic fluorescent dye, we also discovered that OM lipids are exchanged. Similar to lipoproteins, dye transfer requires TraAB function, gliding motility and a structured biofilm. Importantly, OM exchange was found to regulate swarming and development behaviors, suggesting a new role in cell-cell communication. A working model proposes TraA is a cell surface receptor that mediates cell-cell adhesion for OM fusion, in which lipoproteins/lipids are transferred by lateral diffusion. We further hypothesize that cell contact-dependent exchange helps myxobacteria to coordinate their social behaviors.
ABSTRACT: As the deluge of genomic DNA sequence grows, the fraction of protein sequences that have been manually curated falls. In turn, as the number of laboratories with the ability to sequence genomes in a high-throughput manner grows, the informatics capability of those labs to accurately identify and annotate all genes within a genome may often be lacking. These issues have led to fears about transitive annotation errors making sequence databases less reliable. During the lifetime of the Pfam protein families database a number of protein families have been built, which were later identified as composed solely of spurious open reading frames (ORFs) either on the opposite strand or in a different, overlapping reading frame with respect to the true protein-coding or non-coding RNA gene. These families were deleted and are no longer available in Pfam. However, we realized that these may perform a useful function to identify new spurious ORFs. We have collected these families together in AntiFam along with additional custom-made families of spurious ORFs. This resource currently contains 23 families that identified 1310 spurious proteins in UniProtKB and a further 4119 spurious proteins in a collection of metagenomic sequences. UniProt has adopted AntiFam as a part of the UniProtKB quality control process and will investigate these spurious proteins for exclusion.
Database The Journal of Biological Databases and Curation 01/2012; 2012:bas003. · 4.46 Impact Factor
ABSTRACT: Multiple new prokaryotic C-terminal protein-sorting signals were found that reprise the tripartite architecture shared by LPXTG and PEP-CTERM: motif, TM helix, basic cluster. Defining hidden Markov models were constructed for all. PGF-CTERM occurs in 29 archaeal species, some of which have more than 50 proteins that share the domain. PGF-CTERM proteins include the major cell surface protein in Halobacterium, a glycoprotein with a partially characterized diphytanylglyceryl phosphate linkage near its C terminus. Comparative genomics identifies a distant exosortase homolog, designated archaeosortase A (ArtA), as the likely protein-processing enzyme for PGF-CTERM. Proteomics suggests that the PGF-CTERM region is removed. Additional systems include VPXXXP-CTERM/archaeosortase B in two of the same archaea and PEF-CTERM/archaeosortase C in four others. Bacterial exosortases often fall into subfamilies that partner with very different cohorts of extracellular polymeric substance biosynthesis proteins; several species have multiple systems. Variant systems include the VPDSG-CTERM/exosortase C system unique to certain members of the phylum Verrucomicrobia, VPLPA-CTERM/exosortase D in several alpha- and deltaproteobacterial species, and a dedicated (single-target) VPEID-CTERM/exosortase E system in alphaproteobacteria. Exosortase-related families XrtF in the class Flavobacteria and XrtG in Gram-positive bacteria mark distinctive conserved gene neighborhoods. A picture emerges of an ancient and now well-differentiated superfamily of deeply membrane-embedded protein-processing enzymes. Their target proteins are destined to transit cellular membranes during their biosynthesis, during which most undergo additional posttranslational modifications such as glycosylation.
Journal of bacteriology 01/2012; 194(1):36-48. · 2.69 Impact Factor
ABSTRACT: The rhomboid family of serine proteases occurs in all domains of life. Its members contain at least six hydrophobic membrane-spanning helices, with an active site serine located deep within the hydrophobic interior of the plasma membrane. The model member GlpG from Escherichia coli is heavily studied through engineered mutant forms, varied model substrates, and multiple X-ray crystal studies, yet its relationship to endogenous substrates is not well understood. Here we describe an apparent membrane anchoring C-terminal homology domain that appears in numerous genera including Shewanella, Vibrio, Acinetobacter, and Ralstonia, but excluding Escherichia and Haemophilus. Individual genomes encode up to thirteen members, usually homologous to each other only in this C-terminal region. The domain's tripartite architecture consists of motif, transmembrane helix, and cluster of basic residues at the protein C-terminus, as also seen with the LPXTG recognition sequence for sortase A and the PEP-CTERM recognition sequence for exosortase. Partial Phylogenetic Profiling identifies a distinctive rhomboid-like protease subfamily almost perfectly co-distributed with this recognition sequence. This protease subfamily and its putative target domain are hereby renamed rhombosortase and GlyGly-CTERM, respectively. The protease and target are encoded by consecutive genes in most genomes with just a single target, but far apart otherwise. The signature motif of the Rhombo-CTERM domain, often SGGS, only partially resembles known cleavage sites of rhomboid protease family model substrates. Some protein families that have several members with C-terminal GlyGly-CTERM domains also have additional members with LPXTG or PEP-CTERM domains instead, suggesting there may be common themes to the post-translational processing of these proteins by three different membrane protein superfamilies.
PLoS ONE 12/2011; 6(12):e28886. · 3.53 Impact Factor
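The tripartite architecture named in the abstract above (signature motif, transmembrane helix, C-terminal cluster of basic residues) can be made concrete with a rough Python screen. This is a heuristic sketch only: the hydrophobic alphabet, the length limits, the linker allowance and the example sequences are all assumptions chosen for illustration, and a regular expression is far cruder than the hidden Markov models used to define such domains.

```python
import re

# Residues treated as membrane-compatible (an assumption for this sketch).
HYDROPHOBIC = "AILMFVWGT"


def has_tripartite_tail(seq, motif_regex, tm_min=15, tm_max=25,
                        tail_max=8, min_basic=2):
    """Return True if seq ends with motif, hydrophobic stretch, basic tail.

    motif_regex is a regex for the signature motif, e.g. "LP.TG" for LPXTG.
    """
    pattern = re.compile(
        "(?:%s).{0,10}[%s]{%d,%d}(?P<tail>[A-Z]{0,%d})$"
        % (motif_regex, HYDROPHOBIC, tm_min, tm_max, tail_max)
    )
    match = pattern.search(seq)
    if match is None:
        return False
    tail = match.group("tail")
    # Count lysine and arginine in the short tail after the helix.
    return sum(tail.count(res) for res in "KR") >= min_basic


# Hypothetical sequences, invented for illustration.
body = "MKT" + "A" * 30
helix = "AILVAILVAILVAILVAIL"
with_signal = body + "LPETG" + "EE" + helix + "RRKE"
acidic_tail = body + "LPETG" + "EE" + helix + "EEDE"
no_motif = body + "EE" + helix + "RRKE"

for name, seq in [("with_signal", with_signal),
                  ("acidic_tail", acidic_tail),
                  ("no_motif", no_motif)]:
    print(name, has_tripartite_tail(seq, "LP.TG"))
```

Passing a different motif_regex (for example "SGGS", the motif the abstract reports as common in this domain) reuses the same screen for another sorting signal, which mirrors the shared architecture the abstract points out.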
ABSTRACT: CharProtDB (http://www.jcvi.org/charprotdb/) is a curated database of biochemically characterized proteins. It provides a source of direct rather than transitive assignments of function, designed to support automated annotation pipelines. The initial data set in CharProtDB was collected through manual literature curation over the years by analysts at the J. Craig Venter Institute (JCVI) [formerly The Institute of Genomic Research (TIGR)] as part of their prokaryotic genome sequencing projects. The CharProtDB has been expanded by import of selected records from publicly available protein collections whose biocuration indicated direct rather than homology-based assignment of function. Annotations in CharProtDB include gene name, symbol and various controlled vocabulary terms, including Gene Ontology terms, Enzyme Commission number and TransportDB accession. Each annotation is referenced with the source; ideally a journal reference, or, if imported and lacking one, the original database source.
Nucleic Acids Research 12/2011; 40(Database issue):D237-41. · 8.81 Impact Factor
ABSTRACT: InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.
Nucleic Acids Research 11/2011; 40(Database issue):D306-12. · 8.81 Impact Factor
ABSTRACT: Phylogenetic profiling is a technique of scoring co-occurrence between a protein family and some other trait, usually another protein family, across a set of taxonomic groups. In spite of several refinements in recent years, the technique still invites significant improvement. To be its most effective, a phylogenetic profiling algorithm must be able to examine co-occurrences among protein families whose boundaries are uncertain within large homologous protein superfamilies.
Partial Phylogenetic Profiling (PPP) is an iterative algorithm that scores a given taxonomic profile against the taxonomic distribution of families for all proteins in a genome. The method works through optimizing the boundary of each protein family, rather than by relying on prebuilt protein families or fixed sequence similarity thresholds. Double Partial Phylogenetic Profiling (DPPP) is a related procedure that begins with a single sequence and searches for optimal granularities for its surrounding protein family in order to generate the best query profiles for PPP. We present ProPhylo, a high-performance software package for phylogenetic profiling studies through creating individually optimized protein family boundaries. ProPhylo provides precomputed databases for immediate use and tools for manipulating the taxonomic profiles used as queries.
ProPhylo results show universal markers of methanogenesis, a new DNA phosphorothioation-dependent restriction enzyme, and efficacy in guiding protein family construction. The software and the associated databases are freely available under the open source Perl Artistic License from ftp://ftp.jcvi.org/pub/data/ppp/.
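The idea of optimizing a family boundary against a taxonomic profile, as described in the abstract above, can be sketched in a few lines of Python. This is an illustrative stand-in and not ProPhylo's implementation: the hypergeometric tail used as the score, the function names and the toy genome identifiers are assumptions. The sketch walks down a similarity-ranked hit list for one protein and keeps the depth at which the genomes collected so far agree best with the query profile.

```python
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) when n genomes are drawn from N, of which K are in the profile."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total


def best_family_boundary(ranked_hit_genomes, profile, all_genomes):
    """Return (p_value, depth) for the best-scoring cutoff in the hit list.

    ranked_hit_genomes: genome of each hit, best hit first.
    profile: set of genomes that have the trait being queried.
    """
    N, K = len(all_genomes), len(profile)
    seen = set()
    best = (1.0, 0)
    for depth, genome in enumerate(ranked_hit_genomes, start=1):
        if genome in seen:
            continue  # a paralog adds no new genome to the family's profile
        seen.add(genome)
        p = hypergeom_tail(len(seen & profile), N, K, len(seen))
        if p < best[0]:
            best = (p, depth)
    return best


# Toy data: ten genomes, three of which carry the trait of interest.
all_genomes = {f"g{i}" for i in range(10)}
profile = {"g0", "g1", "g2"}
ranked = ["g0", "g1", "g2", "g5", "g6"]

p_value, depth = best_family_boundary(ranked, profile, all_genomes)
print(depth, p_value)
```

With this toy input the score is best when the family is cut after the third hit, because deeper hits come from genomes outside the profile and dilute the match.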
ABSTRACT: The CRISPR-Cas (clustered regularly interspaced short palindromic repeats-CRISPR-associated proteins) modules are adaptive immunity systems that are present in many archaea and bacteria. These defence systems are encoded by operons that have an extraordinarily diverse architecture and a high rate of evolution for both the cas genes and the unique spacer content. Here, we provide an updated analysis of the evolutionary relationships between CRISPR-Cas systems and Cas proteins. Three major types of CRISPR-Cas system are delineated, with a further division into several subtypes and a few chimeric variants. Given the complexity of the genomic architectures and the extremely dynamic evolution of the CRISPR-Cas systems, a unified classification of these systems should be based on multiple criteria. Accordingly, we propose a 'polythetic' classification that integrates the phylogenies of the most common cas genes, the sequence and organization of the CRISPR repeats and the architecture of the CRISPR-cas loci.
ABSTRACT: Data mining methods in bioinformatics and comparative genomics commonly rely on working definitions of protein families from prior computation. Partial phylogenetic profiling (PPP), by contrast, optimizes family sizes during its searches for the cooccurring protein families that serve different roles in the same biological system. In a large-scale investigation of the incredibly diverse radical S-adenosylmethionine (SAM) enzyme superfamily, PPP aided in building a collection of 68 TIGRFAMs hidden Markov models (HMMs) that define nonoverlapping and functionally distinct subfamilies. Many identify radical SAM enzymes as molecular markers for multicomponent biological systems; HMMs defining their partner proteins also were constructed. Newly found systems include five groupings of protein families in which at least one marker is a radical SAM enzyme while another, encoded by an adjacent gene, is a short peptide predicted to be its substrate for posttranslational modification. The most prevalent, in over 125 genomes, featuring a peptide that we designate SCIFF (six cysteines in forty-five residues), is conserved throughout the class Clostridia, a distribution inconsistent with putative bacteriocin activity. A second novel system features a tandem pair of putative peptide-modifying radical SAM enzymes associated with a highly divergent family of peptides in which the only clearly conserved feature is a run of His-Xaa-Ser repeats. A third system pairs a radical SAM domain peptide maturase with selenocysteine-containing targets, suggesting a new biological role for selenium. These and several additional novel maturases that cooccur with predicted target peptides share a C-terminal additional 4Fe4S-binding domain with PqqE, the subtilosin A maturase AlbA, and the predicted mycofactocin and Nif11-class peptide maturases as well as with activators of anaerobic sulfatases and quinohemoprotein amine dehydrogenases.
Radical SAM enzymes with this additional domain, as detected by TIGR04085, significantly outnumber lantibiotic synthases and cyclodehydratases combined in reference genomes while being highly enriched for members whose apparent targets are small peptides. Interpretation of comparative genomics evidence suggests unexpected (nonbacteriocin) roles for natural products from several of these systems.
Journal of bacteriology 06/2011; 193(11):2745-55. · 2.69 Impact Factor