ABSTRACT: The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.
ABSTRACT: Motivation:
We've developed a highly curated bacterial virulence factor (VF) library in PATRIC (Pathosystems Resource Integration Center, www.patricbrc.org) to support infectious disease research. Although several VF databases are available, there is still a need to incorporate new knowledge found in published experimental evidence and integrate these data with other information known for these specific VF genes, including genomic and other omics data. This integration supports the identification of VFs, comparative studies and hypothesis generation, which facilitates the understanding of virulence and pathogenicity.
We have manually curated VFs from six prioritized NIAID (National Institute of Allergy and Infectious Diseases) category A-C bacterial pathogen genera, Mycobacterium, Salmonella, Escherichia, Shigella, Listeria and Bartonella, using published literature. This curated information on virulence has been integrated with data from genomic functional annotations, transcriptomic experiments, protein-protein interactions and disease information already present in PATRIC. Such integration gives researchers access to a broad array of information about these individual genes, and also to a suite of tools available at PATRIC for comparative genomic and transcriptomic analysis.
Availability and implementation:
All tools and data are freely available at PATRIC (http://patricbrc.org).
Supplementary data are available at Bioinformatics online.
ABSTRACT: The Pathosystems Resource Integration Center (PATRIC) is a genomics-centric relational database and bioinformatics resource designed to assist scientists in infectious-disease research. This method paper provides detailed instructions on using this resource to find data specific to genomes, save it in a personalized workspace and analyze it with a variety of interactive tools. While PATRIC contains many diverse tools and functionalities to explore both genome-scale and gene expression data, the main focus of this chapter is on comparative analysis of bacterial genomes.
ABSTRACT: For many scientific applications, it is highly desirable to be able to compare metabolic models of closely related genomes. In this short report, we attempt to raise awareness of the fact that taking annotated genomes from public repositories and using them for metabolic model reconstructions is far from trivial due to annotation inconsistencies. We propose a protocol for comparative analysis of metabolic models on closely related genomes, using fifteen strains of genus Brucella, which contains pathogens of both humans and livestock. This study led to the identification and subsequent correction of inconsistent annotations in the SEED database, as well as the identification of 31 biochemical reactions that are common to Brucella, which were not originally identified by automated metabolic reconstructions. We are currently implementing this protocol for improving automated annotations within the SEED database and these improvements have been propagated into PATRIC, Model-SEED, KBase and RAST. This method is an enabling step for the future creation of consistent annotation systems and high-quality model reconstructions that will support the accurate prediction of phenotypes such as pathogenicity, media requirements or type of respiration.
Electronic supplementary material
The online version of this article (doi:10.1007/s13205-014-0202-4) contains supplementary material, which is available to authorized users.
ABSTRACT: Brucella species include important zoonotic pathogens that have a substantial impact on both agriculture and human health throughout the world. Brucella are thought of as 'stealth pathogens' that escape recognition by the host innate immune response, modulate the acquired immune response and evade intracellular destruction. We analyzed the genome sequences of members of the family Brucellaceae to assess its evolutionary history from likely free-living soil based progenitors into highly successful intracellular pathogens. Phylogenetic analysis split the genus into two groups: recently identified and early dividing 'atypical' strains, and a highly conserved 'classical' core clade containing the major pathogenic species. Lateral gene transfer events brought unique genomic regions into Brucella that differentiated them from Ochrobactrum and allowed the stepwise acquisition of virulence factors that include a type IV secretion system, a perosamine-based O-antigen and systems for sequestering metal ions absent in progenitors. Subsequent radiation within the core Brucella resulted in lineages that appear to have evolved within their preferred mammalian hosts, restricting their virulence to become stealth pathogens capable of causing long-term chronic infections.
Journal of bacteriology 12/2013; 196(5). DOI:10.1128/JB.01091-13 · 2.81 Impact Factor
ABSTRACT: In 2004, the SEED (http://pubseed.theseed.org/) was created to provide consistent and accurate genome annotations across thousands of genomes and as a platform for discovering and developing de novo annotations. The SEED is a constantly updated integration of genomic data with a genome database, web front end, API and server scripts. It is used by many scientists for predicting gene functions and discovering new pathways. In addition to being a powerful database for bioinformatics research, the SEED also houses subsystems (collections of functionally related protein families) and their derived FIGfams (protein families), which represent the core of the RAST annotation engine (http://rast.nmpdr.org/). When a new genome is submitted to RAST, genes are called and their annotations are made by comparison to the FIGfam collection. If the genome is made public, it is then housed within the SEED and its proteins populate the FIGfam collection. This annotation cycle has proven to be a robust and scalable solution to the problem of annotating the exponentially increasing number of genomes. To date, >12 000 users worldwide have annotated >60 000 distinct genomes using RAST. Here we describe the interconnectedness of the SEED database and RAST, the RAST annotation pipeline and updates to both resources.
ABSTRACT: The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein-protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Data types are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10 000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes, where comparisons of different annotations are available, and also include available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest and a private workspace where they can store both genomic and gene associations, and their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available. This manuscript describes updates to PATRIC since its initial report in the 2007 NAR Database Issue.
Nucleic Acids Research 11/2013; 42(D1). DOI:10.1093/nar/gkt1099 · 9.11 Impact Factor
ABSTRACT: Access to online repositories for genomic and associated "-omics" datasets is now an essential part of everyday research activity. It is important therefore that the Tuberculosis community is aware of the databases and tools available to them online, as well as for the database hosts to know what the needs of the research community are. One of the goals of the Tuberculosis Annotation Jamboree, held in Washington DC on March 7th-8th 2012, was therefore to provide an overview of the current status of three key Tuberculosis resources, TubercuList (tuberculist.epfl.ch), TB Database (www.tbdb.org), and Pathosystems Resource Integration Center (PATRIC, www.patricbrc.org). Here we summarize some key updates and upcoming features in TubercuList, and provide an overview of the PATRIC site and its online tools for pathogen RNA-Seq analysis.
ABSTRACT: Informatics-driven approaches change how research and development are conducted and who participates, and they enable systems-oriented views of science and research. Most life sciences researchers have a very strong desire for the full integration of data and analysis tools delivered through a single interface. Infectious disease (ID) research and development provides a uniquely challenging and high impact opportunity. The biological complexity of infectious disease systems, which are composed of multiple scales of interactions between potential pathogens, hosts, vectors, and the environment, challenges information resources because of the breadth of organism-organism and organism-environment interactions. Applications of integrated data for ID serve a variety of constituencies, such as clinicians, diagnosticians, drug and vaccine developers, and epidemiologists. Thus there is a complexity that makes ID an opportune area in which to develop, deploy and use CyberInfrastructure.
Biomedical Engineering Systems and Technologies, A. Fred, J. Filipe, H. Gamboa, eds. Springer Berlin Heidelberg; 01/2013
ABSTRACT: Hundreds of putative enzymes from Mycobacterium tuberculosis as well as other mycobacteria remain categorized as "conserved hypothetical proteins" or "hypothetical proteins", offering little or no information on their functional role in pathogenic and non-pathogenic processes. In this study we have predicted the fold and 3-D structure of more than 99% of all proteins encoded in the genome of M. tuberculosis H37Rv. Fold recognition, database searches and 3-D modelling were performed using the Protein Homology/analogy Recognition Engine V 2.0 (Phyre2). These results are used to tentatively assign potential functions to unannotated enzymes and proteins. In summary, fold recognition and structural homology might be used as a complementary tool in genome annotation efforts; furthermore, they can deliver primary sequence-independent information regarding structure, ligands and even substrate specificity for enzymes that display low primary sequence identity with potential homologues in other species.
ABSTRACT:
Brucella species are Gram-negative bacteria that infect mammals. Recently, two unusual strains (Brucella inopinata BO1T and B. inopinata-like BO2) have been isolated from human patients, and their similarity to some atypical brucellae isolated from Australian native rodent species was noted. Here we present a phylogenomic analysis of the draft genome sequences of BO1T and BO2 and of the Australian rodent strains 83-13 and NF2653 that shows that they form two groups well separated from the other sequenced Brucella spp. Several important differences were noted. Both BO1T and BO2 did not agglutinate significantly when live or inactivated cells were exposed to monospecific A and M antisera against O-side chain sugars composed of N-formyl-perosamine. While BO1T maintained the genes required to synthesize a typical Brucella O-antigen, BO2 lacked many of these genes but still produced a smooth LPS (lipopolysaccharide). Most missing genes were found in the wbk region involved in O-antigen synthesis in classic smooth Brucella spp. In their place, BO2 carries four genes that other bacteria use for making a rhamnose-based O-antigen. Electrophoretic, immunoblot, and chemical analyses showed that BO2 carries an antigenically different O-antigen made of repeating hexose-rich oligosaccharide units that made the LPS water-soluble, which contrasts with the homopolymeric O-antigen of other smooth brucellae that have a phenol-soluble LPS. The results demonstrate the existence of a group of early-diverging brucellae with traits that depart significantly from those of the Brucella species described thus far.
This report examines differences between genomes from four new Brucella strains and those from the classic Brucella spp. Our results show that the four new strains are outliers with respect to the previously known Brucella strains and yet are part of the genus, forming two new clades. The analysis revealed important information about the evolution and survival mechanisms of Brucella species, helping reshape our knowledge of this important zoonotic pathogen. One discovery of special importance is that one of the strains, BO2, produces an O-antigen distinct from any that has been seen in any other Brucella isolates to date.
ABSTRACT: We present the draft genome for the Rickettsia endosymbiont of Ixodes scapularis (REIS), a symbiont of the deer tick vector of Lyme disease in North America. Among Rickettsia species (Alphaproteobacteria: Rickettsiales), REIS has the largest genome sequenced to date (>2 Mb) and contains 2,309 genes across the chromosome and four plasmids (pREIS1 to pREIS4). The most remarkable finding within the REIS genome is the extraordinary proliferation of mobile genetic elements (MGEs), which contributes to a limited synteny with other Rickettsia genomes. In particular, an integrative conjugative element named RAGE (for Rickettsiales amplified genetic element), previously identified in scrub typhus rickettsiae (Orientia tsutsugamushi) genomes, is present on both the REIS chromosome and plasmids. Unlike the pseudogene-laden RAGEs of O. tsutsugamushi, REIS encodes nine conserved RAGEs that include F-like type IV secretion systems similar to that of the tra genes encoded in the Rickettsia bellii and R. massiliae genomes. An unparalleled abundance of encoded transposases (>650) relative to genome size, together with the RAGEs and other MGEs, comprise ∼35% of the total genome, making REIS one of the most plastic and repetitive bacterial genomes sequenced to date. We present evidence that conserved rickettsial genes associated with an intracellular lifestyle were acquired via MGEs, especially the RAGE, through a continuum of genomic invasions. Robust phylogeny estimation suggests REIS is ancestral to the virulent spotted fever group of rickettsiae. As REIS is not known to invade vertebrate cells and has no known pathogenic effects on I. scapularis, its genome sequence provides insight on the origin of mechanisms of rickettsial pathogenicity.
Journal of bacteriology 11/2011; 194(2):376-94. DOI:10.1128/JB.06244-11 · 2.81 Impact Factor
ABSTRACT: Funded by the National Institute of Allergy and Infectious Diseases, the Pathosystems Resource Integration Center (PATRIC) is a genomics-centric relational database and bioinformatics resource designed to assist scientists in infectious-disease research. Specifically, PATRIC provides scientists with (i) a comprehensive bacterial genomics database, (ii) a plethora of associated data relevant to genomic analysis, and (iii) an extensive suite of computational tools and platforms for bioinformatics analysis. While the primary aim of PATRIC is to advance the knowledge underlying the biology of human pathogens, all publicly available genome-scale data for bacteria are compiled and continually updated, thereby enabling comparative analyses to reveal the basis for differences between infectious free-living and commensal species. Herein we summarize the major features available at PATRIC, dividing the resources into two major categories: (i) organisms, genomes, and comparative genomics and (ii) recurrent integration of community-derived associated data. Additionally, we present two experimental designs typical of bacterial genomics research and report on the execution of both projects using only PATRIC data and tools. These applications encompass a broad range of the data and analysis tools available, illustrating practical uses of PATRIC for the biologist. Finally, a summary of PATRIC's outreach activities, collaborative endeavors, and future research directions is provided.
Infection and immunity 09/2011; 79(11):4286-98. DOI:10.1128/IAI.00207-11 · 3.73 Impact Factor
ABSTRACT: Infectious disease research is generating an increasing amount of disparate data on pathogenic systems. There is a growing need for resources that effectively integrate, analyze, deliver and visualize these data, both to improve our understanding of infectious diseases and to facilitate the development of strategies for disease control and prevention.
We have developed Disease View, an online host-pathogen resource that enables infectious disease-centric access, analysis and visualization of host-pathogen interactions. In this resource, we associate infectious diseases with corresponding pathogens, provide information on pathogens, pathogen virulence genes and the genetic and chemical evidences for the human genes that are associated with the diseases. We also deliver the relationships between pathogens, genes and diseases in an interactive graph and provide the geolocation reports of associated diseases around the globe in real time. Unlike many other resources, we have applied an iterative, user-centered design process to the entire resource development, including data acquisition, analysis and visualization.
Freely available at http://www.patricbrc.org; all major web browsers supported.
Supplementary data are available at Bioinformatics online.
ABSTRACT: Helicobacter pylori is the dominant member of the gastric microbiota and has been associated with an increased risk of gastric cancer and peptic ulcers in adults. H. pylori populations have migrated and diverged with human populations, and health effects vary. Here, we describe the whole genome of the cag-positive strain V225d, cultured from a Venezuelan Piaroa Amerindian subject. To gain insight into the evolution and host adaptation of this bacterium, we undertook comparative H. pylori genomic analyses. A robust multiprotein phylogenetic tree reflects the major human migration out of Africa, across Europe, through Asia, and into the New World, placing Amerindian H. pylori as a particularly close sister group to East Asian H. pylori. In contrast, phylogenetic analysis of the host-interactive genes vacA and cagA shows substantial divergence of Amerindian from Old World forms and indicates new genotypes (e.g., VacA m3) involving these loci. Despite deletions in CagA EPIYA and CRPIA domains, V225d stimulates interleukin-8 secretion and the hummingbird phenotype in AGS cells. However, following a 33-week passage in the mouse stomach, these phenotypes were lost in isolate V225-RE, which had a 15-kb deletion in the cag pathogenicity island that truncated CagA and eliminated some of the type IV secretion system genes. Thus, the unusual V225d cag architecture was fully functional via conserved elements, but the natural deletion of 13 cag pathogenicity island genes and the truncation of CagA impaired the ability to induce inflammation.
Journal of bacteriology 06/2010; 192(12):3078-92. DOI:10.1128/JB.00063-10 · 2.81 Impact Factor
ABSTRACT: Systems-biology and infectious-disease (host-pathogen-environment) research and development is becoming increasingly dependent on integrating data from diverse and dynamic sources. Maintaining integrated resources over long periods of time presents distinct challenges. This review describes experiences and lessons learned from integrating data in two five-year projects focused on pathosystems biology: the Pathosystems Resource Integration Center (PATRIC, http://patric.vbi.vt.edu/), with a goal of developing bioinformatics resources for the research and countermeasures-development communities based on genomics data, and the Resource Center for Biodefense Proteomics Research (RCBPR, http://www.proteomicsresource.org/), with a goal of developing resources based on experiment data such as microarray and proteomics data from diverse sources and technologies. Some challenges include integrating genomic sequence and experiment data, data synchronization, data quality control, and usability engineering. We present examples of a variety of data-integration problems drawn from our experiences with PATRIC and RCBPR, as well as open research questions related to long-term sustainability, and describe the next steps to meeting these challenges. Novel contributions of this work include 1) an approach for addressing discrepancies between experiment results and interpreted results, and 2) expanding the range of data-integration techniques to include usability engineering at the presentation level.
ABSTRACT: The facultative intracellular bacterial pathogen Brucella infects a wide range of warm-blooded land and marine vertebrates and causes brucellosis. Currently, there are nine recognized Brucella species based on host preferences and phenotypic differences. The availability of 10 different genomes consisting of two chromosomes and representing six of the species allowed for a detailed comparison among themselves and relatives in the order Rhizobiales. Phylogenomic analysis of ortholog families shows limited divergence but distinct radiations, producing four clades as follows: Brucella abortus-Brucella melitensis, Brucella suis-Brucella canis, Brucella ovis, and Brucella ceti. In addition, Brucella phylogeny does not appear to reflect the phylogeny of Brucella species' preferred hosts. About 4.6% of protein-coding genes seem to be pseudogenes, which is a relatively large fraction. Only B. suis 1330 appears to have an intact beta-ketoadipate pathway, responsible for utilization of plant-derived compounds. In contrast, this pathway in the other species is highly pseudogenized and consistent with the "domino theory" of gene death. There are distinct shared anomalous regions (SARs) found in both chromosomes as the result of horizontal gene transfer unique to Brucella and not shared with its closest relative Ochrobactrum, a soil bacterium, suggesting their acquisition occurred in spite of a predominantly intracellular lifestyle. In particular, SAR 2-5 appears to have been acquired by Brucella after it became intracellular. The SARs contain many genes, including those involved in O-polysaccharide synthesis and type IV secretion, which if mutated or absent significantly affect the ability of Brucella to survive intracellularly in the infected host.
Journal of bacteriology 05/2009; 191(11):3569-79. DOI:10.1128/JB.01767-08 · 2.81 Impact Factor