ABSTRACT: The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is a repository for the submission, maintenance and presentation of nucleotide sequence data and related sample and experimental information. In this article we report on ENA in 2015 regarding general activity, notable published data sets and major achievements. This is followed by a focus on sustainable biocuration of functional annotation, an area which has particularly felt the pressure of sequencing growth. The importance of functional annotation, how it can be submitted and the shifting role of the biocurator in the context of increasing volumes of data are all discussed.
Preview · Article · Nov 2015 · Nucleic Acids Research
ABSTRACT: Plasmid-mediated quinolone resistance (PMQR) refers to a family of closely related genes that confer decreased susceptibility to fluoroquinolones. The PMQR genes are generally associated with integrons and/or plasmids that carry additional antimicrobial resistance genes active against a range of antimicrobials. In Ho Chi Minh City (HCMC) we have previously shown a high frequency of PMQR genes within commensal Enterobacteriaceae. However, there is limited available sequence data detailing the genetic context in which the PMQR genes reside, and a lack of understanding of how these genes spread across the Enterobacteriaceae. Here, we aimed to determine the genetic background facilitating the spread and maintenance of qnrS1, the dominant PMQR gene circulating in HCMC. We sequenced three qnrS1-carrying plasmids in their entirety to understand the genetic context of these qnrS1-embedded plasmids and also the association of qnrS1-mediated quinolone resistance with other antimicrobial resistance phenotypes. Annotation of the three qnrS1-containing plasmids revealed a qnrS1-containing transposon with a closely related structure. We screened 112 qnrS1-positive commensal Enterobacteriaceae isolated in the community and a hospital in HCMC to detect the common transposon structure. We found the same transposon structure to be present in 71.4% (45/63) of qnrS1-positive hospital isolates and in 36.7% (18/49) of qnrS1-positive isolates from the community. The resulting sequence analysis of the qnrS1 environment suggests that qnrS1 genes are widely distributed and are mobilised on elements with a common genetic background. Our data add additional insight into mechanisms that facilitate resistance to multiple antimicrobials in Gram-negative bacteria in Vietnam.
Preview · Article · Jun 2015 · Journal of Medical Microbiology
ABSTRACT: The complete genomes of two virulent phages infecting Citrobacter rodentium are reported here for the first time. Both bacteriophages were isolated from local sewage treatment plant effluents. Genome analyses revealed a close relationship between both phages and allowed their classification as members of the Autographivirinae subfamily in the T7-like genus.
Full-text · Article · May 2014 · Genome Announcements
ABSTRACT: The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is a repository for the world public domain nucleotide sequence data output. ENA content covers a spectrum of data types including raw reads, assembly data and functional annotation. ENA has faced a dramatic growth in genome assembly submission rates, data volumes and complexity of datasets. This has prompted a broad reworking of assembly submission services; we are now reaching the end of this major programme of work, and many enhancements to components of the submission service have already been made available over the year. In this article, we briefly review ENA content and growth over 2013, describe our rapidly developing services for genome assembly information and outline further major developments over the last year.
Full-text · Article · Nov 2013 · Nucleic Acids Research
ABSTRACT: The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/) collects, maintains and presents comprehensive nucleic acid sequence and related information as part of the permanent public scientific record. Here, we provide brief updates on ENA content developments and major service enhancements in 2012 and describe in more detail two important areas of development and policy that are driven by ongoing growth in sequencing technologies. First, we describe the ENA data warehouse, a resource for which we provide a programmatic entry point to integrated content across the breadth of ENA. Second, we detail our plans for the deployment of CRAM data compression technology in ENA.
Full-text · Article · Nov 2012 · Nucleic Acids Research
ABSTRACT: The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena), Europe's primary nucleotide sequence resource, captures and presents globally comprehensive nucleic acid sequence and associated information. Covering the spectrum from raw data to assembled and functionally annotated genomes, the ENA has witnessed a dramatic growth resulting from advances in sequencing technology and ever broadening application of the methodology. During 2011, we have continued to operate and extend the broad range of ENA services. In particular, we have released major new functionality in our interactive web submission system, Webin, through developments in template-based submissions for annotated sequences and support for raw next-generation sequence read submissions.
Full-text · Article · Nov 2011 · Nucleic Acids Research
ABSTRACT: Comparison of the complete genome sequence of Bacteroides fragilis 638R, originally isolated in the USA, was made with two previously sequenced strains isolated in the UK (NCTC 9343) and Japan (YCH46). The presence of 10 loci containing genes associated with polysaccharide (PS) biosynthesis, each including a putative Wzx flippase and Wzy polymerase, was confirmed in all three strains, despite a lack of cross-reactivity between NCTC 9343 and 638R surface PS-specific antibodies by immunolabelling and microscopy. Genomic comparisons revealed an exceptional level of PS biosynthesis locus diversity. Of the 10 divergent PS-associated loci apparent in each strain, none is similar between NCTC 9343 and 638R. YCH46 shares one locus with NCTC 9343, confirmed by mAb labelling, and a second different locus with 638R, making a total of 28 divergent PS biosynthesis loci amongst the three strains. The lack of expression of the phase-variable large capsule (LC) in strain 638R, observed in NCTC 9343, is likely to be due to a point mutation that generates a stop codon within a putative initiating glycosyltransferase, necessary for the expression of the LC in NCTC 9343. Other major sequence differences were observed to arise from different numbers and variety of inserted extra-chromosomal elements, in particular prophages. Extensive horizontal gene transfer has occurred within these strains, despite the presence of a significant number of divergent DNA restriction and modification systems that act to prevent acquisition of foreign DNA. The level of amongst-strain diversity in PS biosynthesis loci is unprecedented.
ABSTRACT: The European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena) is Europe’s primary nucleotide-sequence repository. The ENA consists of three main databases: the Sequence Read Archive (SRA), the Trace Archive and EMBL-Bank. The objective of ENA is to support and promote the use of nucleotide sequencing as an experimental research platform by providing data submission, archive, search and download services. In this article, we outline these services and describe major changes and improvements introduced during 2010. These include extended EMBL-Bank and SRA-data submission services, extended ENA Browser functionality, support for submitting data to the European Genome-phenome Archive (EGA) through SRA, and the launch of a new sequence similarity search service.
Full-text · Article · Oct 2010 · Nucleic Acids Research
ABSTRACT: Plasmid mediated antimicrobial resistance in the Enterobacteriaceae is a global problem. The rise of CTX-M class extended spectrum beta lactamases (ESBLs) has been well documented in industrialized countries. Vietnam is representative of a typical transitional middle income country where the spectrum of infectious diseases combined with the spread of drug resistance is shifting and bringing new healthcare challenges.
We collected hospital admission data from the pediatric population attending the hospital for tropical diseases in Ho Chi Minh City with Shigella infections. Organisms were cultured from all enrolled patients and subjected to antimicrobial susceptibility testing. Those that were ESBL positive were subjected to further investigation. These investigations included PCR amplification for common ESBL genes, plasmid investigation, conjugation, microarray hybridization and DNA sequencing of a bla(CTX-M) encoding plasmid.
We show that two different bla(CTX-M) genes are circulating in this bacterial population in this location. Sequence of one of the ESBL plasmids shows that rather than the gene being integrated into a preexisting MDR plasmid, the bla(CTX-M) gene is located on a relatively simple conjugative plasmid. The sequenced plasmid (pEG356) carried the bla(CTX-M-24) gene on an ISEcp1 element and demonstrated considerable sequence homology with other IncFI plasmids.
The rapid dissemination, spread of antimicrobial resistance and changing population of Shigella spp. concurrent with economic growth are pertinent to many other countries undergoing similar development. Third generation cephalosporins are commonly used empiric antibiotics in Ho Chi Minh City. We recommend that these agents should not be considered for therapy of dysentery in this setting.
ABSTRACT: Citrobacter rodentium (formerly Citrobacter freundii biotype 4280) is a highly infectious pathogen that causes colitis and transmissible colonic hyperplasia in mice. In common with enteropathogenic and enterohemorrhagic Escherichia coli (EPEC and EHEC, respectively), C. rodentium exploits a type III secretion system (T3SS) to induce attaching and effacing (A/E) lesions that are essential for virulence. Here, we report the fully annotated genome sequence of the 5.3-Mb chromosome and four plasmids harbored by C. rodentium strain ICC168. The genome sequence revealed key information about the phylogeny of C. rodentium and identified 1,585 C. rodentium-specific (without orthologues in EPEC or EHEC) coding sequences, 10 prophage-like regions, and 17 genomic islands, including the locus for enterocyte effacement (LEE) region, which encodes a T3SS and effector proteins. Among the 29 T3SS effectors found in C. rodentium are all 22 of the core effectors of EPEC strain E2348/69. In addition, we identified a novel C. rodentium effector, named EspS. C. rodentium harbors two type VI secretion systems (T6SS) (CTS1 and CTS2), while EHEC contains only one T6SS (EHS). Our analysis suggests that C. rodentium and EPEC/EHEC have converged on a common host infection strategy through access to a common pool of mobile DNA and that C. rodentium has lost gene functions associated with a previous pathogenic niche.
Full-text · Article · Nov 2009 · Journal of Bacteriology
ABSTRACT: Pseudomonas fluorescens are common soil bacteria that can improve plant health through nutrient cycling, pathogen antagonism and induction of plant defenses. The genome sequences of strains SBW25 and Pf0-1 were determined and compared to each other and with P. fluorescens Pf-5. A functional genomic in vivo expression technology (IVET) screen provided insight into genes used by P. fluorescens in its natural environment and an improved understanding of the ecological significance of diversity within this species.
Comparisons of three P. fluorescens genomes (SBW25, Pf0-1, Pf-5) revealed considerable divergence: 61% of genes are shared, the majority located near the replication origin. Phylogenetic and average amino acid identity analyses showed a low overall relationship. A functional screen of SBW25 defined 125 plant-induced genes including a range of functions specific to the plant environment. Orthologues of 83 of these exist in Pf0-1 and Pf-5, with 73 shared by both strains. The P. fluorescens genomes carry numerous complex repetitive DNA sequences, some resembling Miniature Inverted-repeat Transposable Elements (MITEs). In SBW25, repeat density and distribution revealed 'repeat deserts' lacking repeats, covering approximately 40% of the genome.
P. fluorescens genomes are highly diverse. Strain-specific regions around the replication terminus suggest genome compartmentalization. The genomic heterogeneity among the three strains is reminiscent of a species complex rather than a single species. That 42% of plant-inducible genes were not shared by all strains reinforces this conclusion and shows that ecological success requires specialized and core functions. The diversity also indicates the significant size of genetic information within the Pseudomonas pan genome.
ABSTRACT: One of the most satisfying aspects of a genome sequencing project is the identification of the genes contained within it. These are of two types: those which encode tRNAs and those which produce proteins. After a general introduction on the properties of protein-encoding genes and the utility of the Basic Local Alignment Search Tool (BLASTX) to identify genes through homologs, a variety of tools are discussed by their creators. These include, for genome annotation, GeneMark, Artemis, and BASys; and, for genome comparisons, Artemis Comparison Tool (ACT), Mauve, CoreGenes, and GeneOrder.
Full-text · Article · Feb 2009 · Methods in Molecular Biology
ABSTRACT: Bacterial infections of the lungs of cystic fibrosis (CF) patients cause major complications in the treatment of this common genetic disease. Burkholderia cenocepacia infection is particularly problematic since this organism has high levels of antibiotic resistance, making it difficult to eradicate; the resulting chronic infections are associated with severe declines in lung function and increased mortality rates. B. cenocepacia strain J2315 was isolated from a CF patient and is a member of the epidemic ET12 lineage that originated in Canada or the United Kingdom and spread to Europe. The 8.06-Mb genome of this highly transmissible pathogen comprises three circular chromosomes and a plasmid and encodes a broad array of functions typical of this metabolically versatile genus, as well as numerous virulence and drug resistance functions. Although B. cenocepacia strains can be isolated from soil and can be pathogenic to both plants and man, J2315 is representative of a lineage of B. cenocepacia rarely isolated from the environment and which spreads between CF patients. Comparative analysis revealed that ca. 21% of the genome is unique in comparison to other strains of B. cenocepacia, highlighting the genomic plasticity of this species. Pseudogenes in virulence determinants suggest that the pathogenic response of J2315 may have been recently selected to promote persistence in the CF lung. The J2315 genome contains evidence that its unique and highly adapted genetic content has played a significant role in its success as an epidemic CF pathogen.
Full-text · Article · Nov 2008 · Journal of Bacteriology
ABSTRACT: Clostridium botulinum is a heterogeneous Gram-positive species that comprises four genetically and physiologically distinct groups of bacteria that share the ability to produce botulinum neurotoxin, the most poisonous toxin known to man, and the causative agent of botulism, a severe disease of humans and animals. We report here the complete genome sequence of a representative of Group I (proteolytic) C. botulinum (strain Hall A, ATCC 3502). The genome consists of a chromosome (3,886,916 bp) and a plasmid (16,344 bp), which carry 3650 and 19 predicted genes, respectively. Consistent with the proteolytic phenotype of this strain, the genome harbors a large number of genes encoding secreted proteases and enzymes involved in uptake and metabolism of amino acids. The genome also reveals a hitherto unknown ability of C. botulinum to degrade chitin. There is a significant lack of recently acquired DNA, indicating a stable genomic content, in strong contrast to the fluid genome of Clostridium difficile, which can form longer-term relationships with its host. Overall, the genome indicates that C. botulinum is adapted to a saprophytic lifestyle both in soil and aquatic environments. This pathogen relies on its toxin to rapidly kill a wide range of prey species, and to gain access to nutrient sources, it releases a large number of extracellular enzymes to soften and destroy rotting or decayed tissues.
ABSTRACT: The human gastrointestinal (GI) tract contains a highly diverse and dynamic community of commensal microorganisms that consists of both permanent residents and seemingly transient visitors. This month's column looks at four gut-related microbial sequencing projects.
ABSTRACT: We determined the complete genome sequence of Clostridium difficile strain 630, a virulent and multidrug-resistant strain. Our analysis indicates that a large proportion (11%) of the genome consists of mobile genetic elements, mainly in the form of conjugative transposons. These mobile elements are putatively responsible for the acquisition by C. difficile of an extensive array of genes involved in antimicrobial resistance, virulence, host interaction and the production of surface structures. The metabolic capabilities encoded in the genome show multiple adaptations for survival and growth within the gut environment. The extreme genome variability was confirmed by whole-genome microarray analysis; it may reflect the organism's niche in the gut and should provide information on the evolution of virulence in this organism.
ABSTRACT: Lactobacillus salivarius subsp. salivarius strain UCC118 is a bacteriocin-producing strain with probiotic characteristics. The 2.13-Mb genome was shown by sequencing to comprise a 1.83-Mb chromosome, a 242-kb megaplasmid (pMP118), and two smaller plasmids. Megaplasmids previously have not been characterized in lactic acid bacteria or intestinal lactobacilli. Annotation of the genome sequence indicated an intermediate level of auxotrophy compared with other sequenced lactobacilli. No single-copy essential genes were located on the megaplasmid. However, contingency amino acid metabolism genes and carbohydrate utilization genes, including two genes for completion of the pentose phosphate pathway, were megaplasmid encoded. The megaplasmid also harbored genes for the Abp118 bacteriocin, a bile salt hydrolase, a presumptive conjugation locus, and other genes potentially relevant for probiotic properties. Two subspecies of L. salivarius are recognized, salivarius and salicinius, and we detected megaplasmids in both subspecies by pulsed-field gel electrophoresis of sizes ranging from 100 kb to 380 kb. The discovery of megaplasmids of widely varying size in L. salivarius suggests a possible mechanism for genome expansion or contraction to adapt to different environments.
Full-text · Article · May 2006 · Proceedings of the National Academy of Sciences
ABSTRACT: The obligate intracellular bacterial pathogen Chlamydophila abortus strain S26/3 (formerly the abortion subtype of Chlamydia psittaci) is an important cause of late gestation abortions in ruminants and pigs. Furthermore, although relatively rare, zoonotic infection can result in acute illness and miscarriage in pregnant women. The complete genome sequence was determined and shows a high level of conservation in both sequence and overall gene content in comparison to other Chlamydiaceae. The 1,144,377-bp genome contains 961 predicted coding sequences, 842 of which are conserved with those of Chlamydophila caviae and Chlamydophila pneumoniae. Within this conserved Cp. abortus core genome we have identified the major regions of variation and have focused our analysis on these loci, several of which were found to encode highly variable protein families, such as TMH/Inc and Pmp families, which are strong candidates for the source of diversity in host tropism and disease causation in this group of organisms. Significantly, Cp. abortus lacks any toxin genes, and also lacks genes involved in tryptophan metabolism and nucleotide salvaging (guaB is present as a pseudogene), suggesting that the genetic basis of niche adaptation of this species is distinct from those previously proposed for other chlamydial species.