
Whole-Genome Random Sequencing and Assembly of Haemophilus influenzae Rd

Abstract

An approach for genome analysis based on sequencing and assembly of unselected pieces of DNA from the whole chromosome has been applied to obtain the complete nucleotide sequence (1,830,137 base pairs) of the genome from the bacterium Haemophilus influenzae Rd. This approach eliminates the need for initial mapping efforts and is therefore applicable to the vast array of microbial species for which genome maps are unavailable. The H. influenzae Rd genome sequence (Genome Sequence DataBase accession number L42023) represents the only complete genome sequence from a free-living organism.
... Chromosomal alterations in proteins associated with increased resistance to beta-lactams (PBP3) or quinolones (the quinolone resistance-determining regions (QRDRs) of GyrA and ParC, aa 80-92) (Georgiou et al., 1996; Li et al., 2004) were identified by comparison with the reference sequence H. influenzae Rd KW20 (Fleischmann et al., 1995) through multiple sequence alignments of translated genes. Genotypic categorization of isolates with PBP3-mediated resistance (rPBP3) according to level (low- or high-rPBP3), stage (1-3) and group was done as described previously (Skaare et al., 2014b). ...
... isolates (Norway 2017-2021) into rPBP3 genotypes based on amino acid substitutions in positions 385, 389, 517, and 526 compared to reference strain Rd KW20 (Fleischmann et al., 1995). [Table header residue: Stage 0, 1, 1, 2, 2, 3, 3; Group -, I, II, III-like(−), III(−), III-like(+), III(+).] The STs of the major invasive Norwegian Hi clusters, except clusters 4 (ST210) and 13 (ST835), also predominate among invasive Hi isolates in Italy, Portugal, and France (Giufrè et al., 2018; Deghmane et al., 2019; Heliodoro et al., 2020). ...
Article
Invasive Haemophilus influenzae (Hi) disease has decreased in countries that included Hi type b (Hib) vaccination in their childhood immunization programs in the 1990s. Non-typeable (NT) and non-b strains are now the leading causes of invasive Hi disease in Europe, with most cases reported in young children and the elderly. Concerningly, no vaccines against such strains are available and beta-lactam resistance is increasing. We describe the epidemiology of invasive Hi disease reported to the Norwegian Surveillance System for Communicable Diseases (MSIS) (2017–2021, n = 407). Whole-genome sequencing (WGS) was performed on 245 isolates. We investigated the molecular epidemiology (core genome phylogeny) and the presence of antibiotic resistance markers (including chromosomal mutations associated with beta-lactam or quinolone resistance). For isolates characterized with both WGS and phenotypic antibiotic susceptibility testing (AST) (n = 113), we assessed the correlation between resistance markers and susceptibility categorization by calculating sensitivity, specificity, and predictive values. Incidence rates of invasive Hi disease in Norway ranged from 0.7 to 2.3 per 100,000 inhabitants/year (mean 1.5 per 100,000) and declined during the COVID-19 pandemic. The bacterial population consisted of two major phylogenetic groups with subclustering by serotype and multi-locus sequence type (ST). NTHi accounted for 71.8% (n = 176) of the isolates. The distribution of STs was in line with previous European reports. We identified 13 clusters, including four encapsulated and three previously described international NTHi clones with blaTEM-1 (ST103) or altered PBP3 (rPBP3) (ST14/IIA and ST367/IIA). Resistance markers were detected in 25.3% (62/245) of the isolates, with blaTEM-1 (31, 50.0%) and rPBP3 (28, 45.2%) being the most frequent.
All isolates categorized as resistant to aminopenicillins, tetracycline or chloramphenicol possessed relevant resistance markers, and the absence of relevant substitutions in PBP3 and GyrA/ParC predicted susceptibility to cefotaxime, ceftriaxone, meropenem and quinolones. Among the 132 WGS-only isolates, one isolate had PBP3 substitutions associated with resistance to third-generation cephalosporins, and one isolate had GyrA/ParC alterations associated with quinolone resistance. The detection of international virulent and resistant NTHi clones underlines the need for a global molecular surveillance system. WGS is a useful supplement to AST and should be performed on all invasive isolates.
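The agreement measures named in this abstract (sensitivity, specificity, and predictive values) can be illustrated with a minimal Python sketch. It treats phenotypic AST as the reference and the presence of a resistance marker as the test result. The function name and the 2x2 counts are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 table.

    tp: marker present, phenotypically resistant
    fp: marker present, phenotypically susceptible
    fn: marker absent,  phenotypically resistant
    tn: marker absent,  phenotypically susceptible
    """
    def ratio(num, den):
        # Avoid division by zero when a row or column of the table is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical counts for illustration only (not data from the abstract).
metrics = diagnostic_metrics(tp=8, fp=2, fn=2, tn=88)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```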
... Indeed, François Jacob, Jacques Monod and François Gros had already elucidated the mechanism of protein biosynthesis [1], and the genetic code had been deciphered by Nirenberg, Khorana and Ochoa [2]. It nevertheless took until 1995, and numerous technological innovations, before the first fully sequenced genome became available, that of the prokaryote Haemophilus influenzae [3]. The yeast Saccharomyces cere- [...] deep learning was applied to a genomics problem for the first time [10]. ...
Thesis
Recent technological advances in biotechnology, such as CRISPR and de novo synthesis of DNA oligonucleotides, now make it possible to modify genomes precisely and on a large scale. Projects aiming to design partially or completely synthetic genomes, in particular yeast genomes, have developed by taking advantage of these technologies. However, achieving these goals requires controlling the activity of artificial sequences, which remains a challenge today. Fortunately, the recent emergence of deep learning methodologies capable of recognizing the genomic function associated with a DNA sequence may provide a powerful tool for anticipating the activity of synthetic genomes and facilitating their design. With this in mind, we propose to use deep learning methodologies to design synthetic yeast sequences that control the local structure of the genome. In particular, I will present the methodology we developed to design synthetic sequences that precisely position nucleosomes (a molecule that determines DNA structure at the lowest scale) in yeast. I will also show that this methodology opens the prospect of designing sequences that control the next level of structure up: loops. Designing sequences that control local structure makes it possible to identify precisely the determinants of that structure.
... It was from the end of the 20th century that whole-genome sequencing of bacteria became possible through computational analysis. The bacterial genome of Hib was sequenced (Fleischmann et al., 1995) with the aim of selecting new antigens (Ag) not discovered by conventional technologies. ...
Thesis
Vaccines must be safer and more immunogenic than their empirically developed predecessors in order to eradicate the infectious diseases that cause millions of deaths worldwide each year. To achieve this, we need to understand the interrelations between the innate and adaptive immune responses that underlie the protection acquired through vaccination. In randomized clinical studies of influenza and HIV vaccines, data from conventional immunological assays are combined with molecular profiling by omics mapping using systems biology. In this way, molecular signatures of innate immunity that are essential for driving humoral and cellular immunity are brought to light, while taking into account the importance of the route of administration. Thus, the antibody responses elicited by vaccination via the intramuscular and intradermal routes rely on molecules from interferon signaling and from antigen processing and presentation, depend on the developmental state of B cells, and involve the induction of CXCL10 or IL-6, whereas CD8+ T-cell responses after transcutaneous immunization are initiated by metabolic pathways. Such molecular signatures, translated into biomarkers of vaccine efficacy, guide us toward rational vaccine development and will facilitate vaccine screening in clinical trials so as to respond more rapidly to epidemics. A better understanding of the origins of inter-individual heterogeneity in vaccine responses will make it possible to adapt formulations in order to improve protection within populations.
... Since the first fully sequenced bacterial genome of Haemophilus influenzae type B became available in 1995 [7], sequencing technologies have rapidly evolved and are now used in patient care and infection control management [8]. Whole genome sequencing (WGS) and metagenomics use different approaches to determine the genomic content in a sample. ...
Article
Whole genome sequencing (WGS) provides the highest resolution for genome-based species identification and can provide insight into the antimicrobial resistance and virulence potential of a single microbiological isolate during the diagnostic process. In contrast, metagenomic sequencing allows the analysis of DNA segments from multiple microorganisms within a community, either using an amplicon- or shotgun-based approach. However, WGS and shotgun metagenomic data are rarely combined, although such an approach may generate additive or synergistic information, critical for, e.g., patient management, infection control, and pathogen surveillance. To produce a combined workflow with actionable outputs, we need to understand the pre-to-post analytical process of both technologies. This will require specific databases storing interlinked sequencing and metadata, and also involves customized bioinformatic analytical pipelines. This review article will provide an overview of the critical steps and potential clinical application of combining WGS and metagenomics together for microbiological diagnosis.
... Following the sequencing of the first full genome in 1995 [14], early methods attempted to perform large-scale MSA on whole genomes. However, the initial release of the human genome in 2000 [15] showed that the same constraints which narrow the solution space of MSA prohibit it from capturing all of the evolutionary events that need to be tracked, which turn out to be common [7, 16-23]. ...
Article
With the arrival of telomere-to-telomere (T2T) assemblies of the human genome comes the computational challenge of efficiently and accurately constructing multiple genome alignments at an unprecedented scale. By identifying nucleotides across genomes which share a common ancestor, multiple genome alignments commonly serve as the bedrock for comparative genomics studies. In this review, we provide an overview of the algorithmic template that most multiple genome alignment methods follow. We also discuss prospective areas of improvement of multiple genome alignment for keeping up with continuously arriving high-quality T2T assembled genomes and for unlocking clinically-relevant insights.
Article
The systematics of bacteria is a fundamentally important discipline of microbiology. The huge metabolic diversity and horizontal gene transfer tendencies of bacteria make it difficult to classify them systematically. The current bacterial species classification is based on a combination of genotypic and phenotypic properties. More than 6200 species of prokaryotes have valid names according to the International Committee on Systematic Bacteriology (ICSB). In this review, approaches and techniques applied for identification based on biochemical and physiological investigations of bacteria, chemotaxonomic and mass spectroscopic techniques, and genotypic methods are evaluated. Combining classic culturing methods with innovative molecular approaches provides more reliable results for the identification of prokaryotes. Keywords: systematics, bacteriology, DNA sequencing, 16S rRNA, bacterial characterization.
Article
Clinical microbiology has possessed a marvelous past, an important present and a bright future. Western medicine modernization started with the discovery of bacterial pathogens, and from then, clinical bacteriology became a cornerstone of diagnostics. Today, clinical microbiology uses standard techniques including Gram stain morphology, in vitro culture, antigen and antibody assays, and molecular biology both to establish a diagnosis and monitor the progression of microbial infections. Clinical microbiology has played a critical role in pathogen detection and characterization for emerging infectious diseases as evidenced by the ongoing COVID-19 pandemic. Revolutionary changes are on the way in clinical microbiology with the application of “-omic” techniques, including transcriptomics and metabolomics, and optimization of clinical practice configurations to improve outcomes of patients with infectious diseases.
Chapter
In the current post-genomic era, advancement in high-throughput genome sequencing and associated computational tools for genomic analyses have enabled researchers to gain new insights into molecular details of microbe-microbe/microbe-host/microbe-environment interactions. The last decade has also witnessed the onset of other ‘OMICS’, i.e. (meta)genomic, (meta)transcriptomic, (meta)proteomic, metabolomic, fluxomic, and mobilomic to gain system-level understanding of microbial physiology for novel biotechnological interventions. Simultaneously, over the years, various databases, prediction softwares, algorithms, and pipelines have been developed to mine and interpret these OMICS data for getting useful information on a spectrum of microbial processes. But systematic use and assessment of such software/database tools/algorithms without a standardized framework, adequate guidance, proper selection criteria, and ideal computational background remain to be a major challenge. OMICS and associative computational tools have widely been applied for understanding plant-microbiome interaction or crosstalk, especially plant diseases; pest-, fertilizer-, pesticide-, water- and nutrient- management; agricultural production; climate change, etc. But unsystematic use and assessment of such tools/techniques with lack of proper workflow (a guidance map) have been encountered, which has led to a developer-researcher (end-user) gap. Hence, genomic benchmarking measures would greatly help research communities in using computational tools for selecting appropriate software/tools/methods based on specific data and major research aim. In this context, this chapter describes the use of various curated databases, open-source tools, software and pipelines associated with genomic data mining (other OMICS as well) for uncovering plant-microbiome interactions. 
The information would help to rationalize the host’s metabolic engineering to better optimize the analysis framework for gaining system-level understanding of plant-microbiome communications. Keywords: Plant-microbiome interaction; Genomics; OMICS; Computational tools; Databases; Pipelines; Methodological workflow; System biology
Article
A list of currently identified gene products of Escherichia coli is given, together with a bibliography that provides pointers to the literature on each gene product. A scheme to categorize cellular functions is used to classify the gene products of E. coli so far identified. A count shows that the numbers of genes concerned with small-molecule metabolism are on the same order as the numbers concerned with macromolecule biosynthesis and degradation. One large category is the category of tRNAs and their synthetases. Another is the category of transport elements. The categories of cell structure and cellular processes other than metabolism are smaller. Other subjects discussed are the occurrence in the E. coli genome of redundant pairs and groups of genes of identical or closely similar function, as well as variation in the degree of density of genetic information in different parts of the genome.
Article
Chloroplasts contain their own autonomously replicating DNA genome. The majority of proteins present in the chloroplasts are encoded by nuclear DNA, but the rest are encoded by chloroplast DNA and synthesized by the chloroplast transcription–translation machinery [1–4]. Although the nucleotide sequences of many chloroplast genes from various plant species have been determined, the entire gene organization of the chloroplast genome has not yet been elucidated for any species of plants. To improve our understanding of the chloroplast gene system, we have determined the complete sequence of the chloroplast DNA from a liverwort, Marchantia polymorpha, and deduced the gene organization. As reported here the liverwort chloroplast DNA contains 121,024 base pairs (bp), consisting of a set of large inverted repeats (IRA and IRB, each of 10,058 bp) separated by a small single-copy region (SSC, 19,813 bp) and a large single-copy region (LSC, 81,095 bp). We detected 128 possible genes throughout the liverwort chloroplast genome, including coding sequences for four kinds of ribosomal RNAs, 32 species of transfer RNAs and 55 identified open reading frames (ORFs) for proteins, which are separated by short A+T-rich spacers (Fig. 1). Twenty genes (8 encoding tRNAs, 12 encoding proteins) contain introns in their coding sequences. These introns can be classified as belonging to either group I or group II, as described for mitochondria [5]. Interestingly, seven of the identified ORFs show high homology to unidentified reading frames (URFs) found in human mitochondria [6,7].
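The region sizes quoted in this abstract can be checked against the reported genome length with a few lines of Python. The sketch below only re-adds the numbers stated above. The variable names are illustrative.

```python
# Region sizes in base pairs, as stated in the abstract.
regions_bp = {
    "IRA": 10_058,  # inverted repeat A
    "IRB": 10_058,  # inverted repeat B
    "SSC": 19_813,  # small single-copy region
    "LSC": 81_095,  # large single-copy region
}
reported_total_bp = 121_024

total_bp = sum(regions_bp.values())
print(f"sum of regions: {total_bp:,} bp (reported: {reported_total_bp:,} bp)")

# Share of the genome taken up by each region.
for name, size in regions_bp.items():
    print(f"{name}: {size:>7,} bp ({100 * size / total_bp:.1f}%)")
```

The four regions add up exactly to the reported 121,024 bp.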
Article
Genomic DNA that encodes the β1 subunit of the human γ-aminobutyric acid type A (GABAA) receptor was cloned and mapped. Exons and flanking introns (>14 kb) were sequenced to determine the structural organization of the gene. The gene was localized on human chromosome 4, in bands p12–13. The β1 subunit is encoded by a relatively large gene (>65 kb) on nine exons. In contrast to other conserved regions of the subunit polypeptide, the proposed channel-forming domain (M2) is derived from more than one exon. The organization of exons was compared with that of the genes that code for subunits of nicotinic acetylcholine receptors. There is no evidence for conservation of gene structure between these two members of the proposed gene superfamily. However, intron-exon junctions were found to be conserved precisely between subtypes of GABAA receptor subunits.
Article
In the framework of the European project aimed at the sequencing of the Bacillus subtilis genome the DNA region located between gerB (314°) and sacXV (333°) was assigned to the Institut Pasteur. In this paper we describe the cloning and sequencing of a segment of 97 kb of contiguous DNA. Ninety-two open reading frames were predicted to encode putative proteins among which only forty-two were found to display significant similarities to known proteins present in databanks, e.g. amino acid permeases, proteins involved in cell wall or antibiotic biosynthesis, various regulatory proteins, proteins of several dehydrogenase families and enzymes II of the phosphotransferase system involved in sugar transport. Additional experiments led to the identification of the products of new B. subtilis genes, e.g. galactokinase and an operon involved in thiamine biosynthesis.
Article
A computer program that progressively evaluates the hydrophilicity and hydrophobicity of a protein along its amino acid sequence has been devised. For this purpose, a hydropathy scale has been composed wherein the hydrophilic and hydrophobic properties of each of the 20 amino acid side-chains are taken into consideration. The scale is based on an amalgam of experimental observations derived from the literature. The program uses a moving-segment approach that continuously determines the average hydropathy within a segment of predetermined length as it advances through the sequence. The consecutive scores are plotted from the amino to the carboxy terminus. At the same time, a midpoint line is printed that corresponds to the grand average of the hydropathy of the amino acid compositions found in most of the sequenced proteins. In the case of soluble, globular proteins there is a remarkable correspondence between the interior portions of their sequence and the regions appearing on the hydrophobic side of the midpoint line, as well as between the exterior portions and the regions on the hydrophilic side. The correlation was demonstrated by comparisons between the plotted values and known structures determined by crystallography. In the case of membrane-bound proteins, the portions of their sequences that are located within the lipid bilayer are also clearly delineated by large uninterrupted areas on the hydrophobic side of the midpoint line. As such, the membrane-spanning segments of these proteins can be identified by this procedure. Although the method is not unique and embodies principles that have long been appreciated, its simplicity and its graphic nature make it a very useful tool for the evaluation of protein structures.
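The moving-segment averaging described above can be sketched in Python as follows. The abstract does not list the scale values, so the sketch assumes the widely reproduced Kyte-Doolittle hydropathy values. The function name, the default window length, and the example peptide are illustrative choices, not details from the abstract.

```python
# Assumed scale: the commonly cited Kyte-Doolittle hydropathy values.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Average hydropathy of every window-length segment, N- to C-terminus."""
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD_SCALE[residue] for residue in sequence]

    # Sliding sum: add the residue entering the segment, drop the one leaving.
    running = sum(scores[:window])
    profile = [running / window]
    for i in range(window, len(scores)):
        running += scores[i] - scores[i - window]
        profile.append(running / window)
    return profile


# Illustrative peptide: a hydrophobic stretch flanked by charged residues.
example = hydropathy_profile("KKDELLIVVALLIGAVKKDE", window=7)
print([round(value, 2) for value in example])
```

Plotting the returned list against segment position gives the kind of profile the abstract describes; runs of high values would correspond to candidate interior or membrane-spanning regions.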
Article
Analysis of the mitochondrial DNA of a liverwort Marchantia polymorpha by electron microscopy and restriction endonuclease mapping indicated that the liverwort mitochondrial genome was a single circular molecule of about 184,400 base-pairs. We have determined the complete sequence of the liverwort mitochondrial DNA and detected 94 possible genes in the sequence of 186,608 base-pairs. These included genes for three species of ribosomal RNA, 29 genes for 27 species of transfer RNA and 30 open reading frames (ORFs) for functionally known proteins (16 ribosomal proteins, 3 subunits of H+-ATPase, 3 subunits of cytochrome c oxidase, apocytochrome b protein and 7 subunits of NADH ubiquinone oxidoreductase). Three ORFs showed similarity to ORFs of unknown function in the mitochondrial genomes of other organisms. Furthermore, 29 ORFs were predicted as possible genes by using the index of G + C content in first, second and third letters of codons (42.0 ± 10.9%, 37.0 ± 13.2% and 26.4 ± 9.4%, respectively) obtained from the codon usages of identified liverwort genes. To date, 32 introns belonging to either group I or group II intron have been found in the coding regions of 17 genes including ribosomal RNA genes (rrn18 and rrn26), a transfer RNA gene (trnS) and a pseudogene (ψnad7). RNA editing was apparently lacking in liverwort mitochondria since the nucleotide sequences of the liverwort mitochondrial DNA were well-conserved at the DNA level.
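The index mentioned above, G + C content at the first, second and third codon positions, is straightforward to compute for a candidate ORF. The Python sketch below shows only that calculation. How the authors compared ORFs against the reported means is not described in the abstract, and the function name and example sequence are hypothetical.

```python
def gc_by_codon_position(cds):
    """Percent G+C at codon positions 1, 2 and 3 of an in-frame sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    n_codons = usable // 3
    if n_codons == 0:
        raise ValueError("sequence must contain at least one full codon")

    counts = [0, 0, 0]
    for i in range(usable):
        if cds[i] in "GC":
            counts[i % 3] += 1
    return tuple(100.0 * c / n_codons for c in counts)


# Hypothetical in-frame sequence, for illustration only.
first, second, third = gc_by_codon_position("ATGGCCTAA")
print(f"GC1={first:.1f}%  GC2={second:.1f}%  GC3={third:.1f}%")
```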
Article
We have implemented the Smith and Waterman dynamic programming algorithm on the massively parallel MP1104 computer from MasPar and compared its ability to detect remote protein sequence homologies with that of other commonly used database search algorithms. Dynamic programming algorithms are normally too computationally intensive to permit full database searches; however, on the MP1104 a search of the Swiss-Prot database takes about 15 s. This nearly interactive speed of database searching permits one to optimize the parameters for each query. Most of the common database search methods (FASTA, FASTDB and BLAST) gain their speed by using approximations, such as word matching or eliminating gaps from the alignments, which prevent them from detecting remote homologies. By using queries from protein super families containing a large number of family members of diverse similarities, we have measured the ability of each of these algorithms to detect the remotest members of each super family. Using these super families, we have found that the algorithms, in order of decreasing sensitivity, are BLAZE, FASTDB, FASTA and BLAST. Hence the massively parallel computers allow one to have maximal sensitivity and search speed simultaneously.
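For readers unfamiliar with the Smith and Waterman algorithm, a minimal serial Python sketch follows. It is not the parallel implementation described above. As simplifying assumptions it uses a fixed match/mismatch score and a linear gap penalty, whereas protein searches normally use a substitution matrix and often affine gaps. All parameter values are illustrative.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b: returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = smith_waterman("XXACGTYY", "ZZACGTWW")
print(score)
print(top)
print(bottom)
```

The table fill costs time proportional to the product of the two sequence lengths, which is why an exhaustive database search with this algorithm was expensive on serial hardware.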
Article
The problem of predicting gene locations in newly sequenced DNA is well known but still far from being successfully resolved. A novel approach to the problem based on the frame dependent (non-homogeneous) Markov chain models of protein-coding regions was previously suggested. This approach is, apparently, one of the most powerful “search by content” methods. The initial idea of the method combines the specific Markov models of coding and non-coding region together with Bayes' decision making function and allows easy generalization for employing of higher order Markov chain models. Another generalization which is described in this article allows the analysis of both DNA strands simultaneously. Currently known gene searching methods perform the analysis of the two DNA strands in turn, one after another. In doing this all the known methods fail in the sense that they generate false (artifactual) prediction signals for the given strand when the real coding region is located on the complementary DNA strand. This common drawback is avoided by employing the Bayesian algorithm which uses an additional non-homogeneous Markov chain model of the “shadow” of the coding region—the sequence which is complementary to the protein-coding sequence.
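The idea of scoring a DNA window against coding, "shadow" and non-coding models and combining them with Bayes' rule can be sketched as below. This is a deliberately reduced illustration, not the published method: it uses zero-order, position-in-codon nucleotide probabilities instead of higher-order Markov chains, equal priors over seven hypotheses (three coding frames, three shadow frames, non-coding), and toy parameters invented for the example.

```python
from math import exp, log

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def loglik_periodic(seq, frame_probs, frame):
    """Log-likelihood under a 3-periodic (frame-dependent) zero-order model."""
    return sum(log(frame_probs[(i + frame) % 3][nt]) for i, nt in enumerate(seq))


def posterior_over_hypotheses(seq, frame_probs, background):
    """Posterior probability of each hypothesis, assuming equal priors."""
    shadow_seq = reverse_complement(seq)  # coding region on the opposite strand
    loglik = {"noncoding": sum(log(background[nt]) for nt in seq)}
    for frame in range(3):
        loglik[f"coding_f{frame}"] = loglik_periodic(seq, frame_probs, frame)
        loglik[f"shadow_f{frame}"] = loglik_periodic(shadow_seq, frame_probs, frame)

    # Normalize in log space for numerical stability.
    top = max(loglik.values())
    weights = {name: exp(value - top) for name, value in loglik.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Toy parameters (invented): each codon position strongly prefers one base.
FRAME_PROBS = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

direct = posterior_over_hypotheses("ACG" * 5, FRAME_PROBS, BACKGROUND)
shadow = posterior_over_hypotheses(reverse_complement("ACG" * 5), FRAME_PROBS, BACKGROUND)
print(max(direct, key=direct.get), max(shadow, key=shadow.get))
```

Including explicit shadow hypotheses means that a window lying opposite a real gene is attributed to the complementary strand rather than being reported as a spurious coding signal on the strand under analysis.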
Article
Methods for alignment of protein sequences typically measure similarity by using a substitution matrix with scores for all possible exchanges of one amino acid with another. The most widely used matrices are based on the Dayhoff model of evolutionary rates. Using a different approach, we have derived substitution matrices from about 2000 blocks of aligned sequence segments characterizing more than 500 groups of related proteins. This led to marked improvements in alignments and in searches using queries from each of the groups.
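Deriving substitution scores from blocks of aligned segments can be illustrated with a log-odds calculation: count the residue pairs observed in each alignment column, then compare their frequencies with the frequencies expected from the residue composition alone. The Python sketch below makes several simplifying assumptions: it handles a single ungapped block, applies no sequence clustering or weighting, and reports unrounded scores in half-bit units. The toy block is invented.

```python
from collections import Counter
from itertools import combinations
from math import log2


def log_odds_scores(block):
    """Log-odds scores (half-bits) for residue pairs seen in an ungapped block."""
    pair_counts = Counter()
    for column in zip(*block):
        for x, y in combinations(column, 2):
            pair_counts[tuple(sorted((x, y)))] += 1
    total_pairs = sum(pair_counts.values())

    # Observed frequency q_ij of each unordered pair.
    q = {pair: n / total_pairs for pair, n in pair_counts.items()}

    # Residue frequency p_i implied by the pairs: q_ii plus half of each q_ij.
    p = Counter()
    for (x, y), freq in q.items():
        if x == y:
            p[x] += freq
        else:
            p[x] += freq / 2
            p[y] += freq / 2

    scores = {}
    for (x, y), freq in q.items():
        expected = p[x] * p[y] if x == y else 2 * p[x] * p[y]
        scores[(x, y)] = 2 * log2(freq / expected)
    return scores


# Invented toy block: four aligned segments of length two.
toy_scores = log_odds_scores(["AL", "AL", "AL", "VL"])
for pair, value in sorted(toy_scores.items()):
    print(pair, round(value, 2))
```

Pairs that never co-occur in a column get no score here; a real matrix needs far more data (the abstract mentions about 2000 blocks) so that all residue pairs are observed.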