ABSTRACT: Pseudomonas aeruginosa is capable of injecting protein toxins into other bacterial cells through one of its three type VI secretion systems (T6SSs). The activity of this T6SS is tightly regulated at the posttranslational level by phosphorylation-dependent and -independent pathways. The phosphorylation-dependent pathway consists of a threonine kinase/phosphatase pair (PpkA/PppA) that acts on a forkhead domain-containing protein, Fha1, and of a periplasmic protein, TagR, that positively regulates PpkA. In the present work, we biochemically and functionally characterize three additional proteins of the phosphorylation-dependent regulatory cascade that controls T6S activation: TagT, TagS and TagQ. We show that, similar to TagR, these proteins act upstream of the PpkA/PppA checkpoint and influence Fha1 phosphorylation, apparatus assembly and effector export. Localization studies demonstrate that TagQ is an outer membrane lipoprotein and that TagR is associated with the outer membrane. Consistent with their homology to components of the lipoprotein outer membrane localization (Lol) system, TagT and TagS form a stable inner membrane complex with ATPase activity. However, we find that the outer membrane association of the T6SS lipoproteins TagQ and TssJ1, and of TagR, is unaltered in a ΔtagTS background. Notably, we found that TagQ is indispensable for anchoring TagR to the outer membrane fraction. As T6S-dependent fitness of P. aeruginosa requires TagT, S, R and Q, we conclude that these proteins likely participate in a transmembrane signalling pathway that promotes H1-T6SS activity under optimal environmental conditions.
ABSTRACT: The automatic identification of syntenies across multiple species is a key step in comparative genomics that helps biologists shed light both on evolutionary and functional problems.
In this paper, we present a versatile tool to extract all syntenies from multiple bacterial species based on a clear-cut and very flexible definition of the synteny blocks that allows for gene quorum, partial gene correspondence, gaps, and a partial or total conservation of the gene order.
We apply this tool to two different kinds of studies. The first is a search for functional gene associations. In this context, we compare our tool to a widely used heuristic, I-ADHORE, and show that, at least up to ten genomes, the problem remains tractable with our exact definition and algorithm. The second application is linked to evolutionary studies: we verify in a multiple-alignment setting that pairs of orthologs in synteny are more conserved than pairs outside synteny, thus extending a previous pairwise study. We then show that this observation is in fact a function of the size of the synteny block: the larger the block, the more conserved the genes.
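The gap-tolerant, order-free flavour of synteny block described above can be illustrated with the classical "gene team" notion for two genomes. The sketch below is a minimal Python illustration, not the authors' tool: it assumes one-to-one orthologs that carry the same identifier in both genomes, ignores the quorum and partial-correspondence options, and uses a hypothetical parameter `delta` for the maximum number of intervening genes. Teams are found by repeatedly splitting gene sets wherever the gap along either genome exceeds `delta`.

```python
from typing import List, Tuple

# (gene id, position in genome A, position in genome B)
Point = Tuple[str, int, int]


def split_by(points: List[Point], axis: int, delta: int) -> List[List[Point]]:
    """Sort points along one genome (axis 1 = A, 2 = B) and cut wherever
    more than `delta` genes separate two consecutive points."""
    pts = sorted(points, key=lambda p: p[axis])
    parts, current = [], [pts[0]]
    for prev, cur in zip(pts, pts[1:]):
        if cur[axis] - prev[axis] > delta + 1:
            parts.append(current)
            current = []
        current.append(cur)
    parts.append(current)
    return parts


def gene_teams(order_a: List[str], order_b: List[str], delta: int) -> List[List[str]]:
    """Maximal gene sets forming gap-bounded chains in both genomes
    (gene order inside a block is free)."""
    pos_b = {g: j for j, g in enumerate(order_b)}
    points = [(g, i, pos_b[g]) for i, g in enumerate(order_a) if g in pos_b]
    if not points:
        return []
    pending, teams = [points], []
    while pending:
        group = pending.pop()
        parts = split_by(group, 1, delta)      # try splitting along genome A
        if len(parts) == 1:
            parts = split_by(group, 2, delta)  # then along genome B
        if len(parts) == 1:                    # stable in both genomes: a team
            teams.append([p[0] for p in sorted(group, key=lambda p: p[1])])
        else:
            pending.extend(parts)
    return sorted(teams, key=lambda t: order_a.index(t[0]))


if __name__ == "__main__":
    a = ["a", "b", "c", "x", "d", "e"]
    b = ["c", "a", "b", "z1", "z2", "z3", "d", "e"]
    print(gene_teams(a, b, 0))  # [['a', 'b', 'c'], ['d', 'e']]
    print(gene_teams(a, b, 3))  # [['a', 'b', 'c', 'd', 'e']]
```

Note that `c`, `a`, `b` form one block even though their order differs between the two genomes, which mirrors the partial or total order conservation allowed by the definition above.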
ABSTRACT: Bacteria that live in the environment have evolved pathways specialized to defend against eukaryotic organisms or other bacteria. In this manuscript, we systematically examined the role of the five type VI secretion systems (T6SSs) of Burkholderia thailandensis (B. thai) in eukaryotic and bacterial cell interactions. Consistent with phylogenetic analyses comparing the distribution of the B. thai T6SSs with well-characterized bacterial and eukaryotic cell-targeting T6SSs, we found that T6SS-5 plays a critical role in the virulence of the organism in a murine melioidosis model, while a strain lacking the other four T6SSs remained as virulent as the wild-type. The function of T6SS-5 appeared to be specialized to the host and not related to an in vivo growth defect, as ΔT6SS-5 was fully virulent in mice lacking MyD88. Next, we probed the role of the five systems in interbacterial interactions. From a group of 31 diverse bacteria, we identified several organisms that competed less effectively against wild-type B. thai than against a strain lacking T6SS-1 function. Inactivation of T6SS-1 renders B. thai far more susceptible to cell contact-induced stasis by Pseudomonas putida, Pseudomonas fluorescens and Serratia proteamaculans, leaving it 100- to 1000-fold less fit than the wild-type in competition experiments with these organisms. Flow cell biofilm assays showed that T6S-dependent interbacterial interactions are likely relevant in the environment: B. thai cells lacking T6SS-1 were rapidly displaced in mixed biofilms with P. putida, whereas wild-type cells persisted and overran the competitor. Our data show that T6SSs within a single organism can have distinct functions in eukaryotic versus bacterial cell interactions.
These systems are likely to be a decisive factor in the survival of bacterial cells of one species in intimate association with those of another, such as in polymicrobial communities present both in the environment and in many infections.
ABSTRACT: The rapid accumulation of microarray data holds promise for discovering regulatory relationships from high-throughput gene expression data. One of the most common approaches is to compute co-expression clusters, i.e. groups of co-expressed genes that are likely to be controlled by the same set of transcription factors. However, such an approach may lead to erroneous results because the data are extremely noisy and genes from different regulatory modules may vary concomitantly in a given condition. To overcome these problems, we present two different strategies that rely on a graph-based data mining approach and aim at extracting gene sets that are co-expressed within a subset of microarray datasets from multiple conditions. We define a transcriptional module as a set of vertices (genes) that always induces a dense subgraph in a set of co-expression graphs. Co-expression graphs were built from an Arabidopsis thaliana compendium comprising around 500 microarray datasets, and we found transcriptional modules specifically activated under 4 stress conditions among a total of 48 experimental conditions.
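As a rough illustration of the module definition above, the following sketch checks whether a candidate gene set induces a dense subgraph in each of several co-expression graphs. Density is taken here as the fraction of gene pairs that are connected, and the threshold `min_density` and the helper names (`edges`, `induced_density`, `dense_in`, `is_module`) are illustrative assumptions; the sketch only verifies a given set and does not reproduce the paper's search strategies.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# An undirected co-expression graph stored as a set of 2-element edges.
Graph = Set[FrozenSet[str]]


def edges(*pairs) -> Graph:
    """Build a graph from (gene, gene) pairs."""
    return {frozenset(p) for p in pairs}


def induced_density(genes: Set[str], graph: Graph) -> float:
    """Fraction of gene pairs in `genes` that are edges of `graph`."""
    pairs = list(combinations(sorted(genes), 2))
    if not pairs:
        return 0.0
    present = sum(1 for u, v in pairs if frozenset((u, v)) in graph)
    return present / len(pairs)


def dense_in(genes: Set[str], graphs: Dict[str, Graph], min_density: float) -> List[str]:
    """Conditions (graph names) in which `genes` induces a dense subgraph."""
    return [name for name, g in graphs.items() if induced_density(genes, g) >= min_density]


def is_module(genes: Set[str], graphs: Dict[str, Graph],
              conditions: List[str], min_density: float = 0.8) -> bool:
    """True if `genes` is dense in every graph of the chosen condition subset."""
    return all(induced_density(genes, graphs[c]) >= min_density for c in conditions)


if __name__ == "__main__":
    graphs = {
        "cold": edges(("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g3", "g4")),
        "salt": edges(("g1", "g2"), ("g2", "g3"), ("g1", "g3")),
        "light": edges(("g1", "g4")),
    }
    print(dense_in({"g1", "g2", "g3"}, graphs, 0.8))  # ['cold', 'salt']
```

Here `{g1, g2, g3}` would count as a module for the "cold" and "salt" conditions but not for "light", which is the kind of condition-specific activation the abstract refers to.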
ABSTRACT: The availability of hundreds of bacterial genomes allowed a comparative genomic study of the Type VI Secretion System (T6SS), a recently discovered system involved in pathogenesis. By combining comparative and phylogenetic approaches across more than 500 prokaryotic genomes, we characterized the global genetic structure of the T6SS in terms of conservation, evolution and genomic organization.
This genome wide analysis allowed the identification of a set of 13 proteins constituting the T6SS protein core and a set of conserved accessory proteins. 176 T6SS loci (encompassing 92 different bacteria) were identified and their comparison revealed that T6SS-encoded genes have a specific conserved genetic organization. Phylogenetic reconstruction based on the core genes showed that lateral transfer of the T6SS is probably its major way of dissemination among pathogenic and non-pathogenic bacteria. Furthermore, the sequence analysis of the VgrG proteins, proposed to be exported in a T6SS-dependent way, confirmed that some C-terminal regions possess domains showing similarities with adhesins or proteins with enzymatic functions.
The core of T6SS is composed of 13 proteins, conserved in both pathogenic and non-pathogenic bacteria. Subclasses of T6SS differ in regulatory and accessory protein content suggesting that T6SS has evolved to adapt to various microenvironments and specialized functions. Based on these results, new functional hypotheses concerning the assembly and function of T6SS proteins are proposed.
ABSTRACT: Recent experimental progress is once again producing a huge quantity of data in various areas of biology, in particular on protein interactions. In order to extract meaningful information from these data, researchers typically use a graph representation to which they apply network alignment tools. Because of the combinatorial difficulty of the network alignment problem, most of the algorithms developed so far are heuristics, and the exact ones are of no use in practice on large numbers of networks. In this paper, we propose a unified scheme for the question of network alignment and present a new algorithm, C3Part-M, based on the work by Boyer et al., which is much more efficient than the original one in the case of multiple networks. On protein-protein interaction networks, we compare it to a recently proposed alignment tool, NetworkBLAST-M, and show that we recover similar results while using a different, exact approach.
Combinatorial Pattern Matching, 20th Annual Symposium, CPM 2009, Lille, France, June 22-24, 2009, Proceedings; 01/2009
ABSTRACT: We investigate the problem of inferring contiguous ancestral regions (CARs) of the genome of the last common ancestor of all extant amniotes, based on the currently sequenced and assembled amniote genomes as ingroups and three teleost fish genomes as outgroups. We combine a methodological framework using conserved syntenies computed from whole-genome alignments of amniote species with double conserved syntenies (DCS) computed from gene families of amniote and fish genomes, to take into account the whole-genome duplication that occurred in the teleost lineage. From these comparisons, ancestral genome segments are computed using techniques inspired by physical mapping. Because of the difficulty caused by the whole-genome duplication and the large evolutionary distance to the closest assembled outgroup, very few published methods have reconstructed the amniote ancestral genome. Ours is the first to be founded on a simple, formal methodological framework; its stability is demonstrated, and its CARs cover large regions of the human and chicken genomes.
ABSTRACT: Ehrlichia ruminantium is the causative agent of heartwater, a major tick-borne disease of livestock in Africa that has been introduced into the Caribbean and threatens to emerge and spread on the American mainland. Complete genome sequencing was done for two isolates of E. ruminantium of differing phenotype: Gardel (Erga), from Guadeloupe Island, and Welgevonden (Erwe), originating from South Africa and maintained in Guadeloupe. The type strain of E. ruminantium (Erwo), previously isolated and sequenced in South Africa, is identical to Erwe with respect to the target genes; together they form the Erwe/Erwo complex. Comparative analysis of the genomes shows the presence of 49 unique CDS and 28 truncated CDS differentiating Erga from Erwe/Erwo. Three regions of accumulated differences (RAD) acting as mutational hot spots were identified in E. ruminantium. Ten CDS (six unique and four truncated) corresponding to major genomic changes (deletions or extensive mutations) were selected as targets for differential diagnosis on four isolates of E. ruminantium: Erga, Erwe/Erwo, Senegal and Umpala. A pair of PCR primers was developed for each target gene. PCR analysis of the target genes generated strain-specific patterns not only for Erga and Erwe/Erwo, as predicted by comparative genomics, but also for isolates Senegal and Umpala. The target genes identified by bacterial comparative genomics are shown to be highly efficient for strain-specific PCR diagnosis of E. ruminantium and for further vaccine management tools.
Infection Genetics and Evolution 08/2008; 8(4):459-66. · 2.77 Impact Factor
ABSTRACT: Similarity search in texts, notably in biological sequences, has received substantial attention in the last few years. Numerous filtration and indexing techniques have been created in order to speed up the solution of the problem. However, previous filters were made for speeding up pattern matching, or for finding repetitions between two strings or occurring twice in the same string. In this paper, we present an algorithm called Nimbus for filtering strings prior to finding repetitions occurring twice or more in a string, or in two or more strings. Nimbus uses gapped seeds that are indexed with a new data structure, called a bi-factor array, that is also presented in this paper. Experimental results show that the filter can be very efficient: when a data set in which one wants to find functional elements with a multiple local alignment tool such as Glam is first preprocessed with Nimbus, the overall execution time can be reduced from 7.5 hours to 2 minutes.
ABSTRACT: In recent years, a major revolution has occurred in the analysis and understanding of pathogenesis and host-pathogen/parasite interactions. This revolution has been achieved through the emergence of the high-throughput integrative approaches used in the “omics” fields, such as genomics, transcriptomics, proteomics, interactomics and metabolomics. The novelty of these approaches stems from the development of high-throughput instruments, assisted by the increasing power of computers and software that allow high-speed, multifactorial, simultaneous analysis of numerous samples. This level of integration allows in-depth analysis of the mechanisms, pace and patterns of the evolution and adaptation of pathogens. This shift from linear to multifactorial approaches has opened new ways of creating and characterizing new vaccines, diagnostic candidates and drug targets.
Annals of the New York Academy of Sciences 01/2008; 1149(1). · 4.38 Impact Factor
ABSTRACT: Ehrlichia ruminantium is the causative agent of heartwater, an important tick-borne disease of livestock in Africa and the Caribbean that threatens the American mainland. The genome sequences of three strains of E. ruminantium have recently been published, revealing the presence of specific features related to genomic plasticity. E. ruminantium strains have traces of active genomic modifications, such as high substitution rates, truncated genes and the presence of pseudogenes and many tandem repeats. The most specific feature is the presence in all Ehrlichia of independent long-period tandem repeats, which are associated with expansion or contraction of intergenic regions.
Trends in Parasitology 10/2007; 23(9):414-9. · 5.51 Impact Factor
ABSTRACT: Chromosomes and other long DNA sequences contain many highly similar repeated subsequences. While there are efficient methods for detecting strict repeats or already characterized repeats, there is no software available for detecting approximate repeats in large DNA sequences that allows for weighted substitutions and indels in a coherent statistical framework. Here, we present an implementation of a two-step method (seed detection followed by seed extension) that detects such approximate repeats. Our method is computationally efficient enough to handle large sequences and flexible enough to account for influencing factors, such as sequence-composition biases, at both the seed-detection and alignment levels. AVAILABILITY: http://wwwabi.snv.jussieu.fr/public/RepSeek/
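The two-step scheme (seed detection, then extension) can be sketched on a single sequence as follows. This is a simplified illustration rather than RepSeek itself: seeds are exact k-mers, extension is ungapped with a +1/-1 score and an X-drop stop rule, and the statistical framework, weighted substitutions and indels of the actual method are left out. All function names and default parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_seeds(seq: str, k: int) -> List[Tuple[int, int]]:
    """Step 1: position pairs (i, j), i < j, sharing an identical k-mer."""
    where: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        where[seq[i:i + k]].append(i)
    return [(a, b) for occ in where.values()
            for x, a in enumerate(occ) for b in occ[x + 1:]]


def extend(seq: str, i: int, j: int, step: int,
           match: int = 1, mismatch: int = -1, xdrop: int = 3) -> Tuple[int, int]:
    """Step 2: ungapped X-drop extension from (i, j), moving by `step`
    (+1 = rightwards, -1 = leftwards). Returns (extra length kept, score gained)."""
    score = best = best_len = n = 0
    while True:
        a, b = i + step * n, j + step * n
        if min(a, b) < 0 or max(a, b) >= len(seq):
            break
        score += match if seq[a] == seq[b] else mismatch
        n += 1
        if score > best:
            best, best_len = score, n
        elif best - score > xdrop:
            break
    return best_len, best


def repeats(seq: str, k: int = 4, min_score: int = 8) -> List[Tuple[int, int, int, int]]:
    """Seed-and-extend; returns (start1, start2, length, score) tuples."""
    found = set()
    for i, j in find_seeds(seq, k):
        right, r_score = extend(seq, i + k, j + k, +1)
        left, l_score = extend(seq, i - 1, j - 1, -1)
        score = k + r_score + l_score  # an exact seed scores k with match = +1
        if score >= min_score:
            found.add((i - left, j - left, k + left + right, score))
    return sorted(found)
```

Several seeds falling inside the same repeat extend to the same boundaries here, so collecting results in a set merges them; a real tool would need a more careful treatment of overlapping and gapped extensions.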
ABSTRACT: The tick-borne rickettsial bacterium Ehrlichia ruminantium (E. ruminantium) is the causative agent of heartwater in Africa and the Caribbean. Heartwater, which is responsible for major livestock losses in Africa, also represents a threat to the American mainland. Three complete genomes corresponding to two groups of differing phenotypes, Gardel and Welgevonden, have recently been described. One genome (Erga) represents the Gardel group from Guadeloupe Island, and two genomes (Erwo and Erwe) belong to the Welgevonden group. Erwo, isolated in South Africa, is the parental strain of Erwe, which was maintained for 18 years in Guadeloupe under culture conditions different from those of Erwo. The three strains have genomes of differing sizes: 1,499,920 bp, 1,512,977 bp and 1,516,355 bp for Erga, Erwe and Erwo, respectively. Gene sequences and order are highly conserved between the three strains, although several gene truncations could be pinpointed, most of them occurring within three regions of accumulated differences (RAD). E. ruminantium displays a strong leading/lagging strand compositional bias, inducing strand-specific codon usage. Finally, a striking feature of E. ruminantium is the presence of long intergenic regions containing tandem repeats. These repeats underlie an active process of genome expansion/contraction, specific to E. ruminantium, based on the addition or removal of tandem units.
Annals of the New York Academy of Sciences 11/2006; 1081:417-33. · 4.38 Impact Factor
ABSTRACT: Modern comparative genomics is not restricted to sequences but also involves the comparison of metabolic pathways and protein-protein interactions. Central to this approach is the concept of neighbourhood between entities (genes, proteins, chemical compounds). There is therefore a growing need for new methods that merge connectivity information from different biological sources in order to infer functional coupling.
We present a generic approach to merge the information from two or more graphs representing biological data. The method is based on two concepts. The first one, the correspondence multigraph, precisely defines how correspondence is performed between the primary data-graphs. The second one, the common connected components, defines which property of the multigraph is searched for. Although this problem has already been informally stated in the past few years, we give here a formal and general statement together with an exact algorithm to solve it.
The algorithm presented in this paper has been implemented in C. Source code is freely available for download at: http://www.inrialpes.fr/helix/people/viari/cccpart.
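For the special case where both graphs share the same vertex set (so the correspondence is the identity), common connected components can be computed by repeatedly splitting vertex sets into the connected components of each graph until every block is connected in both. The Python sketch below illustrates that refinement idea under this assumption; it is not the authors' C implementation, does not handle general correspondence multigraphs, and its function names are hypothetical.

```python
from typing import Dict, List, Set

Adj = Dict[str, Set[str]]


def undirected(*pairs) -> Adj:
    """Symmetric adjacency sets from (u, v) edge pairs."""
    adj: Adj = {}
    for u, v in pairs:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def components(block: Set[str], adj: Adj) -> List[Set[str]]:
    """Connected components of the subgraph of `adj` induced by `block`."""
    seen: Set[str] = set()
    comps: List[Set[str]] = []
    for start in sorted(block):
        if start in seen:
            continue
        seen.add(start)
        comp, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for w in adj.get(v, set()):
                if w in block and w not in seen:
                    seen.add(w)
                    comp.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def common_connected_components(vertices: Set[str], g1: Adj, g2: Adj) -> List[Set[str]]:
    """Refine `vertices` until every block induces a connected subgraph in both graphs."""
    pending, result = [set(vertices)], []
    while pending:
        block = pending.pop()
        parts = components(block, g1)
        if len(parts) == 1:
            parts = components(block, g2)
        if len(parts) == 1:
            result.append(block)   # connected in g1 and in g2
        else:
            pending.extend(parts)
    return sorted(result, key=min)
```

For instance, with g1 the path a-b-c-d-e and g2 containing only a-b, c-e and d-e, the whole set is connected in g1 but splits into {a, b} and {c, d, e} in g2, and both blocks are then connected in both graphs.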
ABSTRACT: Similarity search in texts, notably biological sequences, has received substantial attention in the last few years. Numerous filtration and indexing techniques have been created in order to speed up the resolution of the problem. However, previous filters were made for speeding up pattern matching, or for finding repetitions between two sequences or occurring twice in the same sequence. In this paper, we present an algorithm called NIMBUS for filtering sequences prior to finding repetitions occurring more than twice in a sequence or in more than two sequences. NIMBUS uses gapped seeds that are indexed with a new data structure, called a bi-factor array, that is also presented in this paper. Experimental results show that the filter can be very efficient: when a data set in which one wants to find functional elements with a multiple local alignment tool such as GLAM (7) is first preprocessed with NIMBUS, the overall execution time can be reduced from 10 hours to 6 minutes while obtaining exactly the same results.
String Processing and Information Retrieval, 12th International Conference, SPIRE 2005, Buenos Aires, Argentina, November 2-4, 2005, Proceedings; 01/2005
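The gapped-seed filtering idea in the two NIMBUS abstracts can be illustrated with a plain dictionary index in place of the bi-factor array, which is not reproduced here. In this sketch, a seed mask such as `1101` marks positions that must match ('1') and don't-care positions ('0'), and a position is kept only if its seed key occurs in at least `quorum` sequences. The mask, the quorum rule and the function names are illustrative assumptions, not the published filter.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple


def gapped_keys(seq: str, mask: str) -> Iterator[Tuple[int, str]]:
    """Yield (position, key) for each placement of a gapped seed.
    In `mask`, '1' = position that must match, '0' = don't-care position."""
    care = [i for i, c in enumerate(mask) if c == "1"]
    for p in range(len(seq) - len(mask) + 1):
        yield p, "".join(seq[p + i] for i in care)


def filter_positions(seqs: List[str], mask: str, quorum: int) -> Dict[int, Set[int]]:
    """Per sequence, keep the seed start positions whose key occurs in >= quorum sequences."""
    owners: Dict[str, Set[int]] = defaultdict(set)
    for s, seq in enumerate(seqs):
        for _, key in gapped_keys(seq, mask):
            owners[key].add(s)
    kept: Dict[int, Set[int]] = {s: set() for s in range(len(seqs))}
    for s, seq in enumerate(seqs):
        for p, key in gapped_keys(seq, mask):
            if len(owners[key]) >= quorum:
                kept[s].add(p)
    return kept
```

With the strings `AACGTTT` and `GGACTTCC`, the windows `ACGT` (position 1) and `ACTT` (position 2) differ only at the don't-care position, so both map to the key `ACT` and survive a quorum of 2; everything else would be discarded before the expensive alignment step.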
ABSTRACT: We propose a new formulation for the problem of ab initio metabolic pathway reconstruction. Given a set of biochemical reactions together with their substrates and products, we consider the reactions as transfers of atoms between chemical compounds, and we look for successions of reactions transferring a maximal (or preset) number of atoms between a given source compound and a given sink compound. We state this problem as that of finding a composition of partial injections that maximizes the image size. First, we study the theoretical complexity of this problem and state some related problems; we then give a practical algorithm to solve them. Finally, we present two applications of this approach: the reconstruction of the tryptophan biosynthesis pathway and of glycolysis.
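A small sketch of the partial-injection view of pathway reconstruction: each reaction is modelled as a map from atom indices of a substrate to atom indices of a product, composition follows the atoms along a path, and the image size counts the atoms that reach the sink. The exhaustive, depth-bounded search is an assumption made for illustration only (it is exponential in general and is not the practical algorithm mentioned in the abstract); multi-substrate reactions are simplified to one injection per substrate-product pair, and all names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Partial, one-to-one map: atom index in substrate -> atom index in product.
Injection = Dict[int, int]


def compose(f: Injection, g: Injection) -> Injection:
    """Apply f then g: defined on x when f(x) exists and g(f(x)) exists."""
    return {x: g[y] for x, y in f.items() if y in g}


def best_pathway(reactions: Dict[Tuple[str, str], Injection], source: str,
                 sink: str, max_steps: int = 4) -> Tuple[int, List[str]]:
    """Exhaustive DFS over simple compound paths of at most `max_steps`
    reactions; returns (atoms transferred, path) with the largest image."""
    best: Tuple[int, List[str]] = (0, [])

    def dfs(node: str, current: Optional[Injection], path: List[str]) -> None:
        nonlocal best
        if node == sink:
            if current is not None and len(current) > best[0]:
                best = (len(current), list(path))
            return
        if len(path) - 1 >= max_steps:
            return
        for (u, v), f in reactions.items():
            if u != node or v in path:
                continue
            nxt = f if current is None else compose(current, f)
            if nxt:  # prune paths along which no source atom survives
                path.append(v)
                dfs(v, nxt, path)
                path.pop()

    dfs(source, None, [source])
    return best
```

In a toy network where A to B to D carries one atom into D and A to C to D carries two, the search returns `(2, ['A', 'C', 'D'])`, i.e. the succession of reactions with the largest image.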
ABSTRACT: Reconstructing the metabolic pathways of an organism is an important task in biology. Several approaches have already been proposed to assist this work, but there is a need for more exploratory approaches.
The first part of this thesis addresses the ab initio reconstruction of metabolic pathways. This consists of finding, within the network of all the chemical reactions described for a living organism, a subnetwork connecting at least two compounds. We propose a new formulation of this problem that considers reactions as transfers of atoms between chemical compounds. A metabolic pathway is thus associated with a transfer of atoms between two compounds. The reconstruction problem is then to search for the succession of reactions that maximizes the number of atoms transferred between these two compounds. This problem is expressed as the search for a composition of partial injections whose image size is maximal. The complexity of this problem was studied, and an algorithm solving it is presented.
The second part presents the formalization of a graph comparison problem. The particular case treated in this thesis concerns the comparison of a reaction network with the spatial organization of genes on the genome. This comparison makes it possible to identify metabolic pathways encoded as operons in bacterial genomes.