ABSTRACT: No molecular marker for Crohn's disease (CD) has yet advanced to clinical use, and independent lines of evidence support a central role of the gut microbial community in CD. Here we explore the feasibility of extracting CD-relevant bacterial protein signals by interrogating the myriad intestinal bacterial proteomes of a small number of patients and healthy controls.
We first developed and validated a workflow to discover protein signals from CD-associated gut microbial communities, comprising extraction of the microbial communities, two-dimensional difference gel electrophoresis (2D-DIGE) and LC-MS/MS. We then used selected reaction monitoring (SRM) to confirm a set of candidates. In parallel, we used 16S rRNA gene sequencing for an integrated analysis of gut ecosystem structure and function.
Our 2D-DIGE-based discovery approach revealed an imbalance of intestinal bacterial functions in CD. Many proteins, largely derived from Bacteroides species, were over-represented, whereas under-represented proteins came mostly from Firmicutes and some Prevotella members. Most over-abundant proteins were confirmed by SRM. They correspond to functions that allow opportunistic pathogens to colonise the mucus layers, breach the host barriers and invade the mucosae, effects that could be further aggravated by the decreased levels of host-derived pancreatic zymogen granule membrane protein GP2 in CD patients. Moreover, although the abundance of most protein groups reflected that of the corresponding bacterial populations, we found a specific, independent regulation of bacteria-derived cell envelope proteins.
This study provides the first evidence that quantifiable bacterial protein signals are associated with CD, a finding that could have a profound impact on future molecular diagnosis.
ABSTRACT: Developing tools for visualizing DNA sequences is an important issue in the barcoding context. Visualizing barcode data can be framed as a purely statistical problem of unsupervised learning. Clustering methods combined with projection methods have two closely linked objectives: visualizing the data and finding structure in them. Multidimensional scaling (MDS) and self-organizing maps (SOM) are unsupervised statistical tools for data visualization. Both algorithms map data onto a lower-dimensional manifold: MDS looks for a projection that best preserves pairwise distances, while SOM preserves the topology of the data. Both algorithms were initially developed for Euclidean data, and the conditions necessary for their proper implementation are not satisfied by barcode data. We developed a workflow consisting of four steps: collapse the data into distinct sequences; compute a dissimilarity matrix; run a modified version of SOM for dissimilarity matrices to structure the data and reduce dimensionality; and project the results using MDS. This methodology was applied to Astraptes fulgerator and to Hylomyscus, an African rodent genus with debated taxonomy. We obtained very good results for both data sets, and the results were robust to unbalanced species sampling. All Astraptes species were displayed as clearly distinct groups in the various visualizations, except LOHAMP and FABOV, which were mixed up. For Hylomyscus, our findings were consistent with known species, confirmed the existence of four unnamed taxa and suggested the existence of potentially new species.
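Steps 1, 2 and 4 of the workflow can be sketched in plain Python; the modified SOM for dissimilarity matrices (step 3) is omitted here. The p-distance as dissimilarity, classical (Torgerson) MDS computed by power iteration, and the toy sequences are all assumptions for illustration, not the authors' implementation.

```python
import math
import random

def collapse(specimens):
    """Step 1: group identical aligned sequences; returns {sequence: [specimen ids]}."""
    haplotypes = {}
    for specimen, seq in specimens.items():
        haplotypes.setdefault(seq, []).append(specimen)
    return haplotypes

def p_distance(a, b):
    """Step 2 (assumed metric): proportion of differing sites between equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def classical_mds(d, k=2, iters=500, seed=0):
    """Step 4: classical MDS; top-k eigenpairs found by power iteration with deflation."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    # Double-centred matrix B = -1/2 * J D^2 J (D is symmetric, so column means = row means)
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]  # random positive start vector
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        # A negative eigenvalue signals non-Euclidean dissimilarities; that axis gets zero length
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][axis] = v[i] * scale
        # Remove this eigenpair so the next pass converges to the next one
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

specimens = {  # toy aligned sequences, not data from the study
    "a1": "AAAAAAAAAA", "a2": "AAAAAAAAAT", "a3": "AAAAAAAAAA",
    "b1": "TTTTTTTTTT", "b2": "TTTTTTTTTA",
}
haplotypes = collapse(specimens)
seqs = list(haplotypes)
dist = [[p_distance(s, t) for t in seqs] for s in seqs]
xy = classical_mds(dist)
for s, (x, y) in zip(seqs, xy):
    print(haplotypes[s], round(x, 3), round(y, 3))
```

Collapsing first means each distinct haplotype is projected once, which keeps the dissimilarity matrix small when many specimens share a sequence.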
ABSTRACT: Prions cause fatal neurodegenerative conditions and result from the conversion of host-encoded cellular prion protein (PrP(C)) into abnormally folded scrapie PrP (PrP(Sc)). Prions can propagate in both neurons and astrocytes, yet the mechanisms of neurotoxicity remain unclear. Recently, PrP(C) was proposed to mediate the neurotoxic signaling of β-sheet-rich PrP and non-PrP conformers independently of conversion. To investigate the role of astrocytes and of neuronal PrP(C) in prion-induced neurodegeneration, we set up primary neuron-astrocyte cocultures derived from PrP transgenic mice. In this system, prion-infected astrocytes delivered ovine PrP(Sc) to neurons that either lacked PrP(C) (prion-resistant) or expressed a PrP(C) that was convertible (sheep) or not (mouse, human). We show that interaction between neuronal PrP(C) and exogenous PrP(Sc) was not sufficient to induce neuronal death, but that efficient PrP(C) conversion was required for prion-associated neurotoxicity. Prion-infected astrocytes markedly accelerated neurodegeneration in homologous cocultures compared with infected neuronal monocultures, despite no detectable neurotoxin release. Finally, PrP(Sc) accumulation in neurons led to neuritic damage and cell death, both potentiated by glutamate and reactive oxygen species. Thus, conversion of neuronal PrP(C), rather than PrP(C)-mediated neurotoxic signaling, appears to be the main culprit in prion-induced neurodegeneration. We suggest that active prion replication in neurons sensitizes them to environmental stress regulated by neighboring cells, including astrocytes.
The FASEB Journal 06/2012; 26(9):3854-61. · 5.70 Impact Factor
ABSTRACT: DNA barcoding is the assignment of individuals to species using standardized mitochondrial sequences. Nuclear data are sometimes added to the mitochondrial data to increase power. We develop a barcoding method for analysing mitochondrial and nuclear data together: a Bayesian method based on the coalescent model. We then assess this method using simulated and real data, and find that adding nuclear data can reduce the number of ambiguous assignments. Finally, we use simulations to study the robustness of coalescent-based barcoding to departures from model assumptions. The method is robust to past population size variations, to within-species population structure and to designs that poorly sample populations within species. Supplementary Material is available online at www.liebertonline.com/cmb.
Journal of computational biology: a journal of computational molecular cell biology 03/2012; 19(3):271-8. · 1.69 Impact Factor
ABSTRACT: To improve the identification of avian pathogenic Escherichia coli (APEC) strains, an extensive characterization of 1,491 E. coli isolates was conducted, based on serotyping, virulence genotyping and experimental pathogenicity for chickens. The isolates originated from lesions of avian colibacillosis (n = 1,307) or from the intestines of healthy animals (n = 184) from France, Spain and Belgium. A subset of 460 isolates from this collection was defined according to their virulence for chicks. Six serogroups (O1, O2, O5, O8, O18 and O78) accounted for 56.5% of the APEC isolates and 22.5% of the nonpathogenic isolates. Thirteen virulence genes were more frequently present in APEC isolates than in nonpathogenic isolates, but none of them individually allowed an isolate to be identified as an APEC strain. To take into account the diversity of APEC strains, a statistical analysis based on a tree-modeling method was therefore conducted on the sample of 460 pathogenic and nonpathogenic isolates. This identified four different associations of virulence genes that enable the identification of 70.2% of the pathogenic strains. Pathogenic strains were identified with an error margin of 4.3%. The reliability of the link between these four virulence patterns and pathogenicity for chickens was validated on a sample of 395 E. coli isolates from the collection. The genotyping method described here identified more APEC isolates, with greater reliability, than the classical serotyping methods currently used in veterinary laboratories.
Journal of clinical microbiology 02/2012; 50(5):1673-8. · 4.16 Impact Factor
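The abstract names a tree-modeling method but not its splitting criterion, settings or the four gene associations it found. A minimal CART-style sketch, assuming Gini-impurity splits on the presence or absence of virulence genes, shows how such a tree turns gene combinations into pathogenicity rules. The genes geneA, geneB and geneC and the toy labels are hypothetical placeholders, not genes from the study.

```python
from collections import Counter
from itertools import product

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow_tree(rows, labels, genes, depth=0, max_depth=4):
    """Recursively split isolates on the presence/absence of one gene at a time."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    n = len(labels)
    best = None
    for g in genes:
        present = [i for i, r in enumerate(rows) if r[g]]
        absent = [i for i, r in enumerate(rows) if not r[g]]
        if not present or not absent:
            continue  # this gene does not separate the current isolates
        impurity = (len(present) * gini([labels[i] for i in present])
                    + len(absent) * gini([labels[i] for i in absent])) / n
        if best is None or impurity < best[0]:
            best = (impurity, g, present, absent)
    if best is None or best[0] >= gini(labels):
        return majority  # no split reduces impurity: make a leaf
    _, g, present, absent = best
    return {
        "gene": g,
        "present": grow_tree([rows[i] for i in present], [labels[i] for i in present],
                             genes, depth + 1, max_depth),
        "absent": grow_tree([rows[i] for i in absent], [labels[i] for i in absent],
                            genes, depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow the gene splits until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["present"] if row[tree["gene"]] else tree["absent"]
    return tree

genes = ["geneA", "geneB", "geneC"]  # hypothetical placeholders
rows = [dict(zip(genes, bits)) for bits in product([0, 1], repeat=3)]
# Toy rule: pathogenic if (geneA and geneB) or geneC, i.e. two gene "associations"
labels = ["APEC" if (r["geneA"] and r["geneB"]) or r["geneC"] else "non-pathogenic"
          for r in rows]
tree = grow_tree(rows, labels, genes)
hits = sum(predict(tree, r) == y for r, y in zip(rows, labels))
print(tree["gene"], hits, "/", len(rows))
```

Each root-to-leaf path ending in "APEC" reads as one association of virulence genes, which is why a tree suits data where no single gene is diagnostic on its own.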
ABSTRACT: The Praomyini tribe is one of the most diverse and abundant groups of Old World rodents. Several species are known to be involved in crop damage and in the epidemiology of several human and cattle diseases. Because of the existence of sibling species, their identification is often problematic, so an easy, fast and accurate species identification tool is needed for non-systematists to identify Praomyini species correctly. In this study we compare the usefulness of three genes (16S, Cytb, CO1) for identifying species of this tribe. A total of 426 specimens representing 40 species (sampled across their geographical range) were sequenced for the three genes. Nearly all of the species included in our study are monophyletic in the neighbour-joining trees. Intraspecific variability tends to be lower than the divergence between species, but no barcoding gap is detected. The success rate of the statistical methods of species identification is excellent (up to 99% or 100% for supervised statistical classification methods such as k-nearest neighbour or Random Forest). The 16S gene is 2.5 times less variable than the Cytb and CO1 genes, and its discriminatory power is consequently lower. To sum up, our results suggest that using DNA markers to identify species in the Praomyini tribe is a largely valid approach, and that the CO1 and Cytb genes are better DNA markers than the 16S gene. Our results confirm the usefulness of statistical methods such as Random Forest and 1-NN for assigning a sequence to a species, even when the number of species is relatively large. Based on our NJ trees and the distribution of all intraspecific and interspecific pairwise nucleotide distances, we highlight the presence of several potentially new species within the Praomyini tribe that should be subject to corroboration assessments.
PLoS ONE 01/2012; 7(5):e36586. · 3.53 Impact Factor
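A barcoding gap exists when every intraspecific pairwise distance is smaller than every interspecific one. The sketch below checks this from the distribution of pairwise distances, assuming uncorrected p-distances on aligned sequences; the records are invented so that a gap exists, unlike the Praomyini data, and a second call shows the overlapping case.

```python
from itertools import combinations

def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def split_distances(records):
    """Split all pairwise distances into intraspecific and interspecific lists.

    records: list of (species, aligned_sequence) tuples.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return intra, inter

def has_barcoding_gap(intra, inter):
    """True when the largest within-species distance is below the smallest between-species one."""
    return max(intra) < min(inter)

records = [  # toy data, not sequences from the study
    ("sp1", "ACGTACGTAC"),
    ("sp1", "ACGTACGTAA"),
    ("sp2", "ACGTTCGAAC"),
    ("sp2", "ACGTTCGAAA"),
]
intra, inter = split_distances(records)
print(max(intra), min(inter), has_barcoding_gap(intra, inter))
print(has_barcoding_gap([0.1, 0.25], [0.2, 0.3]))  # overlapping distributions: no gap
```

A missing gap does not by itself rule out accurate assignment, since methods such as 1-NN only need each query to be closer to some conspecific than to any heterospecific reference.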
ABSTRACT: DNA barcoding aims to assign individuals to given species according to their sequence at a small locus, generally part of the CO1 mitochondrial gene. Amongst other issues, this raises the question of how to deal with within-species genetic variability and potential trans-specific polymorphism. In this context, we examine several assignment methods belonging to two main categories: (i) phylogenetic methods (neighbour-joining and PhyML) that attempt to account for the genealogical framework of DNA evolution and (ii) supervised classification methods (k-nearest neighbour, CART, random forest and kernel methods). These methods range from basic to elaborate. We investigated the ability of each method to correctly classify query sequences drawn from samples of related species using both simulated and real data. Simulated data sets were generated using coalescent simulations in which we varied the genealogical history, mutation parameter, sample size and number of species.
No method was found to be the best in all cases. The simplest method of all, "one nearest neighbour", was found to be the most reliable with respect to changes in the parameters of the data sets. The parameter that most influenced the performance of the various methods was the molecular diversity of the data. Adding genetically independent loci (nuclear genes) improved the predictive performance of most methods.
The study implies that taxonomists can influence the quality of their analyses either by choosing the method best adapted to the configuration of their sample or, given a certain method, by increasing the sample size or altering the amount of molecular diversity. This can be achieved either by sequencing more mtDNA or by sequencing additional nuclear genes. In the latter case, they may also have to modify their data analysis method.
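The "one nearest neighbour" method assigns a query to the species of its closest reference sequence. A minimal sketch follows, assuming p-distances and, for several loci, the mean of per-locus distances as the combination rule (an illustrative choice, not necessarily the one used in the study). The toy query ties between two species on the mitochondrial locus alone, and adding a nuclear locus resolves it.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def multilocus_distance(a, b):
    """Assumed combination rule: mean per-locus p-distance over the loci in a and b."""
    return sum(p_distance(x, y) for x, y in zip(a, b)) / len(a)

def one_nn(query, references):
    """Return the species of the reference closest to query (ties: first in list)."""
    species, _ = min(references, key=lambda ref: multilocus_distance(query, ref[1]))
    return species

references = [  # toy (species, (mtDNA locus, nuclear locus)) records
    ("sp1", ("ACGTACGTAC", "GGGGCCCC")),
    ("sp2", ("ACGTTCGAAC", "GGGGAAAA")),
]
query = ("ACGTACGAAC", "GGGGAAAC")

mito_only = [(sp, loci[:1]) for sp, loci in references]
print(one_nn(query[:1], mito_only))  # sp1: a tie at distance 0.1, resolved only by list order
print(one_nn(query, references))     # sp2: the nuclear locus breaks the tie
```

The method has no parameters to tune, which is consistent with its stability across data-set configurations, but its answer on ties depends on arbitrary ordering, so ambiguous assignments are worth flagging rather than silently resolved.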
ABSTRACT: Mosquitoes act as vectors in the transmission of viruses. Their abundances, which depend strongly on weather and environment, are therefore closely linked to major disease outbreaks. The aim of this paper is to provide a tool to predict vector abundance. To describe the dynamics of mosquito populations, we developed a matrix model integrating climate fluctuations. The population is structured in five stages: two egg stages (immature and mature), one larval stage and two female flying stages (nulliparous and parous). Water availability in breeding sites was considered the main environmental factor affecting the mosquito life cycle, so the model represents the change in mosquito abundance in each stage over time as a function of water availability. The model was used to simulate abundance trends over 3 years for two mosquito species, Aedes africanus (Theobald) and Aedes furcifer (Edwards), vectors of yellow fever virus in Ivory Coast. As both species breed in tree holes, the water dynamics in the tree holes were reproduced from daily rainfall data. The simulated populations matched the field data well over the period considered.
Infection, Genetics and Evolution 08/2008; 8(4):422-32. · 2.77 Impact Factor
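The abstract gives the five stages and the central role of water availability but not the matrix entries or the tree-hole water model. The sketch below assumes a daily projection n(t+1) = M(w_t) n(t), a simple bucket model for tree-hole water, a 1:1 sex ratio at emergence and placeholder rates; none of these values or functional forms are the authors' calibration for Aedes africanus or Aedes furcifer.

```python
def tree_hole_water(rainfall_mm, capacity=50.0, evaporation=0.1, level=0.0):
    """Hypothetical bucket model of a tree hole; returns daily water availability in [0, 1]."""
    availability = []
    for rain in rainfall_mm:
        # Water left after evaporation plus the day's rain, capped by the hole's capacity
        level = min(capacity, level * (1.0 - evaporation) + rain)
        availability.append(level / capacity)
    return availability

def projection_matrix(w, p):
    """Hypothetical 5x5 stage matrix for one day at water availability w in [0, 1].

    Stage order: immature eggs, mature eggs, larvae, nulliparous females, parous females.
    """
    se, m, h = p["egg_survival"], p["egg_maturation"], p["hatching"]
    sl, d = p["larval_survival"], p["emergence"]
    sa, g, f = p["adult_survival"], p["gonotrophic"], p["fecundity"]
    return [
        [se * (1 - m), 0.0, 0.0, g * f, g * f],             # eggs laid at the end of a cycle
        [se * m, se * (1 - h * w), 0.0, 0.0, 0.0],          # maturation; mature eggs wait for water
        [0.0, se * h * w, sl * w * (1 - d), 0.0, 0.0],      # hatching and larval survival need water
        [0.0, 0.0, sl * w * d * 0.5, sa * (1 - g), 0.0],    # emergence, assumed half female
        [0.0, 0.0, 0.0, sa * g, sa],                        # first cycle completed, then parous
    ]

def simulate(n0, availability, p):
    """Project stage abundances day by day: n(t+1) = M(w_t) n(t)."""
    n = list(n0)
    history = [n]
    for w in availability:
        M = projection_matrix(w, p)
        n = [sum(M[i][j] * n[j] for j in range(5)) for i in range(5)]
        history.append(n)
    return history

PARAMS = {  # placeholder daily rates, not calibrated values
    "egg_survival": 0.95, "egg_maturation": 0.2, "hatching": 0.5,
    "larval_survival": 0.9, "emergence": 0.1,
    "adult_survival": 0.85, "gonotrophic": 0.25, "fecundity": 40.0,
}
n0 = [0.0, 100.0, 10.0, 0.0, 0.0]  # start with mature eggs and a few larvae
water = tree_hole_water([20.0] * 30)
dry = simulate(n0, [0.0] * 10, PARAMS)
wet = simulate(n0, water, PARAMS)
print([round(x, 1) for x in dry[-1]])
print([round(x, 1) for x in wet[-1]])
```

Because w enters the hatching and larval entries, a dry tree hole halts recruitment while mature eggs persist, so the model can reproduce abundance pulses that follow rainfall.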
ABSTRACT: Studying the interactions of cellular components is an essential first step towards understanding the structure and dynamics of biological networks. Various methods have recently been developed for this purpose. While most of them combine different types of data and a priori knowledge, methods based on graphical Gaussian models can learn the network directly from raw data. They use full-order partial correlations, i.e. partial correlations between two variables given all the remaining ones, to model direct links between variables. Statistical methods were developed for estimating these links when the number of observations is larger than the number of variables. However, the rapid advance of technologies that allow genome-wide expression to be measured simultaneously has led to large-scale datasets in which the number of variables is far larger than the number of observations. To get around this dimensionality problem, different strategies and new statistical methods have been proposed. In this study we focused on recently published statistical methods. All of them rely on the assumption that the number of direct relationships between pairs of variables is very small compared with the number of possible relationships, p(p-1)/2. In the biological context, this assumption is not always satisfied over the whole graph, so it is essential to know precisely how the methods behave with respect to the characteristics of the studied object before applying them. For this purpose, we evaluated the validity domain of each method on wide-ranging simulated datasets. We then illustrated our results using recently published biological data.
Statistical Applications in Genetics and Molecular Biology 02/2008; 7(1):Article 14. · 1.52 Impact Factor
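For a Gaussian vector with covariance matrix Σ, the full-order partial correlations can be read off the precision matrix. This standard identity, not specific to any of the methods compared in the study, shows why an edge is absent exactly when the corresponding precision entry is zero:

```latex
\Omega = (\omega_{ij}) = \Sigma^{-1}, \qquad
\rho_{ij \cdot \mathrm{rest}} = -\frac{\omega_{ij}}{\sqrt{\omega_{ii}\,\omega_{jj}}}, \qquad
\rho_{ij \cdot \mathrm{rest}} = 0 \iff \omega_{ij} = 0 .
```

The number of candidate edges grows quadratically: with p = 1,000 genes there are 1000 × 999 / 2 = 499,500 possible links. When n ≤ p, the sample covariance matrix has rank at most n − 1 and cannot be inverted, which is the dimensionality problem that the sparsity-based methods work around.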
ABSTRACT: Proteomics relies on the separation of complex protein mixtures using two-dimensional electrophoresis. This approach is widely used to detect variations in the expression of proteins prepared from two or more samples. Recently, attention has been drawn to the reliability of results published in the literature. Among the critical points identified were experimental design, differential analysis and the problem of missing data, all areas in which statistics can help. Using examples and terms understandable by biologists, we describe how collaboration between biologists and statisticians can improve the reliability of results and confidence in conclusions.
Journal of Chromatography B 05/2007; 849(1-2):261-72. · 2.49 Impact Factor