PhenoLink - a web-tool for linking phenotype to ~omics data for bacteria: application to gene-trait matching for Lactobacillus plantarum strains

Centre for Molecular and Biomolecular Informatics, Radboud University Nijmegen Medical Centre, PO Box 9101, Nijmegen, The Netherlands.
BMC Genomics (Impact Factor: 4.04). 05/2012; 13:170. DOI: 10.1186/1471-2164-13-170
Source: PubMed

ABSTRACT Linking phenotypes to high-throughput molecular biology data generated by ~omics technologies can reveal the cellular mechanisms underlying an organism's phenotype. ~Omics datasets are often very large and noisy, with many features (e.g., genes, metabolite abundances). Associating phenotypes with ~omics data therefore requires an approach that is robust to noise and can handle large and diverse datasets.
We developed a web-tool, PhenoLink, that links phenotype to ~omics datasets using well-established as well as new techniques. PhenoLink imputes missing values and preprocesses input data (i) to decrease inherent noise in the data and (ii) to counterbalance pitfalls of the Random Forest algorithm, on which feature (e.g., gene) selection is based. The preprocessed data are then used in feature selection to identify relations to phenotypes. We applied PhenoLink to identify gene-phenotype relations based on the presence/absence of 2847 genes in 42 Lactobacillus plantarum strains and phenotypic measurements of these strains in several experimental conditions, including growth on sugars and nitrogen-dioxide production. Genes were ranked by their importance (predictive value) for correctly predicting the phenotype of a given strain. In addition to known gene-phenotype relations, we also found novel relations.
PhenoLink is an easily accessible web-tool that facilitates identifying relations in large and often noisy phenotype and ~omics datasets. The visualization of links to phenotypes offered in PhenoLink allows prioritizing links, finding relations between features, finding relations between phenotypes, and identifying outliers in phenotype data. PhenoLink can be used to uncover phenotype links to a multitude of ~omics data, e.g., gene presence/absence (determined by, e.g., CGH or next-generation sequencing), gene expression (determined by, e.g., microarrays or RNA-seq), or metabolite abundance (determined by, e.g., GC-MS).
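The workflow described in the abstract (impute missing values, then rank genes by how well they predict a phenotype) can be illustrated with a minimal sketch. This is not PhenoLink's code and it does not use Random Forest: as a plainly simpler stand-in, it scores each gene on its own by information gain and fills missing calls with the most common observed value. The function names, the mode-imputation rule, and the toy strains and genes are all hypothetical.

```python
import math
from collections import Counter


def impute_mode(column):
    """Replace None entries with the most common observed value (assumed rule)."""
    observed = [v for v in column if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in column]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in phenotype entropy after splitting strains on one gene."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def rank_genes(presence, phenotype):
    """Return (gene, score) pairs sorted from most to least informative."""
    scores = {
        gene: information_gain(impute_mode(column), phenotype)
        for gene, column in presence.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: 6 strains, 3 genes (1 = present, 0 = absent, None = missing).
presence = {
    "geneA": [1, 1, 1, 0, 0, 0],
    "geneB": [1, 0, 1, 0, 1, None],
    "geneC": [1, 1, 1, 1, 1, 1],
}
phenotype = ["grows", "grows", "grows", "no_growth", "no_growth", "no_growth"]

ranking = rank_genes(presence, phenotype)
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

Unlike a Random Forest, a per-gene score of this kind cannot capture combinations of genes that are predictive only together, which is one reason an ensemble method is attractive for this task.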

Available from: Roland Siezen, Aug 13, 2015
  • "We argue that trait-based approaches should build on, not replace, taxonomy-based approaches. The information needed to properly characterize the co-occurrence of traits and trait trade-offs among microorganisms builds on taxonomic ranks, and there is certainly an incentive for more high-throughput surveys of phenotypic characteristics of microbial taxa (Bayjanov et al., 2012). Such approaches could mark the beginning of a deviation from classical phylum-based approaches in microbial BEF studies toward a classification based on functional performance and role in the environment."
    ABSTRACT: In ecology, biodiversity-ecosystem functioning (BEF) research has seen a shift in perspective from taxonomy to function in the last two decades, with successful application of trait-based approaches. This shift offers opportunities for a deeper mechanistic understanding of the role of biodiversity in maintaining multiple ecosystem processes and services. In this paper, we highlight studies that have focused on BEF of microbial communities with an emphasis on integrating trait-based approaches to microbial ecology. In doing so, we explore some of the inherent challenges and opportunities of understanding BEF using microbial systems. For example, microbial biologists characterize communities using gene phylogenies that are often unable to resolve functional traits. Additionally, experimental designs of existing microbial BEF studies are often inadequate to unravel BEF relationships. We argue that combining eco-physiological studies with contemporary molecular tools in a trait-based framework can reinforce our ability to link microbial diversity to ecosystem processes. We conclude that such trait-based approaches are a promising framework for increasing the understanding of microbial BEF relationships and thus for generating systematic principles in microbial ecology and, more generally, ecology.
    Frontiers in Microbiology 05/2014; 5. DOI:10.3389/fmicb.2014.00251 · 3.94 Impact Factor
  • ABSTRACT: In the Life Sciences, 'omics' data is increasingly generated by different high-throughput technologies. Often only the integration of these data allows uncovering biological insights that can be experimentally validated or mechanistically modelled, i.e. sophisticated computational approaches are required to extract the complex non-linear trends present in omics data. Classification techniques allow training a model based on variables (e.g. SNPs in genetic association studies) to separate different classes (e.g. healthy subjects versus patients). Random Forest (RF) is a versatile classification algorithm suited for the analysis of these large data sets. In the Life Sciences, RF is popular because RF classification models have a high prediction accuracy and provide information on the importance of variables for classification. For omics data, variables or conditional relations between variables are typically important for a subset of samples of the same class. For example, within a class of cancer patients, certain SNP combinations may be important for a subset of patients that have a specific subtype of cancer, but not important for a different subset of patients. These conditional relationships can in principle be uncovered from the data with RF, as they are implicitly taken into account by the algorithm during the creation of the classification model. This review details some RF properties that, to the best of our knowledge, are rarely or never used and that allow maximizing the biological insights that can be extracted from complex omics data sets.
    Briefings in Bioinformatics 07/2012; 14(3). DOI:10.1093/bib/bbs034 · 9.62 Impact Factor
  • ABSTRACT: Fermented foods and beverages are an integral part of the human diet globally. Understanding the microbial interactions within these fermenting ecosystems is required to deliver safe products with desirable consumer properties, and moreover, maintenance of these traditions. Effective tools are required for documentation of cultures in traditional and artisanal fermented products, for sensory quality and safety improvements, in some cases for starter culture design for commercialization and potentially for supporting sustainable food systems. Here we trace the developments of sequence-based molecular technologies for investigating the diversity and functionality of microbiota in traditional or indigenous fermented foods and beverages. The opportunities of phylobiomics, metagenomics and metatranscriptomics to enrich our knowledge of fermenting microbial ecosystems are presented.
    Current Opinion in Biotechnology 09/2012; 24(2). DOI:10.1016/j.copbio.2012.08.004 · 8.04 Impact Factor