PhenoLink - a web-tool for linking phenotype to ~omics data for bacteria: Application to gene-trait matching for Lactobacillus plantarum strains

Centre for Molecular and Biomolecular Informatics, Radboud University Nijmegen Medical Centre, PO Box 9101, Nijmegen, The Netherlands.
BMC Genomics (Impact Factor: 3.99). 05/2012; 13(1):170. DOI: 10.1186/1471-2164-13-170
Source: PubMed


Linking phenotypes to high-throughput molecular biology information generated by ~omics technologies can reveal the cellular mechanisms underlying an organism's phenotype. ~Omics datasets are often very large and noisy, with many features (e.g., genes, metabolite abundances). Thus, associating phenotypes with ~omics data requires an approach that is robust to noise and can handle large and diverse data sets.
We developed a web-tool, PhenoLink, that links phenotype to ~omics data sets using well-established as well as new techniques. PhenoLink imputes missing values and preprocesses input data (i) to decrease inherent noise in the data and (ii) to counterbalance pitfalls of the Random Forest algorithm, on which feature (e.g., gene) selection is based. Preprocessed data is used in feature (e.g., gene) selection to identify relations to phenotypes. We applied PhenoLink to identify gene-phenotype relations based on the presence/absence of 2847 genes in 42 Lactobacillus plantarum strains and phenotypic measurements of these strains in several experimental conditions, including growth on sugars and nitrogen-dioxide production. Genes were ranked based on their importance (predictive value) to correctly predict the phenotype of a given strain. In addition to known gene-phenotype relations, we also found novel relations.
PhenoLink is an easily accessible web-tool that facilitates identifying relations from large and often noisy phenotype and ~omics datasets. Visualization of links to phenotypes offered in PhenoLink allows prioritizing links, finding relations between features, finding relations between phenotypes, and identifying outliers in phenotype data. PhenoLink can be used to uncover phenotype links to a multitude of ~omics data, e.g., gene presence/absence (determined by, e.g., CGH or next-generation sequencing), gene expression (determined by, e.g., microarrays or RNA-seq), or metabolite abundance (determined by, e.g., GC-MS).
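
The workflow described above (impute missing values, then rank genes by how much they help predict a phenotype) can be illustrated with a small, standard-library-only sketch. This is not PhenoLink's code. The abstract does not specify the imputation rule, so column-mode imputation is an assumption here. The forest is simplified to bagged one-gene decision stumps scored by out-of-bag permutation importance, as a stand-in for a full Random Forest. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def impute_mode(matrix):
    """Fill missing calls (None) with the most common observed value per gene.
    Assumption: the abstract does not say how PhenoLink imputes."""
    filled = [row[:] for row in matrix]
    for j in range(len(matrix[0])):
        observed = [row[j] for row in matrix if row[j] is not None]
        mode = Counter(observed).most_common(1)[0][0] if observed else 0
        for row in filled:
            if row[j] is None:
                row[j] = mode
    return filled


def fit_stump(X, y, rows, genes):
    """Pick the candidate gene whose presence/absence best predicts y on rows."""
    default = Counter(y[i] for i in rows).most_common(1)[0][0]
    best = None
    for g in genes:
        votes = {0: Counter(), 1: Counter()}
        for i in rows:
            votes[X[i][g]][y[i]] += 1
        # Predict the majority phenotype seen for each gene state.
        rule = {v: (c.most_common(1)[0][0] if c else default)
                for v, c in votes.items()}
        correct = sum(rule[X[i][g]] == y[i] for i in rows)
        if best is None or correct > best[0]:
            best = (correct, g, rule)
    return best[1], best[2]


def rank_genes(X, y, n_trees=200, seed=0):
    """Rank genes by mean out-of-bag permutation importance over bagged stumps."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = max(1, int(p ** 0.5))  # genes tried per stump
    importance = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of strains
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]
        if not oob:
            continue
        g, rule = fit_stump(X, y, boot, rng.sample(range(p), mtry))
        base = sum(rule[X[i][g]] == y[i] for i in oob)
        shuffled = [X[i][g] for i in oob]
        rng.shuffle(shuffled)  # break the gene-phenotype link on OOB strains
        permuted = sum(rule[v] == y[i] for v, i in zip(shuffled, oob))
        importance[g] += (base - permuted) / len(oob) / n_trees
    ranking = sorted(range(p), key=lambda j: importance[j], reverse=True)
    return ranking, importance


# Toy data (hypothetical): 8 strains x 4 genes, 1 = present, None = missing call.
genes = [
    [None, 1, 1, 1], [0, 1, 1, 1], [1, 0, 1, 1], [0, 0, 1, 1],
    [1, 1, 0, 1], [0, 1, 0, 1], [1, 0, 0, 1], [0, 0, 0, 0],
]
phenotype = [1, 1, 1, 1, 0, 0, 0, 0]  # e.g., growth / no growth on a sugar

filled = impute_mode(genes)
ranking, importance = rank_genes(filled, phenotype)
print("genes ranked by importance:", ranking)
```

In this toy matrix the gene at index 2 mirrors the phenotype exactly, so it would be expected to rank first. A real analysis would use full decision trees and far more strains and genes.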


Available from: Roland Siezen
    • "We argue that trait-based approaches should build on—not replace—taxonomy-based approaches. The information needed to properly characterize the co-occurrence of traits and trait trade-offs among microorganisms builds on taxonomic ranks, and there is certainly an incentive for more high-throughput surveys of phenotypic characteristics of microbial taxa (Bayjanov et al., 2012). Such approaches could mark the beginning of a deviation from classical phylum-based approaches in microbial BEF studies toward a classification based on functional performance and role in the environment. "
    ABSTRACT: In ecology, biodiversity-ecosystem functioning (BEF) research has seen a shift in perspective from taxonomy to function in the last two decades, with successful application of trait-based approaches. This shift offers opportunities for a deeper mechanistic understanding of the role of biodiversity in maintaining multiple ecosystem processes and services. In this paper, we highlight studies that have focused on BEF of microbial communities with an emphasis on integrating trait-based approaches to microbial ecology. In doing so, we explore some of the inherent challenges and opportunities of understanding BEF using microbial systems. For example, microbial biologists characterize communities using gene phylogenies that are often unable to resolve functional traits. Additionally, experimental designs of existing microbial BEF studies are often inadequate to unravel BEF relationships. We argue that combining eco-physiological studies with contemporary molecular tools in a trait-based framework can reinforce our ability to link microbial diversity to ecosystem processes. We conclude that such trait-based approaches are a promising framework to increase the understanding of microbial BEF relationships and thus generating systematic principles in microbial ecology and more generally ecology.
    Full-text · Article · May 2014 · Frontiers in Microbiology
    • "Integrative genotype-phenotype matching would facilitate identifying genetic markers relevant for the manifestation of a phenotype. We therefore used an iterative gene selection procedure coined PhenoLink [22] to more accurately determine gene to phenotype relations of 38 L. lactis strains from 3 different subspecies: ssp. lactis, ssp. "
    ABSTRACT: Background Lactococcus lactis is used in dairy food fermentation and for the efficient production of industrially relevant enzymes. The genome content and different phenotypes have been determined for multiple L. lactis strains in order to understand intra-species genotype and phenotype diversity and annotate gene functions. In this study, we identified relations between gene presence and a collection of 207 phenotypes across 38 L. lactis strains of dairy and plant origin. Gene occurrence and phenotype data were used in an iterative gene selection procedure, based on the Random Forest algorithm, to identify genotype-phenotype relations. Results A total of 1388 gene-phenotype relations were found, of which some confirmed known gene-phenotype relations, such as the importance of arabinose utilization genes only for strains of plant origin. We also identified a gene cluster related to growth on melibiose, a plant disaccharide; this cluster is present only in melibiose-positive strains and can be used as a genetic marker in trait improvement. Additionally, several novel gene-phenotype relations were uncovered, for instance, genes related to arsenite resistance or arginine metabolism. Conclusions Our results indicate that genotype-phenotype matching by integrating large data sets provides the possibility to identify gene-phenotype relations, possibly improve gene function annotation and identified relations can be used for screening bacterial culture collections for desired phenotypes. In addition to all gene-phenotype relations, we also provide coherent phenotype data for 38 Lactococcus strains assessed in 207 different phenotyping experiments, which to our knowledge is the largest to date for the Lactococcus lactis species.
    Full-text · Article · Mar 2013 · BMC Microbiology
    • "Random forest analyses were performed to find the signature gene sets used to interrogate whether donors within this study could be divided into distinct groups based on their gene expression profiles. Irrelevant genes were removed from the signature set using the random forest-based local importance measure as described in PhenoLink [9]. A total of 21798 genes were removed in the initial step and the classification or out of bag (OOB) error decreased substantially from 70% to 22%. "
    ABSTRACT: Background Tuberculosis (TB) continues to cause a high toll of disease and death among children worldwide. The diagnosis of childhood TB is challenged by the paucibacillary nature of the disease and the difficulties in obtaining specimens. Whereas scientific and clinical research efforts to develop novel diagnostic tools have focused on TB in adults, childhood TB has been relatively neglected. Blood transcriptional profiling has improved our understanding of disease pathogenesis of adult TB and may offer future leads for diagnosis and treatment. No studies applying gene expression profiling of children with TB have been published so far. Results We identified a 116-gene signature set that showed an average prediction error of 11% for TB vs. latent TB infection (LTBI) and for TB vs. LTBI vs. healthy controls (HC) in our dataset. A minimal gene set of only 9 genes showed the same prediction error of 11% for TB vs. LTBI in our dataset. Furthermore, this minimal set showed a significant discriminatory value for TB vs. LTBI for all previously published adult studies using whole blood gene expression, with average prediction errors between 17% and 23%. In order to identify a robust representative gene set that would perform well in populations of different genetic backgrounds, we selected ten genes that were highly discriminative between TB, LTBI and HC in all literature datasets as well as in our dataset. Functional annotation of these genes highlights a possible role for genes involved in calcium signaling and calcium metabolism as biomarkers for active TB. These ten genes were validated by quantitative real-time polymerase chain reaction in an additional cohort of 54 Warao Amerindian children with LTBI, HC and non-TB pneumonia. Decision tree analysis indicated that five of the ten genes were sufficient to classify 78% of the TB cases correctly with no LTBI subjects wrongly classified as TB (100% specificity). 
    Conclusions Our data justify the further exploration of our signature set as biomarkers for potential childhood TB diagnosis. We show that, as the identification of different biomarkers in ethnically distinct cohorts is apparent, it is important to cross-validate newly identified markers in all available cohorts.
    Full-text · Article · Feb 2013 · BMC Genomics