Automated multidimensional phenotypic profiling using large public microarray repositories.

Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA.
Proceedings of the National Academy of Sciences (Impact Factor: 9.81). 07/2009; 106(30):12323-8. DOI: 10.1073/pnas.0900883106
Source: PubMed

ABSTRACT Phenotypes are complex, and difficult to quantify in a high-throughput fashion. The lack of comprehensive phenotype data can prevent or distort genotype-phenotype mapping. Here, we describe "PhenoProfiler," a computational method that enables in silico phenotype profiling. Drawing on the principle that similar gene expression patterns are likely to be associated with similar phenotype patterns, PhenoProfiler supplements the missing quantitative phenotype information for a given microarray dataset based on other well-characterized microarray datasets. We applied our method to 587 human microarray datasets covering >14,000 samples, and confirmed that the predicted phenotype profiles are highly consistent with true phenotype descriptions. PhenoProfiler offers several unique capabilities: (i) automated, multidimensional phenotype profiling, facilitating the analysis and treatment design of complex diseases; (ii) the extrapolation of phenotype profiles beyond provided classes; and (iii) the detection of confounding phenotype factors that could otherwise bias biological inferences. Finally, because no direct comparisons are made between gene expression values from different datasets, the method can use the entire body of cross-platform microarray data. This work has produced a compendium of phenotype profiles for the National Center for Biotechnology Information GEO datasets, which can facilitate an unbiased understanding of the transcriptome-phenome mapping. The continued accumulation of microarray data will further increase the power of PhenoProfiler, by increasing the variety and the quality of phenotypes to be profiled.

  • ABSTRACT: Laboratory-based evolution and whole genome sequencing can link genotype and phenotype. We used evolution of acid resistance in exponential phase Escherichia coli to study resistance to a lethal stress. Iterative selection at pH 2.5 generated five populations that were resistant to low pH in early exponential phase. Genome sequencing revealed multiple mutations, but the only gene mutated in all strains was evgS, part of a two-component system that has already been implicated in acid resistance. All these mutations were in the cytoplasmic PAS domain of EvgS, and were shown to be solely responsible for the resistant phenotype, causing strong up-regulation at neutral pH of genes normally induced by low pH. Resistance to pH 2.5 in these strains did not require the transporter GadC, or the sigma factor RpoS. We found that EvgS-dependent constitutive acid resistance to pH 2.5 was retained in the absence of the regulators GadE or YdeO, but was lost if the oxidoreductase YdeP was also absent. A deletion in the periplasmic domain of EvgS abolished the response to low pH, but not the activity of the constitutive mutants. On the basis of these results we propose a model for how EvgS may become activated by low pH.
    Molecular Microbiology 07/2014; DOI:10.1111/mmi.12704 · 5.03 Impact Factor
  • ABSTRACT: Calculation of voltages in a mine electrical distribution system using a power-flow program may fail to yield meaningful results if the iterative power-flow procedure fails to converge. A modified power-flow technique that converges for most of the mine electrical power-flow input data, even though the same input data do not yield a convergent solution with a traditional Newton power-flow algorithm, is presented. When no operable solution to the power-flow problem exists, the results tend to indicate the location of the modeling error that is responsible for nonconvergence. An extensive case study using data for a mine electrical power system is presented to demonstrate the robustness and error identification properties of the power-flow algorithm.
    Neuron 01/1989; DOI:10.1109/IAS.1989.96842 · 15.98 Impact Factor
  • ABSTRACT: Genome-wide binding assays can determine where individual transcription factors bind in the genome. However, these factors rarely bind chromatin alone, but instead frequently bind to cis-regulatory elements (CREs) together with other factors as protein complexes. Currently there are no integrative analytical approaches that can predict which complexes are formed on chromatin. Here, we describe a computational methodology to systematically capture protein complexes and infer their impact on gene expression. We applied our method to three human cell types, identified thousands of CREs, identified known or undescribed complexes recruited to these CREs, and determined the role of the complexes as activators or repressors. Importantly, we found that the predicted complexes have a higher number of physical interactions between their members than expected by chance. Our work provides a mechanism for developing hypotheses about gene regulation via binding partners, and deciphering the interplay between combinatorial binding and gene expression.
    Genome Research 04/2013; DOI:10.1101/gr.149419.112 · 13.85 Impact Factor
