
Automated multidimensional phenotypic profiling using large public microarray repositories.

Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA.
Proceedings of the National Academy of Sciences, 07/2009; 106(30):12323-8. DOI: 10.1073/pnas.0900883106
Source: PubMed

ABSTRACT Phenotypes are complex and difficult to quantify in a high-throughput fashion. The lack of comprehensive phenotype data can prevent or distort genotype-phenotype mapping. Here, we describe "PhenoProfiler," a computational method that enables in silico phenotype profiling. Drawing on the principle that similar gene expression patterns are likely to be associated with similar phenotype patterns, PhenoProfiler supplements the missing quantitative phenotype information for a given microarray dataset based on other well-characterized microarray datasets. We applied our method to 587 human microarray datasets covering >14,000 samples, and confirmed that the predicted phenotype profiles are highly consistent with true phenotype descriptions. PhenoProfiler offers several unique capabilities: (i) automated, multidimensional phenotype profiling, facilitating the analysis and treatment design of complex diseases; (ii) the extrapolation of phenotype profiles beyond provided classes; and (iii) the detection of confounding phenotype factors that could otherwise bias biological inferences. Finally, because no direct comparisons are made between gene expression values from different datasets, the method can use the entire body of cross-platform microarray data. This work has produced a compendium of phenotype profiles for the National Center for Biotechnology Information GEO datasets, which can facilitate an unbiased understanding of the transcriptome-phenome mapping. The continued accumulation of microarray data will further increase the power of PhenoProfiler, by increasing the variety and the quality of phenotypes to be profiled.
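The abstract's central idea can be illustrated with a small sketch: similar expression patterns imply similar phenotypes, and expression values from different datasets are never compared directly. This is not the published PhenoProfiler algorithm, which the abstract does not spell out. It assumes a simple rank-based signature score. Marker genes for a phenotype are chosen by comparing cases with controls inside one well-characterized reference dataset. A query sample from any other dataset or platform is then scored only by the within-sample ranks of those genes. The function names (derive_signature, within_sample_ranks, phenotype_score, phenotype_profile) and the toy data are hypothetical.

```python
from statistics import mean

# Hypothetical sketch: NOT the published PhenoProfiler algorithm.
# A sample is a dict mapping gene name -> expression value on its own platform scale.


def derive_signature(cases, controls, k):
    """Choose k up-shifted and k down-shifted marker genes for one phenotype.

    Case and control samples come from the SAME reference dataset, so the
    expression values being compared share one platform and scale.
    """
    genes = sorted(set.intersection(*(set(s) for s in cases + controls)))
    shift = {g: mean(s[g] for s in cases) - mean(s[g] for s in controls) for g in genes}
    ordered = sorted(genes, key=lambda g: shift[g])  # most down-shifted first
    return ordered[-k:], ordered[:k]  # (up genes, down genes)


def within_sample_ranks(sample):
    """Rank genes inside a single sample, scaled to (0, 1]; tied values keep input order."""
    ordered = sorted(sample, key=sample.get)
    n = len(ordered)
    return {g: (i + 1) / n for i, g in enumerate(ordered)}


def phenotype_score(sample, up, down):
    """Mean within-sample rank of the up genes minus that of the down genes.

    Only values from this one sample are compared, so any strictly increasing
    change of scale (another platform, a log transform) leaves the score unchanged.
    """
    ranks = within_sample_ranks(sample)
    up_r = [ranks[g] for g in up if g in ranks]
    down_r = [ranks[g] for g in down if g in ranks]
    if not up_r or not down_r:
        return None  # signature genes not measured on this platform
    return mean(up_r) - mean(down_r)


def phenotype_profile(sample, signatures):
    """Score one sample against every phenotype signature, giving a multidimensional profile."""
    return {name: phenotype_score(sample, up, down) for name, (up, down) in signatures.items()}


if __name__ == "__main__":
    # Toy reference dataset (hypothetical values) with one annotated phenotype.
    cases = [{"A": 9, "B": 8, "C": 5, "D": 5, "E": 1, "F": 2},
             {"A": 10, "B": 7, "C": 5, "D": 4, "E": 2, "F": 1}]
    controls = [{"A": 2, "B": 3, "C": 5, "D": 5, "E": 8, "F": 9},
                {"A": 1, "B": 2, "C": 4, "D": 5, "E": 9, "F": 8}]
    signatures = {"phenotype_X": derive_signature(cases, controls, k=2)}

    # Unannotated query sample from another dataset, on a roughly 100x larger scale.
    query = {"A": 900, "B": 700, "C": 500, "D": 450, "E": 120, "F": 150}
    print(phenotype_profile(query, signatures))
```

Run as a script, the example should print a positive score of about 0.67 for phenotype_X. A control-like query, with genes A and B low and E and F high, would score about -0.67. Ranking genes inside each sample is one simple way to avoid cross-dataset comparisons of raw values. Because the result is a continuous score rather than a class label, a profile built this way can also fall between or beyond the reference classes.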

Available from: Wenyuan Li, Jul 01, 2015