Automated multidimensional phenotypic profiling using large public microarray repositories.

Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA.
Proceedings of the National Academy of Sciences (Impact Factor: 9.81). 07/2009; 106(30):12323-8. DOI: 10.1073/pnas.0900883106
Source: PubMed

ABSTRACT: Phenotypes are complex and difficult to quantify in a high-throughput fashion. The lack of comprehensive phenotype data can prevent or distort genotype-phenotype mapping. Here, we describe "PhenoProfiler," a computational method that enables in silico phenotype profiling. Drawing on the principle that similar gene expression patterns are likely to be associated with similar phenotype patterns, PhenoProfiler supplements the missing quantitative phenotype information for a given microarray dataset based on other well-characterized microarray datasets. We applied our method to 587 human microarray datasets covering >14,000 samples, and confirmed that the predicted phenotype profiles are highly consistent with true phenotype descriptions. PhenoProfiler offers several unique capabilities: (i) automated, multidimensional phenotype profiling, facilitating the analysis and treatment design of complex diseases; (ii) the extrapolation of phenotype profiles beyond provided classes; and (iii) the detection of confounding phenotype factors that could otherwise bias biological inferences. Finally, because no direct comparisons are made between gene expression values from different datasets, the method can use the entire body of cross-platform microarray data. This work has produced a compendium of phenotype profiles for the National Center for Biotechnology Information GEO datasets, which can facilitate an unbiased understanding of the transcriptome-phenome mapping. The continued accumulation of microarray data will further increase the power of PhenoProfiler by increasing the variety and the quality of phenotypes to be profiled.

  • ABSTRACT: Genome-wide binding assays can determine where individual transcription factors bind in the genome. However, these factors rarely bind chromatin alone, but instead frequently bind to cis-regulatory elements (CREs) together with other factors as protein complexes. Currently there are no integrative analytical approaches that can predict which complexes are formed on chromatin. Here, we describe a computational methodology to systematically capture protein complexes and infer their impact on gene expression. We applied our method to three human cell types, identified thousands of CREs, identified known or undescribed complexes recruited to these CREs, and determined the role of the complexes as activators or repressors. Importantly, we found that the predicted complexes have a higher number of physical interactions between their members than expected by chance. Our work provides a mechanism for developing hypotheses about gene regulation via binding partners, and deciphering the interplay between combinatorial binding and gene expression.
    Genome Research 04/2013; · 14.40 Impact Factor
  • ABSTRACT: The brain has a large capacity for automatic simultaneous processing and integration of sensory information. Combining information from different sensory modalities facilitates our ability to detect, discriminate, and recognize sensory stimuli, and learning is often optimal in a multisensory environment. Currently used multisensory stimulation methods in stroke rehabilitation include motor imagery, action observation, training with a mirror or in a virtual environment, and various kinds of music therapy. Non-invasive brain stimulation has shown promising preliminary results in aphasia and neglect. Patient heterogeneity and the interaction of age, gender, genes, and environment are discussed. Randomized controlled longitudinal trials starting earlier post-stroke are needed. Advances in brain network science and neuroimaging that enable longitudinal studies of structural and functional networks are likely to have an important impact on patient selection for specific interventions in future stroke rehabilitation. It is proposed that we should pay more attention to age, gender, and laterality in clinical studies.
    Frontiers in Human Neuroscience 01/2012; 6:60. · 2.91 Impact Factor
  • ABSTRACT: Genomic datasets generated by new technologies are increasingly prevalent in disparate areas of biological research. While many studies have sought to characterize relationships among genomic features, commensurate efforts to characterize relationships among biological samples have been less common. Consequently, the full extent of sample variation in genomic studies is often under-appreciated, complicating downstream analytical tasks such as gene co-expression network analysis. Here we demonstrate the use of network methods for characterizing sample relationships in microarray data generated from human brain tissue. We describe an approach for identifying outlying samples that does not depend on the choice or use of clustering algorithms. We introduce a battery of measures for quantifying the consistency and integrity of sample relationships, which can be compared across disparate studies, technology platforms, and biological systems. Among these measures, we provide evidence that the correlation between the connectivity and the clustering coefficient (two important network concepts) is a sensitive indicator of homogeneity among biological samples. We also show that this measure, which we refer to as cor(K,C), can distinguish biologically meaningful relationships among subgroups of samples. Specifically, we find that cor(K,C) reveals the profound effect of Huntington's disease on samples from the caudate nucleus relative to other brain regions. Furthermore, we find that this effect is concentrated in specific modules of genes that are naturally co-expressed in human caudate nucleus, highlighting a new strategy for exploring the effects of disease on sets of genes. These results underscore the importance of systematically exploring sample relationships in large genomic datasets before seeking to analyze genomic feature activity. We introduce a standardized platform for this purpose using freely available R software that has been designed to enable iterative and interactive exploration of sample networks.
    BMC Systems Biology 06/2012; 6:63. · 2.98 Impact Factor
