Disease signatures are robust across tissues and experiments.

Stanford Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA.
Molecular Systems Biology (Impact Factor: 11.34). 01/2009; 5:307. DOI:10.1038/msb.2009.66
Source: PubMed

ABSTRACT Meta-analyses combining gene expression microarray experiments offer new insights into the molecular pathophysiology of disease not evident from individual experiments. Although the established technical reproducibility of microarrays serves as a basis for meta-analysis, pathophysiological reproducibility across experiments is not well established. In this study, we carried out a large-scale analysis of disease-associated experiments obtained from NCBI GEO, and evaluated their concordance across a broad range of diseases and tissue types. On evaluating 429 experiments, representing 238 diseases and 122 tissues from 8435 microarrays, we find evidence for a general, pathophysiological concordance between experiments measuring the same disease condition. Furthermore, we find that the molecular signature of disease across tissues is overall more prominent than the signature of tissue expression across diseases. The results offer new insight into the quality of public microarray data using pathophysiological metrics, and support new directions in meta-analysis that include characterization of the commonalities of disease irrespective of tissue, as well as the creation of multi-tissue systems models of disease pathology using public data.
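
The concordance comparison described in the abstract can be illustrated with a small sketch. It is not the authors' pipeline: the abstract does not name the concordance metric, so Spearman rank correlation is an assumption here, and the four toy experiments, their profile values, and the helper names (`ranks`, `spearman`, `mean_concordance`) are all hypothetical. Each experiment is reduced to a vector of per-gene differential expression scores, and pairs of experiments are grouped by whether they share a disease or share a tissue.

```python
from itertools import combinations
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: disease-vs-control scores for the same five genes.
experiments = [
    {"disease": "A", "tissue": "blood", "profile": [2.0, 1.5, -1.0, -2.0, 0.1]},
    {"disease": "A", "tissue": "muscle", "profile": [1.8, 1.0, -0.5, -1.5, 0.3]},
    {"disease": "B", "tissue": "blood", "profile": [-1.0, 0.2, 2.0, 1.0, -0.5]},
    {"disease": "B", "tissue": "muscle", "profile": [-1.2, -0.1, 1.5, 1.1, 0.0]},
]


def mean_concordance(experiments, key):
    """Mean pairwise correlation for pairs that share `key` and pairs that do not."""
    same, different = [], []
    for a, b in combinations(experiments, 2):
        rho = spearman(a["profile"], b["profile"])
        (same if a[key] == b[key] else different).append(rho)
    return mean(same), mean(different)


same_disease, diff_disease = mean_concordance(experiments, "disease")
same_tissue, diff_tissue = mean_concordance(experiments, "tissue")
print(f"same disease: {same_disease:.2f}, different disease: {diff_disease:.2f}")
print(f"same tissue:  {same_tissue:.2f}, different tissue:  {diff_tissue:.2f}")
```

In this toy data the profiles were chosen so that experiments sharing a disease correlate more strongly than experiments sharing a tissue, mirroring the direction of the reported finding; real data would need normalization, gene matching across platforms, and a significance test.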

  • ABSTRACT: We utilized abundant transcriptomic data for the primary classes of brain cancers to study the feasibility of separating all of these diseases simultaneously based on molecular data alone. These signatures were based on a new method reported herein - Identification of Structured Signatures and Classifiers (ISSAC) - that resulted in a brain cancer marker panel of 44 unique genes. Many of these genes have established relevance to the brain cancers examined herein, with others having known roles in cancer biology. Analyses on large-scale data from multiple sources must deal with significant challenges associated with heterogeneity between different published studies, for it was observed that the variation among individual studies often had a larger effect on the transcriptome than did phenotype differences, as is typical. For this reason, we restricted ourselves to studying only cases where we had at least two independent studies performed for each phenotype, and also reprocessed all the raw data from the studies using a unified pre-processing pipeline. We found that learning signatures across multiple datasets greatly enhanced reproducibility and accuracy in predictive performance on truly independent validation sets, even when keeping the size of the training set the same. This was most likely due to the meta-signature encompassing more of the heterogeneity across different sources and conditions, while amplifying signal from the repeated global characteristics of the phenotype. When molecular signatures of brain cancers were constructed from all currently available microarray data, 90% phenotype prediction accuracy, or the accuracy of identifying a particular brain cancer from the background of all phenotypes, was found. Looking forward, we discuss our approach in the context of the eventual development of organ-specific molecular signatures from peripheral fluids such as the blood.
    PLoS Computational Biology 07/2013; 9(7):e1003148. · 4.87 Impact Factor
  • ABSTRACT: Genome-wide association studies have discovered many genetic loci associated with disease traits, but the functional molecular basis of these associations is often unresolved. Genome-wide regulatory and gene expression profiles measured across individuals and diseases reflect downstream effects of genetic variation and may allow for functional assessment of disease-associated loci. Here, we present a unique approach for systematic integration of genetic disease associations, transcription factor binding among individuals, and gene expression data to assess the functional consequences of variants associated with hundreds of human diseases. In an analysis of genome-wide binding profiles of NFκB, we find that disease-associated SNPs are enriched in NFκB binding regions overall, and specifically for inflammatory-mediated diseases, such as asthma, rheumatoid arthritis, and coronary artery disease. Using genome-wide variation in transcription factor-binding data, we find that NFκB binding is often correlated with disease-associated variants in a genotype-specific and allele-specific manner. Furthermore, we show that this binding variation is often related to expression of nearby genes, which are also found to have altered expression in independent profiling of the variant-associated disease condition. Thus, using this integrative approach, we provide a unique means to assign putative function to many disease-associated SNPs.
    Proceedings of the National Academy of Sciences 05/2013; · 9.74 Impact Factor
  • ABSTRACT: Transcription factor cross-repression is an important concept in cellular differentiation. A bistable toggle switch constitutes a molecular mechanism that determines cellular commitment and provides stability to transcriptional programs of binary cell fate choices. Experiments indicate that perturbations of these toggle switches can interconvert these binary cell fate choices, suggesting potential reprogramming strategies. However, more complex types of cellular transitions could involve perturbations of combinations of different types of multistable motifs. Here we introduce a method that generalizes the concept of transcription factor cross-repression to systematically predict sets of genes whose perturbations induce cellular transitions between any given pair of cell types. Furthermore, to our knowledge, this is the first method that systematically makes these predictions without prior knowledge of potential candidate genes and pathways involved, providing guidance on systems where little is known. Given the increasing interest in cellular reprogramming in medicine and basic research, our method represents a useful computational methodology to assist researchers in the field in designing experimental strategies.
    Stem Cells 07/2013; · 7.70 Impact Factor
