Disease signatures are robust across tissues and experiments

Stanford Center for Biomedical Informatics Research, Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA.
Molecular Systems Biology (Impact Factor: 14.1). 09/2009; 5:307. DOI: 10.1038/msb.2009.66
Source: PubMed

ABSTRACT Meta-analyses combining gene expression microarray experiments offer new insights into the molecular pathophysiology of disease not evident from individual experiments. Although the established technical reproducibility of microarrays serves as a basis for meta-analysis, pathophysiological reproducibility across experiments is not well established. In this study, we carried out a large-scale analysis of disease-associated experiments obtained from NCBI GEO, and evaluated their concordance across a broad range of diseases and tissue types. On evaluating 429 experiments, representing 238 diseases and 122 tissues from 8435 microarrays, we find evidence for a general, pathophysiological concordance between experiments measuring the same disease condition. Furthermore, we find that the molecular signature of disease across tissues is overall more prominent than the signature of tissue expression across diseases. The results offer new insight into the quality of public microarray data using pathophysiological metrics, and support new directions in meta-analysis that include characterization of the commonalities of disease irrespective of tissue, as well as the creation of multi-tissue systems models of disease pathology using public data.



Available from: Tarangini Deshpande, Jan 20, 2014
    • "According to our results, more TS genes are repressed (although most TS genes are already lowly expressed) while more HK genes are activated (especially for those that are highly expressed) during the tumorigenic process. Previous study suggested that molecular signature of disease across tissues is overall more prominent than the signature of tissue expression across diseases [33]. During cancer progression, specialization in cancerous tissues dropped due to a decrease in expression of genes that are highly specific to the normal organ [21]. "
    ABSTRACT: Immortality and tumorigenicity are two distinct characteristics of cancers. Immortalization has been suggested to precede tumorigenesis. To understand the molecular mechanisms of tumorigenicity and cancer progression in mammary epithelium, we established a tumorigenic cell model by means of heavy-ion radiation of an immortal cell model, which was created by overexpressing the human telomerase reverse transcriptase (hTERT) in normal human mammary epithelial cells. We examined the expression profile of this tumorigenic cell line (T_hMEC) using the hTERT-overexpressing immortal cell line (I_hMEC) as a control. In-depth RNA-seq data was generated by using the next-generation sequencing (NGS) platform (Life Technologies SOLiD3). We found that house-keeping (HK) and tissue-specific (TS) genes were differentially regulated during the tumorigenic process. HK genes tended to be activated while TS genes tended to be repressed. In addition, the HK genes and TS genes tended to contribute differentially to the variation of gene expression at different RPKM (gene expression in reads per exon kilobase per million mapped sequence reads) levels. Based on transcriptome analysis of the two cell lines, we defined 7053 differentially-expressed genes (DEGs) between immortality and tumorigenicity. Differential expression of 20 manually-selected genes was further validated using qRT-PCR. Our observations may help to further our understanding of cellular mechanism(s) in the transition from immortalization to tumorigenesis.
    Genomics Proteomics & Bioinformatics 12/2012; 10(6):326-35. DOI:10.1016/j.gpb.2012.11.001
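The RPKM unit named in the abstract above (reads per exon kilobase per million mapped reads) can be illustrated with a minimal sketch. The function and parameter names are hypothetical and chosen only for illustration; they do not come from the cited paper, which does not describe its quantification code.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon per million mapped reads.

    read_count:          reads mapped to the gene's exons
    exon_length_bp:      summed exon length of the gene, in base pairs
    total_mapped_reads:  all mapped reads in the sample
    (All three names are illustrative assumptions.)
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    # Dividing by (length / 1e3) and by (total / 1e6) gives the 1e9 factor.
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # 500 reads on a gene with 2,000 bp of exon in a 10-million-read sample.
    print(rpkm(500, 2000, 10_000_000))
```

Because the value is scaled by both gene length and sequencing depth, it allows rough comparison of expression levels between genes and between samples.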
    • "Taking into account the possibility that a large fraction of genes may be differentially expressed and up- and down-regulated genes may be asymmetric in a disease, several normalization algorithms have been developed recently (Calza et al., 2008; Ni et al., 2008; Dudley et al., 2009; Wu and Aryee, 2010). For example, based on the assumption that a certain fraction of genes have stable expressions across samples regardless of the sample states, the LVS (least-variant set) algorithm uses a non-linear model to fit a pre-selected set of genes with small variation across all arrays from individual array against those from a reference array (Calza et al., 2008). "
    ABSTRACT: When using microarray data for studying a complex disease such as cancer, it is a common practice to normalize data to force all arrays to have the same distribution of probe intensities regardless of the biological groups of samples. The assumption underlying such normalization is that in a disease the majority of genes are not differentially expressed genes (DE genes) and the numbers of up- and down-regulated genes are roughly equal. However, accumulated evidences suggest gene expressions could be widely altered in cancer, so we need to evaluate the sensitivities of biological discoveries to violation of the normalization assumption. Here, we analyzed 7 large Affymetrix datasets of pair-matched normal and cancer samples for cancers collected in the NCBI GEO database. We showed that in 6 of these 7 datasets, the medians of perfect match (PM) probe intensities increased in cancer state and the increases were significant in three datasets, suggesting the assumption that all arrays have the same median probe intensities regardless of the biological groups of samples might be misleading. Then, we evaluated the effects of three currently most widely used normalization algorithms (RMA, MAS5.0 and dChip) on the selection of DE genes by comparing them with LVS which relies less on the above-mentioned assumption. The results showed using RMA, MAS5.0 and dChip may produce lots of false results of down-regulated DE genes while missing many up-regulated DE genes. At least for cancer study, normalizing all arrays to have the same distribution of probe intensities regardless of the biological groups of samples might be misleading. Thus, most current normalizations based on unreliable assumptions may distort biological differences between normal and cancer samples. The LVS algorithm might perform relatively well due to that it relies less on the above-mentioned assumption. Also, our results indicate that genes may be widely up-regulated in most human cancer.
    Computational biology and chemistry 06/2011; 35(3):126-30. DOI:10.1016/j.compbiolchem.2011.04.006 · 1.60 Impact Factor
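The normalization assumption questioned in the abstract above, forcing every array to share one distribution of probe intensities, can be illustrated with a minimal quantile-normalization sketch. This is an assumption-laden toy: it is not the RMA, MAS5.0, dChip, or LVS implementation, it ignores tied values, and the function name and input layout are hypothetical.

```python
def quantile_normalize(arrays):
    """Force every array to share the same intensity distribution.

    arrays: list of equal-length lists, one list of probe intensities per
            microarray (an assumed, illustrative layout).
    Returns new lists in which the k-th smallest value of every array is
    replaced by the mean of the k-th smallest values across all arrays.
    Ties are not handled specially in this sketch.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Mean intensity at each rank across arrays: the shared target distribution.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n_probes)
    ]

    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    normal = [5.0, 2.0, 3.0]
    cancer = [4.0, 1.0, 9.0]
    print(quantile_normalize([normal, cancer]))
```

After this step every array contains the same set of values, so any genuine shift in median intensity between normal and cancer samples is erased by construction, which is the concern the abstract raises.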
    • "gov/geo/). Although biomedical researchers typically design microarray experiments to explore specific biological contexts, meta-analyses that integrate data from multiple experiments have the potential to reveal relationships that are not accessible through any individual dataset [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]. One critical question in the postgenomic era is the identification of sets of functionally related genes that correspond to a particular biological process. "
    ABSTRACT: As public microarray repositories rapidly accumulate gene expression data, these resources contain increasingly valuable information about cellular processes in human biology. This presents a unique opportunity for intelligent data mining methods to extract information about the transcriptional modules underlying these biological processes. Modeling cellular gene expression as a combination of functional modules, we use independent component analysis (ICA) to derive 423 fundamental components of human biology from a 9395-array compendium of heterogeneous expression data. Annotation using the Gene Ontology (GO) suggests that while some of these components represent known biological modules, others may describe biology not well characterized by existing manually-curated ontologies. In order to understand the biological functions represented by these modules, we investigate the mechanism of the preclinical anti-cancer drug parthenolide (PTL) by analyzing the differential expression of our fundamental components. Our method correctly identifies known pathways and predicts that N-glycan biosynthesis and T-cell receptor signaling may contribute to PTL response. The fundamental gene modules we describe have the potential to provide pathway-level insight into new gene expression datasets.
    Journal of Biomedical Informatics 12/2010; 43(6):932-44. DOI:10.1016/j.jbi.2010.07.001 · 2.48 Impact Factor