Meta-Analysis of Microarray Studies Reveals a Novel Hematopoietic Progenitor Cell Signature and Demonstrates Feasibility of Inter-Platform Data Integration

Albert Einstein College of Medicine, Bronx, New York, United States of America.
PLoS ONE (Impact Factor: 3.23). 02/2008; 3(8):e2965. DOI: 10.1371/journal.pone.0002965
Source: PubMed


Microarray-based studies of global gene expression (GE) have generated a large amount of data that can be mined for further insights into disease and physiology. Meta-analysis of these data is hampered by technical limitations arising from the many different platforms, gene annotations and probes used across studies. We tested the feasibility of conducting a meta-analysis of GE studies to determine a transcriptional signature of hematopoietic progenitor and stem cells. Data from studies that used normal bone marrow-derived hematopoietic progenitors were integrated using both RefSeq and UniGene identifiers. We observed that, despite variability introduced by experimental conditions and different microarray platforms, our meta-analytical approach can distinguish biologically distinct normal tissues by clustering them based on their cell of origin. When disease states were examined, GE profiles of leukemia and myelodysplasia progenitors tended to cluster with normal progenitors and remained distinct from other normal tissues, further validating the discriminatory power of this meta-analysis. Furthermore, 57 normal hematopoietic stem and progenitor cell GE samples were analyzed to determine a gene expression signature characteristic of these cells. Genes that were most uniformly expressed in progenitors and, at the same time, differentially expressed relative to other normal tissues were found to be involved in important biological processes such as cell cycle regulation and hematopoiesis. Validation studies using a different microarray platform demonstrated the enrichment of several genes, such as SMARCE, Septin 6 and others, not previously implicated in hematopoiesis. Most interestingly, alpha-integrin, the only common stemness gene discovered in a recent comparative murine analysis (Science 302(5644):393), was also enriched in our dataset, demonstrating the usefulness of this analytical approach.
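
The two steps the abstract describes, integrating studies through a shared gene identifier and then ranking genes that are uniform within progenitors yet differ from other tissues, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the `NM_000x` identifiers, the expression values and the scoring formula (mean difference divided by the progenitor standard deviation) are all assumptions chosen for illustration, and real cross-platform data would need normalization before such a comparison.

```python
from statistics import mean, pstdev


def merge_on_shared_ids(datasets):
    """Merge studies keyed by a common gene identifier (e.g. RefSeq).

    Each dataset maps gene_id -> list of expression values, one per sample.
    Only identifiers present in every dataset are kept.
    """
    shared = set(datasets[0]).intersection(*datasets[1:])
    return {g: [v for d in datasets for v in d[g]] for g in sorted(shared)}


def signature_scores(progenitor, other, floor=0.1):
    """Rank genes by an illustrative signature score (an assumption,
    not the statistic used in the paper).

    score = (mean in progenitors - mean in other tissues)
            / max(standard deviation in progenitors, floor)
    A large absolute score means the gene is both uniformly expressed in
    progenitors and differentially expressed versus other tissues.
    """
    scores = {}
    for gene in progenitor.keys() & other.keys():
        p, o = progenitor[gene], other[gene]
        scores[gene] = (mean(p) - mean(o)) / max(pstdev(p), floor)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical identifiers and values, for illustration only.
study_a = {"NM_0001": [8.0, 8.2], "NM_0002": [5.0, 9.0], "NM_0003": [1.0, 1.1]}
study_b = {"NM_0001": [7.9], "NM_0002": [2.0], "NM_0004": [4.0]}
progenitors = merge_on_shared_ids([study_a, study_b])
other_tissues = {"NM_0001": [3.0, 3.2, 2.8], "NM_0002": [5.0, 5.5, 5.2]}

for gene, score in signature_scores(progenitors, other_tissues):
    print(gene, round(score, 2))
```

In this toy input, `NM_0001` ranks first because it is tightly expressed across the merged progenitor samples and clearly higher than in the other tissues, while `NM_0002` varies widely among progenitors and scores near zero.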



Available from: Tushar Bhagat,
  • Source
    • "MK5 seems to be ubiquitously expressed because MK5 transcripts and proteins have been detected in all cell types and tissues examined. MK5 seems to be most abundantly expressed in heart, brain, and hematopoietic progenitors (New et al., 1998; Ni et al., 1998; Sohal et al., 2008; Gerits et al., 2009). The primary sequence shows that the protein is evolutionarily highly conserved with 87–98% amino acid identity between hMK5 and fish MK5, except for lamprey (Petromyzon marinus) MK5, which shares only 146/188 (78%) identical residues with hMK5 in its N-terminal part."
    ABSTRACT: Mitogen-activated protein kinase (MAPK) pathways are important signal transduction pathways that control pivotal cellular processes including proliferation, differentiation, survival, apoptosis, gene regulation and motility. MAPK pathways consist of a relay of consecutive phosphorylation events exerted by MAPK kinase kinases, MAPK kinases and MAPKs. Conventional MAPKs are characterized by a conserved Thr-X-Tyr motif in the activation loop of the kinase domain, while atypical MAPKs lack this motif and do not seem to be organized into the classical three-tiered kinase cascade. One functional group of conventional and atypical MAPK substrates consists of protein kinases known as MAPK-activated protein kinases (MAPKAPKs). Eleven mammalian MAPKAPKs have been identified, and they are divided into five subgroups: the ribosomal S6 kinases RSK1-4, the MAPK-interacting kinases MNK1 and 2, the mitogen- and stress-activated kinases MSK1 and 2, the MAPK-activated protein kinases MK2 and 3, and the MAPK-activated protein kinase MK5 (also referred to as PRAK). MK5/PRAK is the only MAPKAPK that is a substrate for both conventional and atypical MAPKs, while all other MAPKAPKs are exclusively phosphorylated by conventional MAPKs. This review focuses on the structure, activation, substrates, functions and possible implications of MK5/PRAK in malignant and non-malignant diseases.
    Biological Chemistry 05/2013; 394(9). DOI:10.1515/hsz-2013-0149 · 3.27 Impact Factor
  • Source
    • "While feasibility of integration of gene expression profile data, obtained from different experimental platforms or investigators, is highly desirable to build transcriptome maps representing all information available for a certain biological condition, the occurrence of systematic errors associated with each experimental situation requires advanced methods of inter-sample data normalization, such as the widely accepted quantile normalization [25]. However, this method may cause loss of data due to the removal of all genes whose expression values are missing for any dataset in order to obtain a fully filled data matrix, representing each sample as a column and the values for each gene as a row (for an example of this filtering see [26]). Alternatively, some Authors retain all data values in quantile normalization by placing missing values at the end of each sorted column [27]. "
    ABSTRACT: Several tools have been developed to perform global gene expression profile data analysis, to search for specific chromosomal regions whose features meet defined criteria as well as to study neighbouring gene expression. However, most of these tools are tailored for a specific use in a particular context (e.g. they are species-specific, or limited to a particular data format) and they typically accept only gene lists as input. TRAM (Transcriptome Mapper) is a new general tool that allows the simple generation and analysis of quantitative transcriptome maps, starting from any source listing gene expression values for a given gene set (e.g. expression microarrays), implemented as a relational database. It includes a parser able to assign univocal and updated gene symbols to gene identifiers from different data sources. Moreover, TRAM is able to perform intra-sample and inter-sample data normalization, including an original variant of quantile normalization (scaled quantile), useful to normalize data from platforms with highly different numbers of investigated genes. When in 'Map' mode, the software generates a quantitative representation of the transcriptome of a sample (or of a pool of samples) and identifies if segments of defined lengths are over/under-expressed compared to the desired threshold. When in 'Cluster' mode, the software searches for a set of over/under-expressed consecutive genes. Statistical significance for all results is calculated with respect to genes localized on the same chromosome or to all genome genes. Transcriptome maps, showing differential expression between two sample groups, relative to two different biological conditions, may be easily generated. We present the results of a biological model test, based on a meta-analysis comparison between a sample pool of human CD34+ hematopoietic progenitor cells and a sample pool of megakaryocytic cells. Biologically relevant chromosomal segments and gene clusters with differential expression during the differentiation toward megakaryocyte were identified. TRAM is designed to create, and statistically analyze, quantitative transcriptome maps, based on gene expression data from multiple sources. The release includes FileMaker Pro database management runtime application and it is freely available at, along with preconfigured implementations for mapping of human, mouse and zebrafish transcriptomes.
    BMC Genomics 02/2011; 12(1):121. DOI:10.1186/1471-2164-12-121 · 3.99 Impact Factor
  • Source
    • "Over the past few years, a number of smaller scale efforts have attempted to define the specialised gene expression profiles of cells within the hemopoietic lineages in various states of differentiation and activation. Amongst these efforts, there have been several studies of isolated progenitor cells, aiming to identify the genes associated with "stemness", the capacity for self-renewal, to find additional markers that would enable isolation of these rare cells in high yield and purity for transplantation or in vitro regeneration [12]. Others have compared profiles of cells of the innate and acquired immune system in various states of activation or differentiation [13]. "
    ABSTRACT: Very large microarray datasets showing gene expression across multiple tissues and cell populations provide a window on the transcriptional networks that underpin the differences in functional activity between biological systems. Clusters of co-expressed genes provide lineage markers, candidate regulators of cell function and, by applying the principle of guilt by association, candidate functions for genes of currently unknown function. We have analysed a dataset comprising pure cell populations from hemopoietic and non-hemopoietic cell types ( Using a novel network visualisation and clustering approach, we demonstrate that it is possible to identify very tight expression signatures associated specifically with embryonic stem cells, mesenchymal cells and hematopoietic lineages. Selected examples validate the prediction that gene function can be inferred by co-expression. One expression cluster was enriched in phagocytes, which, alongside endosome-lysosome constituents, contains genes that may make up a 'pathway' for phagocyte differentiation. Promoters of these genes are enriched for binding sites for the ETS/PU.1 and MITF families. Another cluster was associated with the production of a specific extracellular matrix, with high levels of gene expression shared by cells of mesenchymal origin (fibroblasts, adipocytes, osteoblasts and myoblasts). We discuss the limitations placed upon such data by the presence of alternative promoters with distinct tissue specificity within many protein-coding genes.
    Genomics 03/2010; 95(6):328-38. DOI:10.1016/j.ygeno.2010.03.002 · 2.28 Impact Factor
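
The BMC Genomics excerpt above notes that standard quantile normalization needs a fully filled matrix, so genes with a missing value in any dataset are removed first. A minimal sketch of that complete-case filter followed by plain quantile normalization is given here. It is not TRAM's implementation and does not reproduce its scaled-quantile variant; the function names and toy values are assumptions, and ties are broken by gene order rather than averaged.

```python
def drop_incomplete_genes(matrix):
    """Keep only genes with a value in every sample.

    matrix maps gene -> list of values (None marks a missing value),
    one entry per sample (column).
    """
    return {g: row for g, row in matrix.items()
            if all(v is not None for v in row)}


def quantile_normalize(matrix):
    """Plain quantile normalization of a complete gene-by-sample matrix.

    Every sample ends up with the same distribution: the k-th smallest
    value of each column is replaced by the mean of the k-th smallest
    values across all columns. Ties are not averaged in this sketch.
    """
    if not matrix:
        return {}
    genes = list(matrix)
    n_samples = len(matrix[genes[0]])
    columns = [[matrix[g][j] for g in genes] for j in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean across samples at each rank.
    reference = [sum(vals) / n_samples for vals in zip(*sorted_cols)]

    normalized = {g: [0.0] * n_samples for g in genes}
    for j, col in enumerate(columns):
        order = sorted(range(len(genes)), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[genes[i]][j] = reference[rank]
    return normalized


# Toy matrix: rows are genes, columns are three samples; g5 has a gap.
raw = {
    "g1": [5.0, 4.0, 3.0],
    "g2": [2.0, 1.0, 4.0],
    "g3": [3.0, 4.5, 6.0],
    "g4": [4.0, 2.0, 8.0],
    "g5": [1.0, None, 2.0],
}
complete = drop_incomplete_genes(raw)      # g5 is removed
for gene, row in quantile_normalize(complete).items():
    print(gene, [round(v, 3) for v in row])
```

The filter makes the data loss mentioned in the excerpt concrete: `g5` is discarded entirely even though two of its three values were measured, which is the cost that the alternative approaches to missing values try to avoid.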