Content-based microarray search using differential expression profiles.

Department of Bioengineering, Stanford University School of Medicine, CA, USA.
BMC Bioinformatics (Impact Factor: 3.02). 01/2010; 11:603. DOI: 10.1186/1471-2105-11-603

ABSTRACT: With the expansion of public repositories such as the Gene Expression Omnibus (GEO), we are rapidly cataloging cellular transcriptional responses to diverse experimental conditions. Methods that query these repositories based on gene expression content, rather than textual annotations, may enable more effective experiment retrieval as well as the discovery of novel associations between drugs, diseases, and other perturbations.
We develop methods to retrieve gene expression experiments that differentially express the same transcriptional programs as a query experiment. Avoiding thresholds, we generate differential expression profiles that include a score for each gene measured in an experiment. We use existing and novel dimension reduction and correlation measures to rank relevant experiments in an entirely data-driven manner, allowing emergent features of the data to drive the results. A combination of matrix decomposition and p-weighted Pearson correlation proves the most suitable for comparing differential expression profiles. We apply this method to index all GEO DataSets and demonstrate the utility of our approach by identifying pathways and conditions relevant to the transcription factors Nanog and FoxO3.
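The retrieval steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each gene is scored with a Welch t-statistic and a normal-approximation p-value (no threshold is applied), and the "p-weighted" part is modeled with hypothetical weights w_i = (1 - p_query,i)(1 - p_target,i) inside a weighted Pearson correlation. The matrix decomposition step that the abstract pairs with the correlation is omitted, and all function names are illustrative.

```python
import math

import numpy as np


def diff_expression_profile(case, control):
    """Return an unthresholded differential expression profile.

    case, control: arrays of shape (n_genes, n_samples) for the two groups.
    Returns (t, p): a Welch t-statistic for every gene and a two-sided
    p-value from a normal approximation (an assumption; the t distribution
    is more exact for small groups).
    """
    case = np.asarray(case, dtype=float)
    control = np.asarray(control, dtype=float)
    diff = case.mean(axis=1) - control.mean(axis=1)
    se = np.sqrt(case.var(axis=1, ddof=1) / case.shape[1]
                 + control.var(axis=1, ddof=1) / control.shape[1])
    t = diff / np.maximum(se, 1e-12)  # guard against zero variance
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in t])
    return t, p


def p_weighted_pearson(x, px, y, py):
    """Pearson correlation of two profiles with per-gene weights.

    Hypothetical weighting: w_i = (1 - px_i) * (1 - py_i), so genes that
    look significant in both profiles dominate the correlation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = (1.0 - np.asarray(px, dtype=float)) * (1.0 - np.asarray(py, dtype=float))
    w = w / w.sum()                      # normalize weights to sum to 1
    mx, my = np.dot(w, x), np.dot(w, y)  # weighted means
    dx, dy = x - mx, y - my
    cov = np.dot(w, dx * dy)             # weighted covariance
    return cov / math.sqrt(np.dot(w, dx * dx) * np.dot(w, dy * dy))


def rank_experiments(query, index):
    """Rank indexed experiments by similarity to the query profile.

    query: (t, p) tuple; index: dict mapping experiment name -> (t, p),
    all measured over the same genes in the same order.
    """
    qt, qp = query
    scores = {name: p_weighted_pearson(qt, qp, t, p)
              for name, (t, p) in index.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

To use it, build one profile per indexed experiment with `diff_expression_profile`, store the `(t, p)` pairs in a dictionary, and call `rank_experiments` with the query profile. Because the weights shrink toward zero for genes that look unchanged in one or both profiles, noisy genes contribute little to the ranking without any hard cutoff.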
Content-based gene expression search generates relevant hypotheses for biological inquiry. Experiments across platforms, tissue types, and protocols inform the analysis of new datasets.

  • ABSTRACT: Genome-wide measurement of transcript levels is a ubiquitous tool in biomedical research. As experimental data continue to be deposited in public databases, it is becoming important to develop search engines that retrieve relevant studies given a query study. While retrieval systems based on metadata already exist, data-driven approaches that retrieve studies based on similarities in the expression data itself have greater potential to uncover novel biological insights. We propose an information retrieval method based on differential expression. Our method handles arbitrary experimental designs and performs competitively with alternative approaches, while making the search results interpretable in terms of differential expression patterns. We show that our model yields meaningful connections between biological conditions from different studies. Finally, we validate a previously unknown connection between malignant pleural mesothelioma and SIM2s, suggested by our method, via real-time polymerase chain reaction in an independent set of mesothelioma samples. Supplementary data and source code are available from
    Bioinformatics 11/2011; 28(2):246-53. · 5.47 Impact Factor
  • ABSTRACT: We introduce ProfileChaser, a web server for querying the Gene Expression Omnibus based on genome-wide patterns of differential expression. Using a novel, content-based approach, ProfileChaser retrieves expression profiles that match the differentially regulated transcriptional programs in a user-supplied experiment. This analysis identifies statistical links to similar expression experiments from the vast array of publicly available data on diseases, drugs, phenotypes and other experimental conditions. Supplementary data are available at Bioinformatics online.
    Bioinformatics 10/2011; 27(23):3317-8. · 5.47 Impact Factor
  • ABSTRACT: BACKGROUND: Biological entities do not act in isolation; often it is the nature and degree of the interactions among them that determine the final outcome. Hence, experimental data on any single biological entity are of limited value when considered in isolation. To address this, we propose that augmenting individual entity data with the literature, which includes all direct and indirect interactions in which the entity is involved, will not only better define the entity's own significance but also uncover relationships with novel biological entities. To test this notion, we developed a comprehensive text mining methodology focused on one class of molecular entities, transcription factors, in one disease, colorectal cancer (CRC), with the goal of discovering new targets. METHODS: Using our previously developed text mining tool, BioMAP, we started the literature mining with a bait list of 39 molecular entities and 6 colorectal cancer terms known to be associated with CRC. We further developed new algorithms for use in conjunction with MetaCore™ (GeneGo, Inc.) to assess the significance of these literature-mined entities in CRC. RESULTS: The small bait list, when augmented with literature-mined data, identified a large number of biological entities associated with CRC. The relative importance of these transcription factors and their associated modules was determined using functional and topological features. Re-validating these highly ranked transcription factors against the literature further strengthened our findings. Novel transcription factors we identified that merit further study include SLUG, RUNX1, IRF1, HIF1A, ATF-2, ABL1, ELK-1 and GATA-1. Our findings identified functional modules in known CRC pathways, such as the Beta-catenin/development pathway, as well as in potential new CRC pathways, such as DNA damage.
    CONCLUSIONS: Our methodology, combining text mining data with a multi-level, multi-parameter scoring technique, identified known and novel transcription factors that play a role in CRC. It also suggested new possibilities for targeted therapies in CRC, demonstrating the usefulness of combining experimental data with the literature to understand any disease.
    BMC Cancer 08/2012; 12(1):331. · 3.33 Impact Factor
