Article

Gene set analysis for longitudinal gene expression data.

School of Medicine & Health Sciences, University of North Dakota, Grand Forks, ND 58202, USA.
BMC Bioinformatics (impact factor: 2.75). 07/2011; 12:273. DOI:10.1186/1471-2105-12-273
Source: PubMed

ABSTRACT Gene set analysis (GSA) has become a successful tool for interpreting gene expression profiles in terms of biological functions, molecular pathways, or genomic locations. GSA performs statistical tests for independent microarray samples at the level of gene sets rather than individual genes. An increasing number of microarray studies explore the dynamic changes of gene expression in a variety of species and biological scenarios. In these longitudinal studies, gene expression is measured repeatedly over time, so a GSA must account for within-gene correlations in addition to possible between-gene correlations.
We provide a robust nonparametric approach to compare the expression of longitudinally measured sets of genes under multiple treatments or experimental conditions. The limiting distributions of our statistics are derived for the case in which the number of genes goes to infinity while the number of replications can be small. When the number of genes in a gene set is small, we recommend permutation tests based on our nonparametric test statistics to achieve reliable type I error and better power while incorporating unknown correlations between and within genes. Simulation results demonstrate that the proposed method has greater power than other methods for various data distributions and heteroscedastic correlation structures. The method was applied to an IL-2 stimulation study, where it identified significantly altered gene sets.
The simulation study and the real data application showed that the proposed gene set analysis provides a promising tool for longitudinal microarray analysis. R scripts for simulating longitudinal data and calculating the nonparametric statistics are posted on the North Dakota INBRE website http://ndinbre.org/programs/bioinformatics.php. Raw microarray data are available in the Gene Expression Omnibus (National Center for Biotechnology Information) under accession number GSE6085.
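
The permutation idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' statistic or their R scripts: it assumes a simple rank-based statistic chosen for illustration, and the function names (`midranks`, `rank_profiles`, `group_statistic`, `permutation_pvalue`) and the data layout (one genes-by-time-points matrix per replicate) are hypothetical. Treatment labels are shuffled across whole replicates, so each replicate's gene-by-time matrix stays intact and the unknown correlations within and between genes are carried through every permutation.

```python
import random

def midranks(values):
    """Return 1-based midranks; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_profiles(replicates):
    """Rank all values in the pooled data, then average the ranks over
    genes so each replicate yields one mean rank per time point."""
    flat = [x for rep in replicates for gene in rep for x in gene]
    ranks = iter(midranks(flat))
    profiles = []
    for rep in replicates:
        n_genes, n_times = len(rep), len(rep[0])
        sums = [0.0] * n_times
        for _gene in rep:
            for t in range(n_times):
                sums[t] += next(ranks)
        profiles.append([s / n_genes for s in sums])
    return profiles

def group_statistic(profiles, labels):
    """Illustrative statistic (an assumption, not the published one):
    sum over groups and time points of n_g * (group mean - overall mean)^2."""
    n_times = len(profiles[0])
    overall = [sum(p[t] for p in profiles) / len(profiles) for t in range(n_times)]
    stat = 0.0
    for g in sorted(set(labels)):
        members = [p for p, lab in zip(profiles, labels) if lab == g]
        for t in range(n_times):
            mean_gt = sum(p[t] for p in members) / len(members)
            stat += len(members) * (mean_gt - overall[t]) ** 2
    return stat

def permutation_pvalue(replicates, labels, n_perm=2000, seed=0):
    """Shuffle treatment labels across whole replicates and return
    (observed statistic, Monte Carlo permutation p-value)."""
    rng = random.Random(seed)
    profiles = rank_profiles(replicates)  # pooled ranks do not change under relabeling
    observed = group_statistic(profiles, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # relative tolerance guards against floating-point summation order
        if group_statistic(profiles, shuffled) >= observed * (1 - 1e-9):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy gene set: 2 genes x 3 time points, 3 replicates per condition (made-up numbers).
control = [[[1.0, 1.2, 1.1], [0.9, 1.0, 1.3]],
           [[1.1, 0.8, 1.0], [1.2, 1.1, 0.7]],
           [[0.6, 1.4, 0.95], [1.05, 0.85, 1.15]]]
treated = [[[2.0, 2.6, 3.1], [2.2, 2.9, 3.3]],
           [[2.1, 2.5, 3.0], [2.3, 2.8, 3.4]],
           [[2.4, 2.7, 3.2], [2.05, 2.75, 3.5]]]
stat, p = permutation_pvalue(control + treated, ["c"] * 3 + ["t"] * 3)
print(f"statistic = {stat:.2f}, permutation p-value = {p:.3f}")
```

With only three replicates per condition there are just 20 distinct relabelings, so the smallest attainable p-value is coarse; the sketch shows the mechanics of a label-permutation test rather than realistic power.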

  • Article: Significance analysis of microarrays applied to the ionizing radiation response.
    ABSTRACT: Microarrays can measure the expression of thousands of genes to identify changes in expression between different biological states. Methods are needed to determine the significance of these changes while accounting for the enormous number of genes. We describe a method, Significance Analysis of Microarrays (SAM), that assigns a score to each gene on the basis of change in gene expression relative to the standard deviation of repeated measurements. For genes with scores greater than an adjustable threshold, SAM uses permutations of the repeated measurements to estimate the percentage of genes identified by chance, the false discovery rate (FDR). When the transcriptional response of human cells to ionizing radiation was measured by microarrays, SAM identified 34 genes that changed at least 1.5-fold with an estimated FDR of 12%, compared with FDRs of 60 and 84% by using conventional methods of analysis. Of the 34 genes, 19 were involved in cell cycle regulation and 3 in apoptosis. Surprisingly, four nucleotide excision repair genes were induced, suggesting that this repair pathway for UV-damaged DNA might play a previously unrecognized role in repairing DNA damaged by ionizing radiation.
    Proceedings of the National Academy of Sciences 05/2001; 98(9):5116-21. · 9.68 Impact Factor
  • Article: A module map showing conditional activity of expression modules in cancer.
    ABSTRACT: DNA microarrays are widely used to study changes in gene expression in tumors, but such studies are typically system-specific and do not address the commonalities and variations between different types of tumor. Here we present an integrated analysis of 1,975 published microarrays spanning 22 tumor types. We describe expression profiles in different tumors in terms of the behavior of modules, sets of genes that act in concert to carry out a specific function. Using a simple unified analysis, we extract modules and characterize gene-expression profiles in tumors as a combination of activated and deactivated modules. Activation of some modules is specific to particular types of tumor; for example, a growth-inhibitory module is specifically repressed in acute lymphoblastic leukemias and may underlie the deregulated proliferation in these cancers. Other modules are shared across a diverse set of clinical conditions, suggestive of common tumor progression mechanisms. For example, the bone osteoblastic module spans a variety of tumor types and includes both secreted growth factors and their receptors. Our findings suggest that there is a single mechanism for both primary tumor proliferation and metastasis to bone. Our analysis presents multiple research directions for diagnostic, prognostic and therapeutic studies.
    Nature Genetics 11/2004; 36(10):1090-8. · 35.53 Impact Factor
  • Article: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.
    ABSTRACT: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.
    Proceedings of the National Academy of Sciences 11/2005; 102(43):15545-50. · 9.68 Impact Factor

Keywords

accession number GSE6085
gene expression
Gene Expression Omnibus
gene expression profiles
gene sets
greater power
heteroscedastic correlation structures
IL-2 stimulation study
incorporating unknown correlations
increasing number
independent microarray samples
longitudinal microarray analysis
nonparametric statistics
nonparametric test statistics
possible between-gene correlations
proposed gene
real data application
robust nonparametric approach
various data distributions
within-gene correlations