Gene selection and clustering for time-course and dose-response microarray experiments using order-restricted inference

Biostatistics Branch, Laboratory of Molecular Carcinogenesis, Research Triangle Park, NC 27709, USA.
Bioinformatics (Impact Factor: 4.98). 06/2003; 19(7):834-41. DOI: 10.1093/bioinformatics/btg093
Source: PubMed


We propose an algorithm for selecting and clustering genes according to their time-course or dose-response profiles using gene expression data. The proposed algorithm is based on the order-restricted inference methodology developed in statistics. We describe the methodology for time-course experiments although it is applicable to any ordered set of treatments. Candidate temporal profiles are defined in terms of inequalities among mean expression levels at the time points. The proposed algorithm selects genes when they meet a bootstrap-based criterion for statistical significance and assigns each selected gene to the best fitting candidate profile. We illustrate the methodology using data from a cDNA microarray experiment in which a breast cancer cell line was stimulated with estrogen for different time intervals. In this example, our method was able to identify several biologically interesting genes that previous analyses failed to reveal.
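The selection and assignment steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract confirms only that profiles are defined by inequalities among mean expression levels and that selection uses a bootstrap-based criterion. The restriction to two candidate profiles (monotone increasing and decreasing), the use of the range of the order-restricted means as the test statistic, the pooled resampling scheme, and all function names (`pava`, `profile_stats`, `select_and_assign`) are assumptions made for this example.

```python
import random


def pava(values, weights):
    """Weighted least-squares fit of a non-decreasing sequence
    (pool-adjacent-violators algorithm)."""
    blocks = []  # each block: [weighted mean, total weight, points pooled]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Merge backwards while the ordering constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    fitted = []
    for m, _, n in blocks:
        fitted.extend([m] * n)
    return fitted


def profile_stats(groups):
    """Score each candidate profile by the range of its order-restricted
    means (an assumed statistic, chosen for illustration)."""
    means = [sum(g) / len(g) for g in groups]
    sizes = [len(g) for g in groups]
    up = pava(means, sizes)
    # A non-increasing fit is obtained by negating, fitting, negating back.
    down = [-m for m in pava([-m for m in means], sizes)]
    return {"increasing": up[-1] - up[0], "decreasing": down[0] - down[-1]}


def select_and_assign(groups, n_boot=1000, alpha=0.05, seed=0):
    """Return (best profile or None, bootstrap p-value) for one gene.

    `groups` holds the replicate expression values at each ordered
    time point. The null distribution is approximated by resampling
    from the pooled observations (an assumed scheme).
    """
    rng = random.Random(seed)
    stats = profile_stats(groups)
    best = max(stats, key=stats.get)
    observed = stats[best]
    pooled = [x for g in groups for x in g]
    exceed = 0
    for _ in range(n_boot):
        resampled = [[rng.choice(pooled) for _ in g] for g in groups]
        if max(profile_stats(resampled).values()) >= observed:
            exceed += 1
    p_value = (exceed + 1) / (n_boot + 1)
    return (best if p_value <= alpha else None), p_value


if __name__ == "__main__":
    # Hypothetical gene with steadily rising expression over four time points.
    rising = [[1.0, 1.1, 0.9], [2.0, 2.1, 1.9], [3.0, 3.1, 2.9], [4.0, 4.1, 3.9]]
    print(select_and_assign(rising))
```

A gene is kept only if its best-fitting profile scores higher than most profiles fitted to resampled data. Richer candidate sets (for example, profiles that rise and then fall) would need their own constrained estimators, which this sketch does not attempt.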

Available from: Shyamal Peddada
  • Source
    • "For hierarchical clustering, the data were standardized to mean = 0, variance = 1 and grouped using Euclidean distance as the dissimilarity measure and average linkage for merging. Since the data are not necessarily normally distributed with equal variances, and since the sample sizes are unequal among the comparison groups, we performed all comparisons using standard residual bootstrap methodology [39] implemented in ORIOGEN v.3.0 which is based on [40], [41] using 100000 bootstrap samples with a SAM correction of 0.10. Specifically, we compared samples from normal and tumor tissues, tumor tissues from older black and older whites, and tumor tissues from older and younger black women. "
    ABSTRACT: The study of uterine leiomyomata (fibroids) provides a unique opportunity to investigate the physiological and molecular determinants of hormone-dependent tumor growth and spontaneous tumor regression. We conducted a longitudinal clinical study of premenopausal women with leiomyoma that showed significantly different growth rates between white and black women depending on their age. Growth rates for leiomyoma were on average much higher for older black women than for older white women, and we now report gene expression pattern differences in tumors from these two groups of study participants. Total RNA from 52 leiomyoma and 8 myometrial samples was analyzed using Affymetrix GeneChip expression arrays. Gene expression data were first compared between all leiomyoma and normal myometrium and then between leiomyoma from older black women (age 35 or older) and from older white women. Genes that were found significant in pairwise comparisons were further analyzed for canonical pathways, networks and biological functions using the Ingenuity Pathway Analysis (IPA) software. Whereas our comparison of leiomyoma to myometrium produced a very large list of genes highly similar to those reported in numerous previous studies, distinct sets of genes and signaling pathways were identified in comparisons of older black and white women whose tumors were likely to be growing and non-growing, respectively. Key among these were genes associated with regulation of apoptosis. To our knowledge, this is the first study to compare two groups of tumors that are likely to have different growth rates in order to reveal molecular signals likely to be influential in tumor growth.
    PLoS ONE 06/2013; 8(6):e63909. DOI:10.1371/journal.pone.0063909 · 3.23 Impact Factor
  • Source
    • "In particular, the development of gene-clustering algorithms that also detect temporal profiles is becoming increasingly important. Statistical bootstrap methods have been developed for assigning genes to candidate profiles [1]. "
    ABSTRACT: Statistical evaluation of temporal gene expression profiles plays an important role in particular biological processes and conditions. We introduce a clustering method for this purpose, which is based on the expression patterns but is also influenced by temporal changes. We compare the results of our platform with methods based on expression or the rank of temporal changes. The proposed platform is illustrated with a temporal gene expression dataset from primary human chondrocytes and mesenchymal stem cells (MSCs). We derived three clusters in each cell type and compared the content of these classes in terms of temporal changes, which can support biological performance. For statistical evaluation we introduce a validity measure that takes these temporal changes into consideration, and we also perform an enrichment analysis of three central genes in each cluster. Even though we can detect certain statistical similarities, these might be due to different biological processes. Our proposed platform contributes to both the statistical and biological validation of temporal profiles.
    Conference proceedings: ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference 08/2012; 2012:1238-41. DOI:10.1109/EMBC.2012.6346161
  • Source
    • "Although we do not discuss the problem of selecting significant gene sets and subsets when comparing multiple experimental conditions, the proposed methodology can be extended to such situations by replacing Hotelling’s T² statistic by commonly used statistics such as the Hotelling-Lawley trace test or Roy’s largest root test. Furthermore, if the experimental conditions are ordered, such as in a time-course or a dose-response study, one can exploit order-restricted inference based methods developed in [29]. As commented by a reviewer of this manuscript, it is possible that in some applications only a few genes in a given pathway are differentially expressed where such subsets are not necessarily pre-defined. "
    ABSTRACT: Based on available biological information, genomic data can often be partitioned into pre-defined sets (e.g. pathways) and subsets within sets. Biologists are often interested in determining whether some pre-defined sets of variables (e.g. genes) are differentially expressed under varying experimental conditions. Several procedures are available in the literature for making such determinations; however, they do not take into account information regarding the subsets within each set. In addition, variables (e.g. genes) belonging to a set or a subset are potentially correlated, yet such information is often ignored and univariate methods are used. This may result in loss of power and/or an inflated false positive rate. We introduce a multiple testing-based methodology which makes use of available information regarding biologically relevant subsets within each pre-defined set of variables while exploiting the underlying dependence structure among the variables. Using this methodology, a biologist may not only determine whether a set of variables is differentially expressed between two experimental conditions, but may also test whether specific subsets within a significant set are also significant. The proposed methodology (a) is easy to implement, (b) does not require inverting potentially singular covariance matrices, (c) controls the family-wise error rate (FWER) at the desired nominal level, and (d) is robust to the underlying distribution and covariance structures. Although for simplicity of exposition the methodology is described for microarray gene expression data, it is also applicable to any high-dimensional data, such as mRNA-seq data, CpG methylation data, etc.
    BMC Bioinformatics 07/2012; 13(1):177. DOI:10.1186/1471-2105-13-177 · 2.58 Impact Factor