A permutation-based multiple testing method for time-course microarray experiments.

Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, North Carolina 27710, USA.
BMC Bioinformatics (Impact Factor: 2.67). 01/2009; 10:336. DOI: 10.1186/1471-2105-10-336
Source: PubMed

ABSTRACT Time-course microarray experiments are widely used to study the temporal profiles of gene expression. Storey et al. (2005) developed a method for analyzing time-course microarray studies that can be applied to discovering genes whose expression trajectories change over time within a single biological group, or those that follow different time trajectories among multiple groups. They estimated the expression trajectories of each gene using natural cubic splines under the null (no time-course) and alternative (time-course) hypotheses, and used a goodness-of-fit test statistic to quantify the discrepancy. The null distribution of the statistic was approximated through a bootstrap method. Gene expression levels in microarray data are often correlated in complex ways. Accurate type I error control that adjusts for multiple testing requires the joint null distribution of test statistics for a large number of genes. For this purpose, permutation methods have been widely used because of computational ease and their intuitive interpretation.
In this paper, we propose a permutation-based multiple testing procedure based on the test statistic used by Storey et al. (2005). We also propose an efficient computation algorithm. Extensive simulations are conducted to investigate the performance of the permutation-based multiple testing procedure. The application of the proposed method is illustrated using the Caenorhabditis elegans dauer developmental data.
Our method is computationally efficient and applicable for identifying genes whose expression levels are time-dependent in a single biological group and for identifying the genes for which the time-profile depends on the group in a multi-group setting.
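The general idea described in the abstract can be illustrated with a minimal sketch for the single-group case. This is not the authors' algorithm or their efficient computation scheme. It assumes independent samples (one array per subject, so samples are exchangeable under the null), uses a plain cubic polynomial basis as a stand-in for natural cubic splines, and applies a single-step max-statistic adjustment. The function names `fit_stat` and `permutation_maxT` and all parameters are hypothetical.

```python
import numpy as np

def fit_stat(Y, X0, X1):
    """Goodness-of-fit statistic (RSS0 - RSS1) / RSS1 for each gene (row of Y)."""
    def rss(X):
        # Least-squares fit of every gene at once; Y.T has shape (samples, genes).
        beta, *_ = np.linalg.lstsq(X, Y.T, rcond=None)
        resid = Y.T - X @ beta
        return (resid ** 2).sum(axis=0)
    rss0, rss1 = rss(X0), rss(X1)
    return (rss0 - rss1) / rss1

def permutation_maxT(Y, times, degree=3, n_perm=1000, seed=0):
    """Permutation-based adjusted p-values using the maximum statistic."""
    Y = np.asarray(Y, dtype=float)
    times = np.asarray(times, dtype=float)
    rng = np.random.default_rng(seed)
    n = times.size
    t = (times - times.mean()) / times.std()
    X0 = np.ones((n, 1))                            # null model: flat trajectory
    X1 = np.vander(t, degree + 1, increasing=True)  # alternative: polynomial in time
    observed = fit_stat(Y, X0, X1)
    max_null = np.empty(n_perm)
    for b in range(n_perm):
        # Permute whole samples (columns) so that all genes are shuffled together,
        # which keeps the between-gene correlation in the joint null distribution.
        perm = rng.permutation(n)
        max_null[b] = fit_stat(Y[:, perm], X0, X1).max()
    exceed = (max_null[None, :] >= observed[:, None]).sum(axis=1)
    adj_p = (1 + exceed) / (n_perm + 1)
    return observed, adj_p
```

Calling `permutation_maxT(Y, times)` with a genes-by-samples matrix `Y` and a vector of sampling times returns the observed statistics and the adjusted p-values. Permuting sample columns jointly, rather than each gene separately, is what lets the null distribution reflect the correlation among genes.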

  • Source
    ABSTRACT: Mature microRNAs (miRNAs) are small endogenous non-coding RNAs 18-25 nt in length. They program the RNA Induced Silencing Complex (RISC) to make it inhibit either messenger RNAs or promoter DNAs. We have found that the mean abundance of miRNAs in Arabidopsis is correlated with the abundance of DRYD tetranucleotides near the 3'-end and the abundance of WRHB tetranucleotides in the center of the miRNA sequence. Based on this correlation, we have estimated miRNA abundances in seven organs of this plant, namely: inflorescences, stems, siliques, seedlings, roots, cauline, and rosette leaves. We have also found that the mean affinity of miRNAs for two proteins in the Argonaute family (Ago2 and Ago3) in man is correlated with the abundance of YRHB tetranucleotides near the 3'-end and that the preference of miRNAs for Ago2 is correlated with the abundance of RHHK tetranucleotides in the center of the miRNA sequence. This allowed us to obtain statistically significant estimates of miRNA abundances in human embryonic kidney cells, HEK293T. These findings in relation to two taxonomically distant entities (man and Arabidopsis) fit one another like pieces of a jigsaw puzzle, which allowed us to heuristically generalize them and state that the miRNA abundance in the human brain may be determined by the abundance of YRHB and RHHK tetranucleotides in these miRNAs.
    Frontiers in Genetics 01/2013; 4:122.
  • Source
    ABSTRACT: BACKGROUND: One of the fundamental problems in time course gene expression data analysis is to identify genes associated with a biological process or a particular stimulus of interest, like a treatment or virus infection. Most of the existing methods for this problem are designed for data with longitudinal replicates. But in reality, many time course gene experiments have no replicates or only have a small number of independent replicates. RESULTS: We focus on the case without replicates and propose a new method for identifying differentially expressed genes by incorporating the functional principal component analysis (FPCA) into a hypothesis testing framework. The data-driven eigenfunctions allow a flexible and parsimonious representation of time course gene expression trajectories, leaving more degrees of freedom for the inference compared to that using a prespecified basis. Moreover, the information of all genes is borrowed for individual gene inferences. CONCLUSION: The proposed approach turns out to be more powerful in identifying time course differentially expressed genes compared to the existing methods. The improved performance is demonstrated through simulation studies and a real data application to the Saccharomyces cerevisiae cell cycle data.
    BMC Bioinformatics 01/2013; 14(1):6. · 2.67 Impact Factor
  • Source
    ABSTRACT: RNA-seq is becoming the de facto standard approach for transcriptome analysis with ever-reducing cost. It has considerable advantages over conventional technologies (microarrays) because it allows for direct identification and quantification of transcripts. Many time series RNA-seq datasets have been collected to study the dynamic regulations of transcripts. However, statistically rigorous and computationally efficient methods are needed to explore the time-dependent changes of gene expression in biological systems. These methods should explicitly account for the dependencies of expression patterns across time points. Here, we discuss several methods that can be applied to model timecourse RNA-seq data, including statistical evolutionary trajectory index (SETI), autoregressive time-lagged regression (AR(1)), and hidden Markov model (HMM) approaches. We use three real datasets and simulation studies to demonstrate the utility of these dynamic methods in temporal analysis.
    BioMed research international. 01/2013; 2013:203681.
