A permutation-based multiple testing method for time-course microarray experiments

Department of Biostatistics and Bioinformatics, Duke University Medical Center, Durham, North Carolina 27710, USA.
BMC Bioinformatics (Impact Factor: 2.67). 10/2009; 10:336. DOI: 10.1186/1471-2105-10-336

ABSTRACT Time-course microarray experiments are widely used to study the temporal profiles of gene expression. Storey et al. (2005) developed a method for analyzing time-course microarray studies that can be applied to discovering genes whose expression trajectories change over time within a single biological group, or genes that follow different time trajectories among multiple groups. They estimated the expression trajectory of each gene using natural cubic splines under the null (no time-course) and alternative (time-course) hypotheses, and used a goodness-of-fit test statistic to quantify the discrepancy between the two fits. The null distribution of the statistic was approximated through a bootstrap method. Gene expression levels in microarray data are often correlated in complex ways, so accurate type I error control that adjusts for multiple testing requires the joint null distribution of the test statistics for a large number of genes. For this purpose, permutation methods have been widely used because of their computational ease and intuitive interpretation.
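For a single group, the null-versus-alternative fit comparison described above can be sketched as follows. This assumes the F-type ratio of residual sums of squares commonly associated with Storey et al. (2005); the notation (gene i, array j observed at time t_j, natural cubic spline basis functions s_1, ..., s_p) is introduced here for illustration and is not taken from the paper.

```latex
\begin{aligned}
H_0:&\quad y_{ij} = \mu_i + \varepsilon_{ij},\\
H_1:&\quad y_{ij} = \mu_i + \sum_{k=1}^{p} \beta_{ik}\, s_k(t_j) + \varepsilon_{ij},\\
\mathrm{SS}_i^{0} &= \sum_{j=1}^{n} \bigl(y_{ij} - \bar{y}_{i}\bigr)^2, \qquad
\mathrm{SS}_i^{1} = \sum_{j=1}^{n} \bigl(y_{ij} - \hat{y}_{ij}\bigr)^2,\\
F_i &= \frac{\mathrm{SS}_i^{0} - \mathrm{SS}_i^{1}}{\mathrm{SS}_i^{1}}.
\end{aligned}
```

Here \hat{y}_{ij} is the least-squares spline fit under H_1. A large F_i means the time-dependent fit reduces the residual sum of squares substantially relative to a flat profile. In the multi-group setting the analogous comparison pits one shared curve (null) against group-specific curves (alternative).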
In this paper, we propose a permutation-based multiple testing procedure based on the test statistic used by Storey et al. (2005). We also propose an efficient computational algorithm. We conduct extensive simulations to investigate the performance of the permutation-based multiple testing procedure. The application of the proposed method is illustrated using the Caenorhabditis elegans dauer developmental data.
Our method is computationally efficient and applicable to identifying genes whose expression levels are time-dependent in a single biological group, and genes whose time profiles depend on the group in a multi-group setting.
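A minimal sketch of permutation-based multiple testing for the single-group case follows. It is not the authors' algorithm. The following are our assumptions: the statistic F = (SS0 - SS1)/SS1, a natural cubic spline basis in truncated-power form with hypothetical knots, independent arrays (no repeated measures) so that time labels are exchangeable under the null, and Westfall-Young single-step maxT adjustment. The key point from the abstract is that one shuffle is applied to all genes at once, so the between-gene correlation is preserved in the joint null distribution. As a speed-up of our own, the residual-maker matrix is computed once and reused, because shuffling time labels is equivalent to shuffling the columns of Y against a fixed design.

```python
import numpy as np


def natural_spline_basis(t, knots):
    """Natural cubic spline basis in truncated-power form (Hastie et al.,
    Elements of Statistical Learning, eqs. 5.4-5.5); K knots give K columns."""
    t = np.asarray(t, dtype=float)
    k = np.asarray(knots, dtype=float)
    K = len(k)

    def d(j):
        return (np.maximum(t - k[j], 0) ** 3 - np.maximum(t - k[K - 1], 0) ** 3) / (k[K - 1] - k[j])

    cols = [np.ones_like(t), t] + [d(j) - d(K - 2) for j in range(K - 2)]
    return np.column_stack(cols)


def residual_maker(t, knots):
    """M = I - X X^+, so that y' M y is the residual sum of squares of the spline fit."""
    X = natural_spline_basis(t, knots)
    return np.eye(len(t)) - X @ np.linalg.pinv(X)


def gof_statistic(Y, M):
    """F = (SS0 - SS1) / SS1 for every gene (rows of Y, columns = arrays)."""
    ss1 = np.einsum("gi,ij,gj->g", Y, M, Y)         # spline (alternative) fit
    centered = Y - Y.mean(axis=1, keepdims=True)     # flat (null) fit
    ss0 = (centered ** 2).sum(axis=1)
    return (ss0 - ss1) / ss1


def maxT_permutation(Y, t, knots, n_perm=1000, seed=0):
    """Raw and single-step maxT-adjusted permutation P-values."""
    rng = np.random.default_rng(seed)
    M = residual_maker(t, knots)          # computed once, reused for every permutation
    obs = gof_statistic(Y, M)
    exceed_raw = np.zeros_like(obs)
    exceed_max = np.zeros_like(obs)
    for _ in range(n_perm):
        perm = rng.permutation(Y.shape[1])
        null = gof_statistic(Y[:, perm], M)   # same shuffle for every gene
        exceed_raw += null >= obs
        exceed_max += null.max() >= obs       # maxT: compare with the largest null statistic
    p_raw = (1 + exceed_raw) / (n_perm + 1)
    p_adj = (1 + exceed_max) / (n_perm + 1)
    return obs, p_raw, p_adj


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.repeat(np.arange(6.0), 3)            # 6 time points x 3 replicates (simulated)
    Y = rng.normal(size=(50, t.size))
    Y[:5] += 2.0 * np.sin(t)                    # first 5 genes truly time-dependent
    knots = np.linspace(t.min(), t.max(), 4)    # hypothetical knot choice
    F, p_raw, p_adj = maxT_permutation(Y, t, knots, n_perm=200)
    print(np.flatnonzero(p_adj < 0.05))
```

The data in the usage block are simulated purely for illustration. The "+1" in the P-value formulas is one common convention that keeps permutation P-values away from zero. A step-down maxT variant would be somewhat more powerful than the single-step version shown.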

  • "To balance the heavy computational burden and the size of the probability to be estimated, we performed 1000 replications that seemed large enough to estimate the empirical P-value at significance levels of 0.01 and 0.05 in simulation studies. Others have taken similar strategies in related simulations (McDonough et al., 2009; Sohn et al., 2009). In a real analysis, more permutations could be carried out if necessary."
    ABSTRACT: High-dimensional data are frequently generated in genome-wide association studies (GWAS) and other studies. It is important to identify features such as single nucleotide polymorphisms (SNPs) in GWAS that are associated with a disease. Random forests represent a very useful approach for this purpose, using a variable importance score. This importance score has several shortcomings. We propose an alternative importance measure to overcome those shortcomings. We characterized the effect of multiple SNPs under various models using our proposed importance measure in random forests, which uses maximal conditional chi-square (MCC) as a measure of association between a SNP and the trait conditional on other SNPs. Based on this importance measure, we employed a permutation test to estimate empirical P-values of SNPs. Our method was compared to a univariate test and the permutation test using the Gini and permutation importance. In simulation, the proposed method consistently outperformed the other methods in identifying risk SNPs. In a GWAS of age-related macular degeneration, the proposed method confirmed two significant SNPs (at the genome-wide adjusted level of 0.05). Further analysis showed that these two SNPs conformed to a heterogeneity model. Compared with the existing importance measures, the MCC importance measure is more sensitive to complex effects of risk SNPs by utilizing conditional information on different SNPs. The permutation test with the MCC importance measure provides an efficient way to identify candidate SNPs in GWAS and facilitates understanding of the etiological relationship between genetic variants and complex diseases. Supplementary data are available at Bioinformatics online.
    Bioinformatics 03/2010; 26(6):831-7. DOI:10.1093/bioinformatics/btq038 · 4.62 Impact Factor
  • ABSTRACT: The effects of voltage-thermal stress on the conduction behaviors of VLDPE and LLDPE have been investigated. The conduction current density of voltage-thermally aged VLDPE decreases, while the conduction current of aged LLDPE increases significantly. We conclude that the decrease in conduction current in aged VLDPE may be attributed to a slight increase in activation energy and a decrease in hopping distance, whereas a large decrease in activation energy (~0.2 eV) together with an electrode effect was responsible for the increase in conduction current in aged LLDPE. We confirmed, from XRD analysis, that these changes in conduction behavior may be due to a decrease in deep traps or defects in the ordered phase of aged LLDPE and an increase in traps at the crystalline-amorphous boundaries of aged VLDPE.
    Conduction and Breakdown in Solid Dielectrics, 1995. ICSD'95., Proceedings of the 1995 IEEE 5th International Conference on; 08/1995
  • ABSTRACT: In a time-course microarray experiment, the expression level for each gene is observed across a number of time-points in order to characterize the temporal trajectories of the gene-expression profiles. For many of these experiments, the scientific aim is the identification of genes for which the trajectories depend on an experimental or phenotypic factor. There is an extensive recent body of literature on statistical methodology for addressing this analytical problem. Most of the existing methods are based on estimating the time-course trajectories using parametric or non-parametric mean regression methods. The sensitivity of these regression methods to outliers, an issue that is well documented in the statistical literature, should be of concern when analyzing microarray data. In this paper, we propose a robust testing method for identifying genes whose expression time profiles depend on a factor. Furthermore, we propose a multiple testing procedure to adjust for multiplicity. We illustrate the performance of our method through an extensive simulation study. Finally, we report the results of applying our method to a case study and discuss potential extensions.
    BMC Bioinformatics 07/2010; 11(1):391. DOI:10.1186/1471-2105-11-391 · 2.67 Impact Factor
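The passage quoted in the first entry above claims that 1000 replications are large enough to estimate empirical P-values at the 0.01 and 0.05 levels. One way to check this is to compute the binomial Monte Carlo standard error sqrt(p(1 - p)/B) for B replications. The function name below is ours, used for illustration only.

```python
import math


def mc_standard_error(p, n_perm):
    """Binomial standard error of an empirical P-value estimated from
    n_perm independent replications when the true P-value is p."""
    return math.sqrt(p * (1 - p) / n_perm)


for p in (0.01, 0.05):
    se = mc_standard_error(p, 1000)
    print(f"p = {p:.2f}: SE = {se:.4f}, relative SE = {se / p:.0%}")
# p = 0.01: SE = 0.0031, relative SE = 31%
# p = 0.05: SE = 0.0069, relative SE = 14%
```

With B = 1000, the standard error is about 0.007 at p = 0.05, but about 31% of the target value at p = 0.01. This shows why more permutations may be worthwhile when decisions hinge on the smaller threshold.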