Estimation of False Discovery Rates in Multiple Testing: Application to Gene Microarray Data

Division of Biometry and Risk Assessment, National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas, USA.
Biometrics (Impact Factor: 1.57). 01/2004; 59(4):1071-81. DOI: 10.1111/j.0006-341X.2003.00123.x
Source: PubMed


Testing for significance with gene expression data from DNA microarray experiments involves simultaneous comparisons of hundreds or thousands of genes. If R denotes the number of rejections (declared significant genes) and V denotes the number of false rejections, then V/R, if R > 0, is the proportion of falsely rejected hypotheses. This paper proposes a model for the distribution of the number of rejections R and for the conditional distribution of V given R, V | R. Under the independence assumption, the distribution of R is a convolution of two binomials and V | R has a noncentral hypergeometric distribution. Under an equicorrelated model, the distributions are more complex; they are also derived. Five false discovery rate error measures are considered: FDR = E(V/R) (positive FDR), pFDR = E(V/R | R > 0) (positive FDR), cFDR = E(V/R | R = r) (conditional FDR), mFDR = E(V)/E(R) (marginal FDR), and eFDR = E(V)/r (empirical FDR), where r is the observed number of rejections. The pFDR, cFDR, and mFDR are shown to be equivalent under the Bayesian framework, in which the number of true null hypotheses is modeled as a random variable. We present a parametric procedure and a bootstrap procedure to estimate the FDRs. Monte Carlo simulations were conducted to evaluate the performance of these two methods. The bootstrap procedure appears to perform reasonably well, even when the alternative hypotheses are correlated (ρ = 0.25). An example from a toxicogenomic microarray experiment is presented for illustration.
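The independence model above can be made concrete with a small numerical sketch. It is not the paper's parametric or bootstrap estimator. It assumes that every true null is rejected with the same probability alpha and every true alternative with the same power, and all function names are hypothetical. `rejection_model` forms the distribution of R = V + S as a convolution of Bin(m0, alpha) and Bin(m1, power). It also accumulates E(V; R = r), which gives E(V | R = r) by division. From these, `fdr_measures` evaluates the five measures. `bayesian_model` mixes over a random m0 ~ Bin(m, pi0) so you can check numerically that pFDR, cFDR, and mFDR coincide in that setting.

```python
from math import comb


def binom_pmf(n, p):
    """[P(X = k) for k = 0..n] for X ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def rejection_model(m0, m1, alpha, power):
    """Independent tests: V ~ Bin(m0, alpha) false rejections, S ~ Bin(m1, power)
    true rejections (common alpha and power are simplifying assumptions).

    Returns (pr, ev) with pr[r] = P(R = r), the convolution of the two binomials,
    and ev[r] = E(V; R = r) = sum_v v * P(V = v, S = r - v).
    """
    pv, ps = binom_pmf(m0, alpha), binom_pmf(m1, power)
    pr = [0.0] * (m0 + m1 + 1)
    ev = [0.0] * (m0 + m1 + 1)
    for v, a in enumerate(pv):
        for s, b in enumerate(ps):
            pr[v + s] += a * b
            ev[v + s] += v * a * b
    return pr, ev


def fdr_measures(pr, ev, mean_v, mean_r, r_obs):
    """Five error measures; mean_v = E(V), mean_r = E(R), r_obs = observed R.

    Here eFDR uses the true E(V); in practice E(V) would itself be estimated.
    """
    # E(V/R) with V/R := 0 when R = 0: sum_r P(R = r) E(V | R = r) / r
    fdr = sum(ev[r] / r for r in range(1, len(pr)))
    return {
        "FDR": fdr,
        "pFDR": fdr / (1.0 - pr[0]),               # E(V/R | R > 0)
        "cFDR": ev[r_obs] / (r_obs * pr[r_obs]),   # E(V/R | R = r_obs)
        "mFDR": mean_v / mean_r,                   # E(V) / E(R)
        "eFDR": mean_v / r_obs,                    # E(V) / r_obs
    }


def bayesian_model(m, pi0, alpha, power):
    """Same as rejection_model but with a random number of true nulls m0 ~ Bin(m, pi0)."""
    pr = [0.0] * (m + 1)
    ev = [0.0] * (m + 1)
    for m0, w in enumerate(binom_pmf(m, pi0)):
        pr0, ev0 = rejection_model(m0, m - m0, alpha, power)
        for r in range(m + 1):
            pr[r] += w * pr0[r]
            ev[r] += w * ev0[r]
    return pr, ev


if __name__ == "__main__":
    alpha, power = 0.05, 0.8
    # Fixed m0: cFDR depends on r (V | R is noncentral hypergeometric).
    pr, ev = rejection_model(90, 10, alpha, power)
    print(fdr_measures(pr, ev, 90 * alpha, 90 * alpha + 10 * power, r_obs=12))
    # Random m0: pFDR, cFDR and mFDR all equal pi0*alpha / (pi0*alpha + (1-pi0)*power).
    m, pi0 = 100, 0.9
    pr_b, ev_b = bayesian_model(m, pi0, alpha, power)
    mean_v = m * pi0 * alpha
    mean_r = m * (pi0 * alpha + (1 - pi0) * power)
    print(fdr_measures(pr_b, ev_b, mean_v, mean_r, r_obs=12))
```

The Bayesian check works because each hypothesis is independently null with probability pi0. Each test then rejects independently, so given R = r the count V is Bin(r, q) with q = pi0*alpha / (pi0*alpha + (1 - pi0)*power). As a result, E(V/R | R = r) = q for every r ≥ 1.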

    • "This enabled the entire experiment to be completed with a single microarray slide. This sample size is underpowered for subtly regulated transcripts (see discussion) but does meet the consensus recommendation of at least 5 biological replicates per group (Allison, Cui, Page, & Sabripour, 2006; Pavlidis, Li, & Noble, 2003; Tsai, Hsueh, & Chen, 2003). We collected samples from an additional 12 animals to test for generalization to an independent sample with qPCR. "
    ABSTRACT: We used a custom-designed microarray and quantitative PCR to characterize the rapid transcriptional response to long-term sensitization training in the marine mollusk Aplysia californica. Aplysia were exposed to repeated noxious shocks to one side of the body, a procedure known to induce a long-lasting, transcription-dependent increase in reflex responsiveness that is restricted to the side of training. One hour after training, pleural ganglia from the trained and untrained sides of the body were harvested; these ganglia contain the sensory nociceptors which help mediate the expression of long-term sensitization memory. Microarray analysis from 8 biological replicates suggests that long-term sensitization training rapidly regulates at least 81 transcripts. We used qPCR to test a subset of these transcripts and found that 83% were confirmed in the same samples, and 86% of these were again confirmed in an independent sample. Thus, our new microarray design shows strong convergent and predictive validity for analyzing the transcriptional correlates of memory in Aplysia. Fully validated transcripts include some previously identified as regulated in this paradigm (ApC/EBP and ApEgr) but also include novel findings. Specifically, we show that long-term sensitization training rapidly up-regulates the expression of transcripts which may encode Aplysia homologs of a C/EBPγ transcription factor, a glycine transporter (GlyT2), and a vacuolar-protein-sorting-associated protein (VPS36).
    Neurobiology of Learning and Memory 08/2014; 116. DOI:10.1016/j.nlm.2014.07.009 · 3.65 Impact Factor
    • "To identify specific serum lipid species associated with PCa, we performed MS analyses. Given the necessity of simultaneously comparing hundreds of lipids, we incorporated the false discovery rate (FDR) into our analyses [29], [30]. Tables 2 and 3 provide details of the age-matched serum samples, including the Gleason scores and PSA levels for patients diagnosed with PCa (the full medical history can be found in Data S1). "
    ABSTRACT: The results of prostate specific antigen (PSA) and digital rectal examination (DRE) screenings lead to both under- and overtreatment of prostate cancer (PCa). As such, there is an urgent need for the identification and evaluation of new markers for early diagnosis and disease prognosis. Studies have shown a link between PCa, lipids and lipid metabolism. Therefore, the aim of this study was to examine the concentrations and distribution of serum lipids in patients with PCa as compared with serum from controls. Using electrospray ionization mass spectrometry (ESI-MS/MS) lipid profiling, we analyzed serum phospholipids from age-matched subjects who were either newly diagnosed with PCa or healthy (normal). We found that cholesteryl ester (CE), dihydrosphingomyelin (DSM), phosphatidylcholine (PC), egg phosphatidylcholine (ePC) and egg phosphatidylethanolamine (ePE) are the 5 major lipid groups that varied between normal and cancer sera. ePC 38:5, PC 40:3, and PC 42:4 represent the lipid species most prevalent in PCa as compared with normal serum. Further analysis revealed that serum ePC 38:5 ≥0.015 nmoles, PC 40:3 ≤0.001 nmoles and PC 42:4 ≤0.0001 nmoles correlated with the absence of PCa at 94% prediction. Conversely, serum ePC 38:5 ≤0.015 nmoles, PC 40:3 ≥0.001 nmoles, and PC 42:4 ≥0.0001 nmoles correlated with the presence of PCa. In summary, we have demonstrated that ePC 38:5, PC 40:3, and PC 42:4 may serve as early predictive serum markers for the presence of PCa.
    PLoS ONE 03/2014; 9(3):e88841. DOI:10.1371/journal.pone.0088841 · 3.23 Impact Factor
    • "The Significance Analysis of Microarrays (SAM) method [21] was used to identify genes differentially expressed between the Control and CRC groups. For gene expression studies involving microarrays, it has become common practice to focus on control of the false discovery rate (FDR), which is the expected proportion of incorrect rejections among the rejected hypotheses [22], [23]. To minimize false positives, we set the FDR threshold at 0.01 for all the comparisons. "
    ABSTRACT: Colorectal cancer is the leading cause of cancer-related deaths worldwide. The disease is curable when detected at an early stage. However, the compliance rate with current screening recommendations remains poor. An accurate, minimally invasive blood test that has the potential for greater patient compliance would be a welcome addition to the current methods. Recent data have shown that gene expression profiles of peripheral blood cells can reflect disease states and thus have diagnostic value. In this study, genome-wide gene expression profiling of peripheral blood cells from 20 healthy controls and 20 colorectal cancer patients was performed using PAXgene™ technology and Affymetrix GeneChip® microarrays. We identified a list of 1,469 genes that were differentially expressed between the healthy controls and cancer patients. Gene annotation and functional enrichment analysis revealed that those genes are mainly related to immune functions. In particular, a set of genes belonging to the Toll-Like Receptor pathways were up-regulated in the colorectal cancer patients. These findings provide a new understanding of the blood gene expression profile in colorectal cancer. Our results may serve as the basis for further development of blood biomarkers for the diagnosis and treatment of colorectal cancer.
    PLoS ONE 10/2013; 8(5):e62870. DOI:10.1371/journal.pone.0062870 · 3.23 Impact Factor