Article

Weighted Multiple Hypothesis Testing Procedures

University of Alabama at Birmingham, Birmingham, AL 35294, USA.
Statistical Applications in Genetics and Molecular Biology (Impact Factor: 1.13). 02/2009; 8(1):Article23. DOI: 10.2202/1544-6115.1437
Source: PubMed

ABSTRACT

Multiple hypothesis testing is commonly used in genome research, such as genome-wide studies and gene expression data analysis (Lin, 2005). The widely used Bonferroni procedure controls the family-wise error rate (FWER), but its statistical power drops as the number of hypotheses tested increases. The power of multiple testing procedures can be increased by using weighted p-values (Genovese et al., 2006), with the weights estimated from prior information. Wasserman and Roeder (2006) described a weighted Bonferroni procedure, which incorporates weighted p-values into the Bonferroni procedure, and Rubin et al. (2006) and Wasserman and Roeder (2006) estimated the weights that maximize its power under the assumption that the means of the test statistics are known (these are called the optimal Bonferroni weights). This weighted Bonferroni procedure controls the FWER and can have higher power than the Bonferroni procedure, especially when the optimal Bonferroni weights are used. To further improve power, we first propose a weighted Šidák procedure that incorporates weighted p-values into the Šidák procedure, and we then estimate the weights that maximize its average power under the same known-means assumption (these are called the optimal Šidák weights). This weighted Šidák procedure can have higher power than the weighted Bonferroni procedure. Second, we develop a generalized sequential (GS) Šidák procedure that incorporates weighted p-values into the sequential Šidák procedure (Scherrer, 1984). This GS Šidák procedure extends, and has higher power than, the GS Bonferroni procedure of Holm (1979).
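The single-step and sequential procedures described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code, and all function names are hypothetical. It assumes the weighted Šidák level for hypothesis i is 1 − (1 − α)^(w_i/m) with weights summing to m, which keeps the FWER at most α for independent p-values. The GS variant recomputes each level using only the total weight of hypotheses not yet rejected; with equal weights it reduces to Holm's step-down Bonferroni procedure or the sequential Šidák procedure.

```python
from typing import Callable, List, Sequence

# A "cut" maps (alpha, share of the total weight) to a per-hypothesis level.
Cut = Callable[[float, float], float]


def bonferroni_cut(alpha: float, share: float) -> float:
    """Weighted Bonferroni level: alpha * w_i / W."""
    return alpha * share


def sidak_cut(alpha: float, share: float) -> float:
    """Weighted Sidak level (assumed form): 1 - (1 - alpha) ** (w_i / W)."""
    return 1.0 - (1.0 - alpha) ** share


def _normalize(weights: Sequence[float], m: int) -> List[float]:
    """Rescale nonnegative weights so they sum to m (mean weight 1)."""
    if len(weights) != m:
        raise ValueError("need exactly one weight per p-value")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w * m / total for w in weights]


def weighted_single_step(p: Sequence[float], weights: Sequence[float],
                         alpha: float, cut: Cut) -> List[bool]:
    """Reject H_i when p_i <= cut(alpha, w_i / m)."""
    m = len(p)
    w = _normalize(weights, m)
    return [p[i] <= cut(alpha, w[i] / m) for i in range(m)]


def generalized_sequential(p: Sequence[float], weights: Sequence[float],
                           alpha: float, cut: Cut) -> List[bool]:
    """Holm-style step-down: levels use only the weight of unrejected hypotheses.

    Rejecting every hypothesis that passes in a round yields the same final set
    as rejecting one at a time, because removing rejected weight can only raise
    the levels of the hypotheses that remain.
    """
    m = len(p)
    w = _normalize(weights, m)
    rejected = [False] * m
    while True:
        remaining = [i for i in range(m) if not rejected[i]]
        total = sum(w[i] for i in remaining)
        if total <= 0:  # nothing left, or only zero-weight hypotheses
            break
        newly = [i for i in remaining if p[i] <= cut(alpha, w[i] / total)]
        if not newly:
            break
        for i in newly:
            rejected[i] = True
    return rejected


p = [0.001, 0.012, 0.0126, 0.2]
equal = [1.0, 1.0, 1.0, 1.0]
print(weighted_single_step(p, equal, 0.05, bonferroni_cut))    # [True, True, False, False]
print(weighted_single_step(p, equal, 0.05, sidak_cut))         # [True, True, True, False]
print(generalized_sequential(p, equal, 0.05, bonferroni_cut))  # [True, True, True, False]
```

With equal weights, the third p-value (0.0126) passes the Šidák level (about 0.01274) but not the Bonferroni level (0.0125), while the step-down Bonferroni procedure recovers it in its second round, once the two rejected hypotheses no longer share the α budget.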
Finally, under the assumption that the means of the test statistics are known, we incorporate the optimal Šidák weights into the GS Šidák procedure and the optimal Bonferroni weights into the GS Bonferroni procedure. Theoretical proofs and/or simulation studies show that the GS Šidák procedure can have higher power than the GS Bonferroni procedure when each uses its corresponding optimal weights, and that both GS procedures can have much higher power than the weighted Šidák and weighted Bonferroni procedures. All proposed procedures control the FWER well and are useful when prior information is available to estimate the weights.
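As an illustration of how optimal Bonferroni weights can be computed when the means are known, consider one-sided z-tests with known positive means ξ_i. Each test's power Φ̄(z_{a_i} − ξ_i) is concave in its level a_i = α w_i / m, so equating marginal power across tests (a Lagrange-multiplier argument) gives a_i = Φ̄(ξ_i/2 + c/ξ_i), where Φ̄ is the standard normal upper tail and c is fixed by Σ a_i = α. This is our reading of the optimal Bonferroni weights of Wasserman and Roeder (2006), stated under those assumptions; the sketch does not reproduce the paper's optimal Šidák weights, and the function names are hypothetical. It solves for c by bisection.

```python
import math
from statistics import NormalDist
from typing import List, Sequence

_N = NormalDist()


def upper_tail(x: float) -> float:
    """Standard normal upper tail, computed with erfc to stay accurate far out."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def optimal_bonferroni_levels(xi: Sequence[float], alpha: float) -> List[float]:
    """Levels a_i = upper_tail(xi_i/2 + c/xi_i), with c chosen so sum(a_i) = alpha.

    Assumes one-sided z-tests with known, strictly positive means xi_i.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    if any(x <= 0 for x in xi):
        raise ValueError("this sketch assumes positive means xi_i")

    def total(c: float) -> float:
        return sum(upper_tail(x / 2 + c / x) for x in xi)

    # total(c) decreases in c; expand a bracket, then bisect.
    lo, hi = -1.0, 1.0
    while total(lo) < alpha:
        lo *= 2
    while total(hi) > alpha:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if total(mid) > alpha:
            lo = mid
        else:
            hi = mid
    c = (lo + hi) / 2
    return [upper_tail(x / 2 + c / x) for x in xi]


def average_power(levels: Sequence[float], xi: Sequence[float]) -> float:
    """Mean power of one-sided z-tests run at the given per-test levels."""
    powers = []
    for a, x in zip(levels, xi):
        if a <= 0:  # level underflowed to zero: the test can never reject
            powers.append(0.0)
            continue
        z = -_N.inv_cdf(a)  # upper-a critical value
        powers.append(upper_tail(z - x))
    return sum(powers) / len(powers)


xi = [0.5, 1.0, 3.0, 4.0]
alpha = 0.05
levels = optimal_bonferroni_levels(xi, alpha)
weights = [len(xi) * a / alpha for a in levels]  # w_i = m * a_i / alpha
print("weights:", [round(w, 3) for w in weights])
print("optimal:", average_power(levels, xi))
print("equal:  ", average_power([alpha / len(xi)] * len(xi), xi))
```

When c > 0, the quantity ξ/2 + c/ξ is smallest at ξ = √(2c), so the largest levels go to tests with moderate means: tests with very small means have little power to gain, and tests with very large means are already nearly certain to reject.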



Available from: Guolian Kang
    • "Another class of approaches focuses not on reducing the family-wise error rate but instead on controlling the expected proportion of false positives, the 'false discovery rate' or FDR (Benjamini and Hochberg, 1995; Benjamini and Yekutieli, 2001). The above general approaches have been applied or modified to genetic association studies (Sabatti et al., 2003; Benjamini and Yekutieli, 2005; Roeder et al., 2007; Rice et al., 2008; Kang et al., 2009). Recently, geneticists have proposed special methods that take advantage of the relationship between markers (e.g., linkage disequilibrium) to define the effective number of independent tests and then adjust the original p-values using the Bonferroni correction (Gao et al., 2008; Galwey, 2009; Gao et al., 2010)."
    ABSTRACT: Multiple comparisons or multiple testing has been viewed as a thorny issue in genetic association studies aiming to detect disease-associated genetic variants from a large number of genotyped variants. We alleviate the problem of multiple comparisons by proposing a hierarchical modeling approach that is fundamentally different from the existing methods. The proposed hierarchical models simultaneously fit as many variables as possible and shrink unimportant effects towards zero. Thus, the hierarchical models yield more efficient estimates of parameters than the traditional methods that analyze genetic variants separately, and also coherently address the multiple comparisons problem by greatly reducing the effective number of genetic effects and the number of statistically "significant" effects. We develop a method for computing the effective number of genetic effects in hierarchical generalized linear models, and propose a new adjustment for multiple comparisons, the hierarchical Bonferroni correction, based on the effective number of genetic effects. Our approach not only increases the power to detect disease-associated variants but also controls the Type I error. We illustrate and evaluate our method with real and simulated data sets from genetic association studies. The method has been implemented in our freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/).
    Full-text · Article · Feb 2014 · Statistical Applications in Genetics and Molecular Biology
    ABSTRACT: Improved procedures, in terms of smaller missed discovery rates (MDR), for performing multiple hypotheses testing with weak and strong control of the family-wise error rate (FWER) or the false discovery rate (FDR) are developed and studied. The improvement over existing procedures such as the Šidák procedure for FWER control and the Benjamini-Hochberg (BH) procedure for FDR control is achieved by exploiting possible differences in the powers of the individual tests. Results signal the need to take into account the powers of the individual tests and to have multiple hypotheses decision functions which are not limited to simply using the individual p-values, as is the case, for example, with the Šidák, Bonferroni, or BH procedures. They also enhance understanding of the role of the powers of individual tests, or more precisely the receiver operating characteristic (ROC) functions of decision processes, in the search for better multiple hypotheses testing procedures. A decision-theoretic framework is utilized, and through auxiliary randomizers the procedures could be used with discrete or mixed-type data or with rank-based nonparametric tests. This is in contrast to existing p-value based procedures whose theoretical validity is contingent on each of these p-value statistics being stochastically equal to or greater than a standard uniform variable under the null hypothesis. Proposed procedures are relevant in the analysis of high-dimensional "large M, small n" data sets arising in the natural, physical, medical, economic and social sciences, whose generation and creation is accelerated by advances in high-throughput technology, notably, but not limited to, microarray technology.
    Full-text · Article · Aug 2009 · The Annals of Statistics
    ABSTRACT: Human beings are an incredibly social species and along with eusocial insects engage in the largest cooperative living groups in the planet's history. Twin and family studies suggest that uniquely human characteristics such as empathy, altruism, sense of equity, love, trust, music, economic behavior, and even politics are partially hardwired. The leap from twin studies to identifying specific genes engaging the social brain has occurred in the past decade, aided by deep insights accumulated about social behavior in lower mammals. Remarkably, genes such as the arginine vasopressin receptor and the oxytocin receptor contribute to social behavior in a broad range of species from voles to man. Other polymorphic genes constituting the "usual suspects"--i.e., those encoding for dopamine reward pathways, serotonergic emotional regulation, or sex hormones--further enable elaborate social behaviors.
    Full-text · Article · Mar 2010 · Neuron