Chapter

On New Multiple Tests Based on Independent p-Values and the Assessment of Their Power


Abstract

A new class of stagewise rejective test procedures is proposed for the multiple test problem consisting of n ≥ 2 pairs of null and alternative hypotheses with mutually independent test statistics. The members of this class, called stagewise rejective linear minmax tests, are generated by the closing principle applied to global combination tests whose test statistics are linear combinations of the minimum, P(1), and the maximum, P(n), of the p-values associated with the single tests. The respective weights are determined by a single parameter k ∈ [0,1] and the level α. The well-known test based exclusively on P(1) proposed by Tippett (1931) is a special case (k = 0); its extension to a multiple test is due to Holm (1979). On the other hand, the test for k = 1 rejects the global null hypothesis if (1 − α)P(1) + αP(n) ≤ α. It is shown that all tests of the class exhaust the multiple level α and therefore cannot be improved uniformly. Their relative merits have to be judged by means of power functions for multiple test procedures. Such functions are presented and discussed in a more general context. The expected number of correctly rejected null hypotheses is recommended as a relatively simple and comprehensive way to summarize the performance of multiple tests. The various power functions are illustrated by their application to three members of the class (k = 0, 0.9, 1) and to the Simes-Hommel test by means of simulations. For the simultaneous test with k = 1, numerical derivations of the power functions are presented. On the basis of these results, it is argued that the stagewise rejective linear minmax test with k = 0.9 has a performance that is always close to that of the best performing competitor and is therefore to be recommended when little a priori information on the number and type of possible alternatives is available.
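As a hedged illustration of the two boundary cases named in the abstract, the following Python sketch evaluates the global rejection rules for k = 0 and k = 1. The function names are hypothetical. The Tippett critical value 1 − (1 − α)^(1/n) is the standard exact form for n independent uniform p-values and is an assumption about how the k = 0 case is calibrated; the weights for intermediate k are not stated in the abstract and are not attempted here.

```python
def tippett_rejects(pvalues, alpha=0.05):
    """Global test based only on the smallest p-value P(1) (case k = 0)."""
    n = len(pvalues)
    # Exact level-alpha critical value for n independent uniform p-values.
    critical = 1.0 - (1.0 - alpha) ** (1.0 / n)
    return min(pvalues) <= critical


def minmax_k1_rejects(pvalues, alpha=0.05):
    """Global test for k = 1: reject if (1 - alpha) P(1) + alpha P(n) <= alpha."""
    p_min, p_max = min(pvalues), max(pvalues)
    return (1.0 - alpha) * p_min + alpha * p_max <= alpha


# Several moderately small p-values: only the k = 1 rule rejects.
print(tippett_rejects([0.03, 0.04, 0.045]), minmax_k1_rejects([0.03, 0.04, 0.045]))
# One very small p-value among large ones: only the Tippett rule rejects.
print(tippett_rejects([0.01, 0.9, 0.95]), minmax_k1_rejects([0.01, 0.9, 0.95]))
```

The two calls show the trade-off the abstract alludes to: the k = 0 rule is sensitive to a single strong effect, while the k = 1 rule also rewards a small maximum p-value.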


... This package is currently under development on GitHub; see https://github.com/lmiratrix/blkvar. [11] Note that others refer to 1−minimal power simply as "minimal power" (e.g., Maurer and Mellein (1988); Chen et al. (2011); Westfall, Tobias, and Wolfinger (2011)), "disjunctive power" (e.g., Bretz, Hothorn, and Westfall (2010)), or "any pair" power (Ramsey (1978)). Chen et al. (2011) use the terminology of "r-power" for what is referred to here as d−minimal power for d > 1. ...
Preprint
For randomized controlled trials (RCTs) with a single intervention being measured on multiple outcomes, researchers often apply a multiple testing procedure (such as Bonferroni or Benjamini-Hochberg) to adjust p-values. Such an adjustment reduces the likelihood of spurious findings, but also changes the statistical power, sometimes substantially, which reduces the probability of detecting effects when they do exist. However, this consideration is frequently ignored in typical power analyses, as existing tools do not easily accommodate the use of multiple testing procedures. We introduce the PUMP R package as a tool for analysts to estimate statistical power, minimum detectable effect size, and sample size requirements for multi-level RCTs with multiple outcomes. Multiple outcomes are accounted for in two ways. First, power estimates from PUMP properly account for the adjustment in p-values from applying a multiple testing procedure. Second, as researchers change their focus from one outcome to multiple outcomes, different definitions of statistical power emerge. PUMP allows researchers to consider a variety of definitions of power, as some may be more appropriate for the goals of their study. The package estimates power for frequentist multi-level mixed effects models, and supports a variety of commonly-used RCT designs and models and multiple testing procedures. In addition to the main functionality of estimating power, minimum detectable effect size, and sample size requirements, the package allows the user to easily explore sensitivity of these quantities to changes in underlying assumptions.
... $= I \setminus I_0$, $m_1 = |I_1|$, $S(\varphi) = \sum_{i \in I_1} \varphi_i$ and refer to the expected proportion of correctly detected alternatives, i.e., $\mathrm{power}_\vartheta(\varphi) = E_\vartheta[S(\varphi) / \max(m_1, 1)]$, as the multiple power of $\varphi$ under $\vartheta$, see also [34]. If the structure of $\varphi$ is such that $\varphi_i = \mathbf{1}\{p_i \le t\}$ for a common, possibly data-dependent threshold $t$, then the multiple power of $\varphi$ is increasing in $t$. ...
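The multiple power defined in this excerpt can be approximated by averaging the detected proportion over simulated p-value vectors. The sketch below is only an illustration for a fixed threshold t; the function names and the input layout (one list of p-values per replication, plus a list of flags marking the true alternatives) are assumptions, not taken from the cited work.

```python
def detected_proportion(pvalues, is_alternative, t):
    """S(phi) / max(m1, 1) for the threshold test phi_i = 1{p_i <= t}."""
    m1 = sum(is_alternative)  # number of true alternatives
    s = sum(1 for p, alt in zip(pvalues, is_alternative) if alt and p <= t)
    return s / max(m1, 1)


def multiple_power_estimate(replications, is_alternative, t):
    """Average the detected proportion over simulated p-value vectors."""
    values = [detected_proportion(p, is_alternative, t) for p in replications]
    return sum(values) / len(values)


alternatives = [True, True, False, False]
pvals = [0.01, 0.2, 0.03, 0.5]
print(detected_proportion(pvals, alternatives, 0.05))
print(detected_proportion(pvals, alternatives, 0.25))
```

Raising t from 0.05 to 0.25 in the usage lines can only add detections, which mirrors the monotonicity statement in the excerpt.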
Chapter
Based on the theory of multiple statistical hypotheses testing, we elaborate likelihood-based simultaneous statistical inference methods in dynamic factor models (DFMs). To this end, we work up and extend the methodology of Geweke and Singleton (Int Econ Rev 22:37–54, 1981) by proving a multivariate central limit theorem for empirical Fourier transforms of the observable time series. In an asymptotic regime with observation horizon tending to infinity, we employ structural properties of multivariate chi-square distributions in order to construct asymptotic critical regions for a vector of Wald statistics in DFMs, assuming that the model is identified and model restrictions are testable. A model-based bootstrap procedure is proposed for approximating the joint distribution of such a vector for finite sample sizes. Examples of important multiple test problems in DFMs demonstrate the relevance of the proposed methods for practical applications.
... It is additionally investigated if the proposed test procedures keep the multiple level α, where it can be shown that two of our proposals fulfill this property whereas the third modification does not. The procedures are then compared with respect to their power by means of Monte-Carlo experiments based on the simultaneous power (Maurer & Mellein, 1988) and the relative frequency of correctly rejected false hypotheses. ...
Article
We present step-wise test procedures based on the Bonferroni-Holm [see S. Holm, Scand. J. Stat., Theory Appl. 6, 65–70 (1979; Zbl 0402.62058)] principle for multi-way ANOVA-type models. It is shown for two plausible modifications that the multiple level α is preserved. These theoretical results are supplemented by a simulation study, in a two-way ANOVA setting, to compare the multiple procedures with respect to their simultaneous power and the relative frequency of correctly rejected false hypotheses.
Article
We consider one of the most basic multiple testing problems that compares expectations of multivariate data among several groups. As a test statistic, a conventional (approximate) t-statistic is considered, and we determine its rejection region using a common rejection limit. When there are unknown correlations among test statistics, the multiplicity adjusted p-values are dependent on the unknown correlations. They are usually replaced with their estimates that are always consistent under any hypothesis. In this paper, we propose the use of estimates, which are not necessarily consistent and are referred to as spurious correlations, in order to improve statistical power. Through simulation studies, we verify that the proposed method asymptotically controls the family-wise error rate and clearly provides higher statistical power than existing methods. In addition, the proposed and existing methods are applied to a real multiple testing problem that compares quantitative traits among groups of mice and the results are compared.
Article
Researchers are often interested in testing the effectiveness of an intervention on multiple outcomes, for multiple subgroups, at multiple points in time, or across multiple treatment groups. The resulting multiplicity of statistical hypothesis tests can lead to spurious findings of effects. Multiple testing procedures (MTPs) are statistical procedures that counteract this problem by adjusting p-values for effect estimates upward. While MTPs are increasingly used in impact evaluations in education and other areas, an important consequence of their use is a change in statistical power that can be substantial. Unfortunately, researchers frequently ignore the power implications of MTPs when designing studies. Consequently, in some cases, sample sizes may be too small, and studies may be underpowered to detect effects as small as a desired size. In other cases, sample sizes may be larger than needed, or studies may be powered to detect smaller effects than anticipated. This paper presents methods for estimating statistical power, for multiple definitions of statistical power and presents empirical findings on how power is affected by the use of MTPs.
Chapter
The paper is mainly concerned with multiple testing procedures which control a given multiple level α. General concepts for this purpose are the closure test and a modification which is independent of the special structure of hypotheses and tests. We consider improvements of this modification using information about the logical dependences (redundancies) within the system of hypotheses and present an efficient algorithm. Finally, we discuss some problems which are specific for hierarchical systems of hypotheses, e.g. in model search.
Chapter
We introduce the problem of simultaneous statistical inference, with particular emphasis on testing multiple hypotheses. After a historic overview, general notation for the whole work is set up and different sources of multiplicity are distinguished. We define a variety of classical and modern type I and type II error rates in multiple hypotheses testing, analyze some relationships between them, and consider different ways to cope with structured systems of hypotheses. Relationships between multiple testing and other simultaneous statistical inference problems, in particular the construction of confidence regions for multi-dimensional parameters, as well as selection, ranking and partitioning problems, are elucidated. Finally, a general outline of the remainder of the work is given.
Article
As deciding on more than one null hypothesis based upon the same data set can provoke an inflation of the type I error rate, special methods for these multiple testing problems have been developed since Fisher (1935). Although highly relevant for different areas of research, especially for economics and the social sciences, multiple tests are relatively rarely applied. This paper, therefore, reviews and systematizes the evolution of theory and methods concerning multiple comparisons. We particularly pursue the objective of sensitizing users of statistics to the issues related to multiple testing.
Article
In the comparison of various dose levels it can often be assumed that the parameters to be tested follow an order restriction. Two closed multiple test procedures are proposed for detecting the highest dose level that still provides a shift in the response distribution as compared to the adjacent lower dose level. One is based on one-sided comparisons between neighbouring doses, the other uses Helmert-type contrast statistics. If a sequence of testing is fixed in advance, the multiple test can be suitably modified. The power of the procedures is simulated under the assumption of normally distributed responses for various constellations of the dose means. It is compared with the power of a general Holm-type procedure discussed in BUDDE & BAUER (1989).
Article
This paper proposes a multiple testing procedure that allows one to reject each individual hypothesis at a prespecified level α, while still controlling the familywise error rate at α in the strong sense. Typically, rejecting a hypothesis when its marginal p-value is ⩽α in a multiple hypothesis testing setting will lead to an inflation of familywise error rate. However, this inflation can be avoided if a particular consistency criterion is prespecified and incorporated in the testing algorithm. The criterion is equivalent to requiring that all p-values be smaller than or equal to a particular threshold in the one-sided hypothesis testing setting. Extensions to the two-sided hypothesis testing setting and extensions to situations where the criterion can be chosen per user's preference are also presented. Copyright © 2012 John Wiley & Sons, Ltd.
Article
This paper reviews global and multiple tests for the combination of n hypotheses using the ordered p-values of the n individual tests. In 1987, Röhmel and Streitberg presented a general method to construct global level α tests based on ordered p-values when there exists no prior knowledge regarding the joint distribution of the corresponding test statistics. In the case of independent test statistics, construction of global tests is available by means of recursive formulae presented by Bicher (1989), Kornatz (1994) and Finner and Roters (1994). Multiple test procedures can be developed by applying the closed test principle using these global tests as building blocks. Liu (1996) proposed representing closed tests by means of “critical matrices” which contain the critical values of the global tests. Within the framework of these theoretical concepts, well-known global tests and multiple test procedures are classified and the relationships between the different tests are characterised.
Article
It is the purpose of this paper to review the main aspects related to multiple test problems. This concerns among others the particularities of multiple tests as for instance the formulation of restrictions to avoid inconsistent decisions and of criteria to control for a multiple type I error rate. In addition, the basic principles for constructing multiple tests are introduced and their properties are summarized. The paper closes with giving a rough idea of further special multiple test problems and their corresponding test procedures.
Article
Global tests and multiple test procedures are often based on ordered p values. Such procedures are available for arbitrary dependence structures as well as for specific dependence assumptions of the test statistics. Most of these procedures have been considered as global tests. Multiple test procedures can be obtained by applying the closure principle in order to control the familywise error rate, or by using the false discovery rate as a criterion for type I error rate control. We provide an overview and present examples showing the importance of these procedures in medical research. Finally, we discuss modifications when different weights for the hypotheses of interest are chosen.
Article
A variety of powerful test procedures are available for the analysis of clinical trials addressing multiple objectives, such as comparing several treatments with a control, assessing the benefit of a new drug for more than one endpoint, etc. However, some of these procedures have reached a level of complexity that makes it difficult to communicate the underlying test strategies to clinical teams. Graphical approaches have been proposed instead that facilitate the derivation and communication of Bonferroni-based closed test procedures. In this paper we give a coherent description of the methodology and illustrate it with a real clinical trial example. We further discuss suitable power measures for clinical trials with multiple primary and/or secondary objectives and use a generic example to illustrate our considerations.
Article
This paper aims at generalizing the concept of unbiasedness of a single statistical test to multiple test procedures. In addition, it is investigated whether a necessary and sufficient condition for a multiple test to be unbiased is given by the unbiasedness of its components. Different examples are presented to illustrate the new approaches.
Article
It is demonstrated how improvements of general multiple test procedures can be obtained using information about the logical structures among the hypotheses. Based on a procedure of Bergmann and Hommel (B. Bergmann and G. Hommel, Improvements of general multiple test procedures for redundant systems of hypotheses, in Multiple Hypothesenprüfung--Multiple Hypotheses Testing, Eds. P. Bauer, G. Hommel and E. Sonnemann, pp. 100-115 (Springer-Verlag, Berlin, 1988)), a computer program was written by Bernhard (G. Bernhard, Computerunterstützte Durchführung von multiplen Testprozeduren--Algorithmen und Powervergleich, Doctoral thesis (Mainz, 1992)) using this information. It is applicable for a general class of systems of hypotheses which can be expressed in a linear way. By means of a simulation study it is shown that the proposed procedure is often substantially more powerful than other usual multiple test procedures.
Article
Suppose that n hypotheses H1, H2,..., Hn with associated test statistics T1, T2 ..., Tn are to be tested by a procedure with experimentwise significance level (the probability of rejecting one or more true hypotheses) smaller than or equal to some specified value α. A commonly used procedure satisfying this condition is the Bonferroni (B) procedure, which consists of rejecting Hi, for any i, iff the associated test statistic Ti is significant at the level α' = α/n. Holm (1979) introduced a modified Bonferroni procedure with greater power than the B procedure. Under Holm's sequentially rejective Bonferroni (SRB) procedure, if any hypothesis is rejected at the level α' = α/n, the denominator of α' for the next test is n - 1, and the criterion continues to be modified in a stagewise manner, with the denominator of α' reduced by 1 each time a hypothesis is rejected, so that tests can be conducted at successively higher significance levels. Holm proved that the experimentwise significance level of the SRB procedure is ≤α, as is that of the original B procedure. Often, the hypotheses being tested are logically interrelated so that not all combinations of true and false hypotheses are possible. As a simple example of such a situation suppose, given samples from three distributions, we want to test the three hypotheses of pairwise equality: $\mu_i = \mu'_i (i
Article
A modification of the Bonferroni procedure for testing multiple hypotheses is presented. The method, based on the ordered p-values of the individual tests, is less conservative than the classical Bonferroni procedure but is still simple to apply. A simulation study shows that the probability of a type I error of the procedure does not exceed the nominal significance level, α, for a variety of multivariate normal and multivariate gamma test statistics. For independent tests the procedure has type I error probability equal to α. The method appears particularly advantageous over the classical Bonferroni procedure when several highly-correlated test statistics are involved.
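The modified Bonferroni rule summarized above is usually stated as: reject the global hypothesis if the i-th smallest p-value is at most iα/n for some i. The sketch below implements that standard form next to the classical Bonferroni global test for comparison; the function names are illustrative only.

```python
def bonferroni_global_rejects(pvalues, alpha=0.05):
    """Classical Bonferroni global test: reject if the smallest p-value <= alpha / n."""
    return min(pvalues) <= alpha / len(pvalues)


def simes_rejects(pvalues, alpha=0.05):
    """Simes-type global test: reject if p_(i) <= i * alpha / n for some i."""
    n = len(pvalues)
    ordered = sorted(pvalues)
    return any(p <= i * alpha / n for i, p in enumerate(ordered, start=1))


# Three moderately small p-values: Bonferroni does not reject, Simes does.
print(bonferroni_global_rejects([0.03, 0.04, 0.045]), simes_rejects([0.03, 0.04, 0.045]))
```

Because the i = 1 condition coincides with the Bonferroni condition, the Simes-type rule rejects whenever Bonferroni does, which is why it is described as less conservative.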
Chapter
Let n statistical test problems with the (elementary) null hypotheses H1,…,Hn be given, and let H0 = ⋂{Hi : i = 1,…,n} be the corresponding global hypothesis. Occasionally one is interested only in a statement about H0 (e.g., when combining studies); often, however, once H0 has been rejected, one wants to make more precise statements about which of the Hi, i = 1,…,n, are false, in general while controlling the multiple level.
Article
In the present paper it is investigated what kinds of improvements of general multiple test procedures are possible when information about “redundancies” (logical dependencies) in the system of hypotheses is used completely or partially. For the case of equal weights for the elementary tests several improved “static” procedures are given, including Shaffer’s (1986) “MSRB procedure”. If unequal weights are allowed, one obtains four different “dynamic” improvements of Holm’s (1979) procedure which become, in ascending sequence, more and more powerful, but also more computationally demanding. Some practicable algorithms are proposed which can be used in a corresponding computer program performing multiple tests “automatically”. Finally, the application in model search is discussed. The proofs not given in this paper as well as numerous additional examples can be found in Bergmann (1987).
Article
Based on the concept of the multiple level of significance, two criteria for assessing the performance of multiple tests are proposed: The simultaneous power is defined as the probability of rejecting all false null hypotheses. The probability of a correct decision is defined as the probability of correctly rejecting all false null hypotheses and accepting all true ones. Both criteria are discussed for nonstagewise and stagewise procedures in the case of independent test statistics. For the example of 5 independently and normally distributed test statistics, the values of the two criteria are calculated under reasonably simple alternatives.
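Both criteria can be estimated from simulation output as simple relative frequencies. The following sketch assumes a hypothetical input layout: `rejections` holds one list of booleans per replication (True where a hypothesis was rejected), and `is_false_null` marks which null hypotheses are actually false. The third function computes the average number of correctly rejected null hypotheses, the summary recommended in the chapter abstract at the top of this page.

```python
def simultaneous_power(rejections, is_false_null):
    """Fraction of replications in which every false null hypothesis is rejected."""
    # With no false nulls, all() over an empty selection is True by convention.
    hits = sum(
        all(r for r, f in zip(rej, is_false_null) if f) for rej in rejections
    )
    return hits / len(rejections)


def prob_correct_decision(rejections, is_false_null):
    """Fraction of replications rejecting exactly the false null hypotheses."""
    hits = sum(list(rej) == list(is_false_null) for rej in rejections)
    return hits / len(rejections)


def expected_correct_rejections(rejections, is_false_null):
    """Average number of false null hypotheses rejected per replication."""
    total = sum(
        sum(1 for r, f in zip(rej, is_false_null) if r and f) for rej in rejections
    )
    return total / len(rejections)


truth = [True, True, False]
sims = [
    [True, True, False],    # correct decision
    [True, False, False],   # one false null missed
    [True, True, True],     # all false nulls found, one false rejection
    [False, False, False],  # nothing rejected
]
print(simultaneous_power(sims, truth))
print(prob_correct_decision(sims, truth))
print(expected_correct_rejections(sims, truth))
```

The four hand-written replications are chosen so that the three criteria differ, showing that a correct decision is a stricter event than simultaneous rejection of all false nulls.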
Article
Simes (1986) has proposed a modified Bonferroni procedure for the test of an overall hypothesis which is the combination of n individual hypotheses. In contrast to the classical Bonferroni procedure, it is not obvious how statements about individual hypotheses are to be made for this procedure. In the present paper a multiple test procedure allowing statements on individual hypotheses is proposed. It is based on the principle of closed test procedures (Marcus, Peritz & Gabriel, 1976) and controls the multiple level α.
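The closed test principle mentioned here can be sketched by brute force: an individual hypothesis is rejected only if every intersection hypothesis containing it is rejected by the global test. The code below is an illustration under that definition, using a Simes-type rule as the local test. It enumerates all 2^n − 1 subsets and is therefore only practical for small n; it is not the efficient algorithm of the paper, and the function names are assumptions.

```python
from itertools import combinations


def simes_rejects(pvalues, alpha=0.05):
    """Simes-type global test: reject if p_(i) <= i * alpha / n for some i."""
    n = len(pvalues)
    return any(p <= i * alpha / n for i, p in enumerate(sorted(pvalues), start=1))


def closed_test(pvalues, alpha=0.05, local_test=simes_rejects):
    """Reject H_i iff every intersection hypothesis containing i is rejected."""
    n = len(pvalues)
    rejected = [True] * n
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            # A retained intersection hypothesis protects all of its members.
            if not local_test([pvalues[j] for j in subset], alpha):
                for j in subset:
                    rejected[j] = False
    return rejected


print(closed_test([0.01, 0.04, 0.03]))
print(closed_test([0.01, 0.04, 0.9]))
```

Passing a different `local_test` (any function of a p-value list and a level) yields the closed procedure generated by that global test.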
Article
This paper presents a simple and widely applicable multiple test procedure of the sequentially rejective type, i.e. hypotheses are rejected one at a time until no further rejections can be made. It is shown that the test has a prescribed level of significance protection against error of the first kind for any combination of true hypotheses. The power properties of the test and a number of possible applications are also discussed.
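A sequentially rejective procedure of this kind is commonly implemented in its Bonferroni form: the p-values are examined in increasing order, the j-th smallest is compared with α/(n − j + 1), and testing stops at the first failure. The sketch below follows that common formulation; the function name is illustrative.

```python
def holm_rejections(pvalues, alpha=0.05):
    """Sequentially rejective Bonferroni procedure; returns one flag per hypothesis."""
    n = len(pvalues)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(n), key=lambda i: pvalues[i])
    rejected = [False] * n
    for step, idx in enumerate(order):
        # The divisor shrinks by one after each rejection: n, n - 1, ..., 1.
        if pvalues[idx] <= alpha / (n - step):
            rejected[idx] = True
        else:
            break  # stop at the first hypothesis that cannot be rejected
    return rejected


print(holm_rejections([0.01, 0.04, 0.03]))
print(holm_rejections([0.01, 0.02, 0.03]))
```

The early `break` is what makes the procedure stagewise: once one hypothesis is retained, all hypotheses with larger p-values are retained as well.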
Article
Optimality criteria formulated in terms of the power functions of the individual tests are given for problems where several hypotheses are tested simultaneously. Subject to the constraint that the expected number of false rejections is less than a given constant γ when all null hypotheses are true, tests are found which maximize the minimum average power and the minimum power of the individual tests over certain alternatives. In the common situations in the analysis of variance this leads to application of multiple t-tests. In that case the resulting procedure is to use Fisher's "least significant difference," but without a preliminary F-test and with a smaller level of significance. Recommendations for choosing the value of γ are given by relating γ to the probability of no false rejections if all hypotheses are true. Based upon the optimality of the tests, a similar optimality property of joint confidence sets is also derived.
Article
SUMMARY A method of devising stepwise multiple testing procedures with fixed experimentwise error is presented. The method requires the set of hypotheses tested to be closed under intersection. The method is applied to the problem of comparing many treatments to one control and to ordered analysis of variance.
Statistical methods for research workers. Oliver and Boyd, London
  • R A Fisher
Multiple comparison procedures
  • Y Hochberg
  • A J Tamhane
Erweiterung klassischer Kombinationstests zur Identifikation von Alternativhypothesen
  • W Maurer
  • G Hommel
Zusammenfassen unabhängiger Experimente
  • E Sonnemann
The methods of statistics
  • L H G Tippett