Hierarchical testing of multiple endpoints in group sequential trials

Article in Statistics in Medicine 29(2):219-28 · January 2009
DOI: 10.1002/sim.3748 · Source: PubMed
We consider the situation of hierarchically testing a (key) secondary endpoint in a group-sequential clinical trial that is mainly driven by a primary endpoint. By 'mainly driven', we mean that the interim analyses are planned at points in time where a certain number of patients or events have accrued on the primary endpoint, and the trial will run either until statistical significance of the primary endpoint is achieved at one of the interim analyses or to the final analysis. We consider both the situation where the trial is stopped as soon as the primary endpoint is significant and the situation where it is continued after primary endpoint significance to further investigate the secondary endpoint. In addition, we investigate how to achieve strong control of the familywise error rate (FWER) at a pre-specified significance level alpha for both the primary and the secondary hypotheses. We systematically explore various multiplicity adjustment methods. The starting point is a naive strategy of testing the secondary endpoint at level alpha whenever the primary endpoint is significant. Hung et al. (J. Biopharm. Stat. 2007; 17:1201-1210) have already shown that this naive strategy does not maintain the FWER at level alpha. We derive a sharp upper bound for the rejection probability of the secondary endpoint in the naive strategy. This bound suggests a number of multiple test strategies and also provides a benchmark for deciding whether a method is conservative or might be improved while maintaining the FWER at alpha. We use a numerical example based on a real case study to illustrate the results of different hierarchical test strategies.
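The failure of the naive strategy described in the abstract can be illustrated with a small Monte Carlo sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: a two-look design with equal stage sizes, O'Brien-Fleming-type bounds of 2.797 and 1.977 for the primary endpoint at one-sided alpha = 0.025, a trial that stops as soon as the primary endpoint is significant, and the hypothetical parameters `drift1` (mean of the primary z-statistic at the interim) and `rho` (correlation between the endpoints). The secondary endpoint has no effect, so every rejection of it is a Type I error.

```python
import numpy as np

def naive_secondary_error(drift1, rho, n_sim=200_000, seed=0):
    """Monte Carlo estimate of P(reject H2 | H2 true) under the naive strategy.

    drift1: mean of the primary z-statistic at the interim look (illustrative).
    rho:    correlation between primary and secondary statistics (illustrative).
    """
    rng = np.random.default_rng(seed)
    c1, c2 = 2.797, 1.977  # assumed O'Brien-Fleming-type bounds, looks at 50% and 100%
    z_alpha = 1.960        # unadjusted one-sided 0.025 critical value

    # Stage-wise standard normal noise for the primary endpoint.
    p1 = rng.standard_normal(n_sim)
    p2 = rng.standard_normal(n_sim)
    # Secondary noise, correlated with the primary noise at rho within each stage.
    s1 = rho * p1 + np.sqrt(1 - rho**2) * rng.standard_normal(n_sim)
    s2 = rho * p2 + np.sqrt(1 - rho**2) * rng.standard_normal(n_sim)

    # Cumulative z-statistics; the final look pools two equally sized stages.
    z_int = p1 + drift1
    z_fin = (p1 + p2) / np.sqrt(2) + drift1 * np.sqrt(2)
    y_int = s1                      # secondary endpoint has zero effect
    y_fin = (s1 + s2) / np.sqrt(2)

    stop_int = z_int > c1                 # primary significant at the interim
    stop_fin = ~stop_int & (z_fin > c2)   # primary significant only at the final look
    # Naive rule: test the secondary at full alpha at the look where the primary wins.
    reject_h2 = (stop_int & (y_int > z_alpha)) | (stop_fin & (y_fin > z_alpha))
    return reject_h2.mean()

if __name__ == "__main__":
    print("no primary effect:      ", naive_secondary_error(drift1=0.0, rho=1.0))
    print("moderate primary effect:", naive_secondary_error(drift1=0.837, rho=1.0))
```

In this sketch the estimate stays near 0.025 when the primary endpoint has no effect, but rises clearly above 0.025 for a moderate primary effect with highly correlated endpoints, which is the kind of inflation the abstract attributes to the naive strategy. The chosen drift and correlation are only one convenient configuration; they are not claimed to attain the sharp upper bound derived in the paper.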
    • "We assume that each sampled unit i contributes to the total cost of an experiment regardless of how many components X_ij (such as vital signs of patients or electronic measurements of manufactured parts) are recorded on unit i. This is quite common in many experiments (e.g., [4] [11] [15] [26]). For example, in clinical trials, certain amount is budgeted for each participating patient, covering the cost of a treatment, service, insurance, incentive, and possibly, accommodation and transportation. "
    ABSTRACT: Sequential methods are developed for conducting a large number of simultaneous tests while controlling the Type I and Type II generalized familywise error rates. Namely, for the chosen values of , , , and , we derive simultaneous tests of individual hypotheses, based on sequentially collected data, that keep the probability of at least Type I errors not exceeding level and the probability of at least Type II errors not greater than . This generalization of the classical notions of familywise error rates allows substantial reduction of the expected sample size of the multiple testing procedure.
    Article · Mar 2015
    • "We studied various combinations of θ₁ and θ₂ for different choices of the correlation ρ between the two endpoints, but here we report the results only for ρ = 0.5 since the best choice of r was found to be relatively insensitive to ρ although E(N) and M both depend on ρ. Tamhane et al. (2010) and Glimm et al. (2010) have shown that different combinations of boundaries also affect the performance of a GSP. We considered two combinations of boundaries for H₁ and H₂: OBF-OBF and OBF-POC. "
    ABSTRACT: Graphical approaches have been proposed in the literature for testing hypotheses on multiple endpoints by recycling significance levels from rejected hypotheses to unrejected ones. Recently, they have been extended to group sequential procedures (GSPs). Our focus in this paper is on the allocation of recycled significance levels from rejected hypotheses to the stages of the GSPs for unrejected hypotheses. We propose a delayed recycling method that allocates the recycled significance level from Stage r onward, where r is prespecified. We show that r cannot be chosen adaptively to coincide with the random stage at which the hypothesis from which the significance level is recycled is rejected. Such an adaptive GSP does not always control the FWER. One can choose r to minimize the expected sample size for a given power requirement. We illustrate how a simulation approach can be used for this purpose. Several examples, including a clinical trial example, are given to illustrate the proposed procedure.
    Article · Jan 2015
    • "No consensus could be reached on how best to address the multiplicity issues associated with multiple secondary endpoints that may be used to make formal statements or claims in the label when a study stops at an interim analysis due to overwhelming efficacy based on the primary endpoint. Should the p-values be adjusted as per the primary endpoint and only those that are significant be used to make claims in the label, for example, by utilising the principles outlined in the recent research [11, 12]? The methodology would need to be extended to address all key endpoints needed for the label. "
    ABSTRACT: In May 2012, the Committee of Health and Medicinal Products issued a concept paper on the need to review the points to consider document on multiplicity issues in clinical trials. In preparation for the release of the updated guidance document, Statisticians in the Pharmaceutical Industry held a one-day expert group meeting in January 2013. Topics debated included multiplicity and the drug development process, the usefulness and limitations of newly developed strategies to deal with multiplicity, multiplicity issues arising from interim decisions and multiregional development, and the need for simultaneous confidence intervals (CIs) corresponding to multiple test procedures. A clear message from the meeting was that multiplicity adjustments need to be considered when the intention is to make a formal statement about efficacy or safety based on hypothesis tests. Statisticians have a key role when designing studies to assess what adjustment really means in the context of the research being conducted. More thought during the planning phase needs to be given to multiplicity adjustments for secondary endpoints given these are increasing in importance in differentiating products in the market place. No consensus was reached on the role of simultaneous CIs in the context of superiority trials. It was argued that unadjusted intervals should be employed as the primary purpose of the intervals is estimation, while the purpose of hypothesis testing is to formally establish an effect. The opposing view was that CIs should correspond to the test decision whenever possible. Copyright © 2013 John Wiley & Sons, Ltd.
    Article · Sep 2013