Hierarchical testing of multiple endpoints in group-sequential trials
ABSTRACT We consider the situation of hierarchically testing a (key) secondary endpoint in a group-sequential clinical trial that is mainly driven by a primary endpoint. By 'mainly driven', we mean that the interim analyses are planned at points in time where a certain number of patients or events have accrued on the primary endpoint, and the trial runs either until statistical significance of the primary endpoint is achieved at one of the interim analyses or to the final analysis. We consider both the situation where the trial is stopped as soon as the primary endpoint is significant and the situation where it is continued after primary-endpoint significance to further investigate the secondary endpoint. In addition, we investigate how to achieve strong control of the familywise error rate (FWER) at a pre-specified significance level α for both the primary and the secondary hypotheses. We systematically explore various multiplicity adjustment methods. The starting point is the naive strategy of testing the secondary endpoint at level α whenever the primary endpoint is significant. Hung et al. (J. Biopharm. Stat. 2007; 17:1201-1210) have already shown that this naive strategy does not maintain the FWER at level α. We derive a sharp upper bound for the rejection probability of the secondary endpoint under the naive strategy. This suggests a number of multiple test strategies and also provides a benchmark for deciding whether a method is conservative or might be improved while maintaining the FWER at α. We use a numerical example based on a real case study to illustrate the results of different hierarchical test strategies.
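The FWER inflation of the naive strategy can be reproduced with a short Monte Carlo sketch. Everything below is illustrative and not taken from the paper: a hypothetical two-stage design with O'Brien-Fleming-type primary boundaries, a per-stage drift `theta` on the primary endpoint (so only the secondary null hypothesis is true), and within-stage correlation `rho` between the two endpoints.

```python
import numpy as np

def naive_secondary_error(rho, theta=1.5, n_sim=200_000, seed=1):
    """Monte Carlo estimate of the probability of falsely rejecting the
    secondary null hypothesis under the naive strategy: the secondary
    endpoint is tested at the full one-sided level alpha = 0.025
    whenever the primary endpoint is significant, and the trial stops
    at the first primary significance.

    Hypothetical two-stage design with equal information increments:
      - c1, c2 are one-sided O'Brien-Fleming-type primary boundaries,
      - theta is the per-stage drift on the primary endpoint
        (the secondary null is true, so the secondary has no drift),
      - rho is the within-stage correlation between the endpoints.
    """
    c1, c2 = 2.797, 1.977      # one-sided O'Brien-Fleming, 2 looks, alpha=0.025
    z_alpha = 1.95996          # naive full-level critical value for the secondary
    rng = np.random.default_rng(seed)

    # independent stage-wise noise increments for the primary endpoint
    a1, a2 = rng.standard_normal((2, n_sim))
    # secondary-endpoint increments, correlated rho with the primary ones
    b1 = rho * a1 + np.sqrt(1 - rho**2) * rng.standard_normal(n_sim)
    b2 = rho * a2 + np.sqrt(1 - rho**2) * rng.standard_normal(n_sim)

    # cumulative z-statistics at information fractions 1/2 and 1
    z1p = theta + a1
    z2p = theta * np.sqrt(2) + (a1 + a2) / np.sqrt(2)
    z1s = b1
    z2s = (b1 + b2) / np.sqrt(2)

    stop1 = z1p > c1              # primary significant at the interim look
    stop2 = ~stop1 & (z2p > c2)   # primary significant at the final look
    reject_secondary = (stop1 & (z1s > z_alpha)) | (stop2 & (z2s > z_alpha))
    return reject_secondary.mean()
```

With `rho = 1` the estimated secondary type I error rate clearly exceeds the nominal 0.025, matching the qualitative finding of Hung et al.; with `rho = 0` it stays below the nominal level, since the secondary is then tested at level 0.025 only on the event that the primary stops the trial.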
Available from: Cyrus R. Mehta
ABSTRACT: We consider a clinical trial with a primary and a secondary endpoint where the secondary endpoint is tested only if the primary endpoint is significant. The trial uses a group sequential procedure with two stages. The familywise error rate (FWER) of falsely concluding significance on either endpoint is to be controlled at a nominal level α. The type I error rate for the primary endpoint is controlled by choosing any α-level stopping boundary, e.g., the standard O'Brien-Fleming or the Pocock boundary. Given any particular α-level boundary for the primary endpoint, we study the problem of determining the boundary for the secondary endpoint to control the FWER. We study this FWER analytically and numerically and find that it is maximized when the correlation coefficient ρ between the two endpoints equals 1. For the four combinations consisting of O'Brien-Fleming and Pocock boundaries for the primary and secondary endpoints, the critical constants required to control the FWER are computed for different values of ρ. An ad hoc boundary is proposed for the secondary endpoint to address a practical concern that may be at issue in some applications. Numerical studies indicate that the O'Brien-Fleming boundary for the primary endpoint combined with the Pocock boundary for the secondary endpoint generally gives the best primary as well as secondary power performance. The Pocock boundary may be replaced by the ad hoc boundary for the secondary endpoint with very little loss of secondary power if the practical concern is at issue. A clinical trial example is given to illustrate the methods. Biometrics 03/2010; 66(4):1174-84. DOI:10.1111/j.1541-0420.2010.01402.x
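The α-level boundaries mentioned above can be computed numerically from the joint distribution of the stage-wise z-statistics. Below is a minimal sketch for two equally spaced looks at one-sided α = 0.025, assuming SciPy's bivariate normal CDF; the resulting constants (≈ 2.178 for Pocock, ≈ (2.797, 1.977) for O'Brien-Fleming) agree with the standard tables.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import multivariate_normal

ALPHA = 0.025  # one-sided overall significance level

# Correlation of the cumulative z-statistics (Z1, Z2) at
# information fractions 1/2 and 1 is sqrt(1/2).
CORR = np.array([[1.0, np.sqrt(0.5)],
                 [np.sqrt(0.5), 1.0]])

def crossing_prob(c1, c2):
    """P(Z1 > c1 or Z2 > c2) under the global null hypothesis."""
    return 1.0 - multivariate_normal.cdf([c1, c2], mean=[0.0, 0.0], cov=CORR)

# Pocock: the same constant c is used at both looks.
pocock_c = brentq(lambda c: crossing_prob(c, c) - ALPHA, 1.5, 3.5)

# O'Brien-Fleming: boundaries proportional to 1/sqrt(t_k),
# i.e. (c * sqrt(2), c) for the two looks.
obf_c = brentq(lambda c: crossing_prob(c * np.sqrt(2), c) - ALPHA, 1.5, 3.5)
```

The same `crossing_prob` machinery extends to unequally spaced looks by replacing the correlation with `sqrt(t1 / t2)` for information fractions `t1 < t2`.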
ABSTRACT: In the midst of gaining more experience with scientifically sound approaches to adaptive designs in clinical trials, a panel discussion with international representatives from industry, academia, and regulatory agencies was held at the Basel Biometric Society Spring Conference on March 12, 2010. The goal was to develop some consensus among industry, government, and academic statisticians concerning requirements and methods for adaptive designs in clinical trials. In this paper, we summarize the panelists' perspectives given at that session. Journal of Biopharmaceutical Statistics 11/2010; 20(6):1098-112. DOI:10.1080/10543406.2010.514447
ABSTRACT: Multiple testing problems are complex when evaluating statistical evidence in pivotal clinical trials for regulatory applications. A common practice, however, is to employ a general and rather simple multiple comparison procedure to handle them. Multiple comparison adjustments are applied to ensure proper control of type I error rates. In many applications, however, the emphasis on type I error rate control leads to the choice of a statistically valid multiple test procedure while common sense is overlooked. The challenges begin with confusion in defining the relevant family of hypotheses for which the type I error rates need to be properly controlled. Multiple testing problems come in a wide variety, ranging from jointly testing multiple doses and endpoints, composite endpoints, and non-inferiority and superiority, to studying the time of onset of a treatment effect and searching for a minimum effective dose or a patient subgroup in which the treatment effect lies. To select a valid and sensible multiple test procedure, the first step should be to tailor the selection to the study questions and to the ultimate clinical decision tree. Evaluation of statistical power performance should then come into play to fine-tune the selected procedure. Biometrical Journal 12/2010; 52(6):747-56. DOI:10.1002/bimj.200900206
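One of the simplest procedures tailored to a clinical decision tree, and the one underlying the hierarchical strategies discussed throughout this collection, is the fixed-sequence test. The sketch below is a generic illustration, not a procedure from any of the papers above: hypotheses are ordered by clinical importance, each is tested at the full level α, and testing stops at the first non-rejection, which gives strong FWER control without any α adjustment.

```python
def fixed_sequence_test(p_values, alpha=0.025):
    """Hierarchical (fixed-sequence) multiple test.

    p_values are ordered by pre-specified clinical importance.
    Each hypothesis is tested at the full level alpha, but testing
    stops at the first non-rejection; all later hypotheses are
    retained. Returns a list of rejection decisions (True/False).
    """
    decisions = []
    for p in p_values:
        if p <= alpha:
            decisions.append(True)
        else:
            break  # first failure: no further hypotheses may be tested
    # hypotheses after the first non-rejection are automatically retained
    decisions += [False] * (len(p_values) - len(decisions))
    return decisions
```

Note how the fourth p-value below is retained despite being nominally significant, because an earlier hypothesis in the sequence failed; this is the price paid for testing every hypothesis at the full level α.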