Hierarchical testing of multiple endpoints in group sequential trials
We consider hierarchically testing a (key) secondary endpoint in a group-sequential clinical trial that is mainly driven by a primary endpoint. By 'mainly driven', we mean that the interim analyses are planned at time points at which a certain number of patients or events have accrued on the primary endpoint, and that the trial runs either until the primary endpoint reaches statistical significance at one of the interim analyses or until the final analysis. We consider both the situation where the trial is stopped as soon as the primary endpoint is significant and the situation where it is continued after primary endpoint significance to further investigate the secondary endpoint. In addition, we investigate how to achieve strong control of the familywise error rate (FWER) at a pre-specified significance level alpha for both the primary and the secondary hypotheses. We systematically explore various multiplicity adjustment methods. The starting point is a naive strategy of testing the secondary endpoint at level alpha whenever the primary endpoint is significant. Hung et al. (J. Biopharm. Stat. 2007; 17:1201-1210) have already shown that this naive strategy does not maintain the FWER at level alpha. We derive a sharp upper bound for the rejection probability of the secondary endpoint under the naive strategy. This bound suggests a number of multiple testing strategies and also provides a benchmark for deciding whether a method is conservative or might be improved while still maintaining the FWER at alpha. We use a numerical example based on a real case study to illustrate the results of the different hierarchical test strategies.
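The inflation under the naive strategy can be illustrated with a small Monte Carlo sketch. It is not the derivation or the case study from this paper, and every design choice in it is an assumption made for illustration: two equally spaced looks, one-sided alpha = 0.025, O'Brien-Fleming-type boundaries (2.797, 1.977) for the primary endpoint, a trial that stops as soon as the primary endpoint is significant, a true secondary null hypothesis, and test statistics for the two endpoints that are perfectly correlated. The function name `naive_secondary_error` and the parameter `theta` (the drift of the primary statistic at the final analysis) are hypothetical.

```python
import math
import random

Z_ALPHA = 1.959964       # one-sided 0.025 critical value used naively for the secondary endpoint
C1, C2 = 2.797, 1.977    # assumed O'Brien-Fleming-type boundaries for the primary endpoint


def naive_secondary_error(theta, n_sim=200_000, seed=1):
    """Estimate P(reject a true secondary null) under the naive strategy.

    Assumptions (illustrative only): two looks at information fractions
    0.5 and 1, primary drift `theta` at the final look, no secondary
    effect, and perfectly correlated primary and secondary statistics.
    """
    rng = random.Random(seed)
    root_half = math.sqrt(0.5)
    rejections = 0
    for _ in range(n_sim):
        # Shared noise at look 1 and look 2; correlation sqrt(0.5) between looks.
        x1 = rng.gauss(0.0, 1.0)
        x2 = root_half * x1 + root_half * rng.gauss(0.0, 1.0)
        # Primary statistics carry the drift; secondary statistics are pure noise.
        z1 = x1 + theta * root_half
        z2 = x2 + theta
        if z1 > C1:
            # Primary significant at the interim: test secondary there at level alpha.
            if x1 > Z_ALPHA:
                rejections += 1
        elif z2 > C2:
            # Primary significant only at the final look: test secondary there.
            if x2 > Z_ALPHA:
                rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    for theta in (0.0, 1.184, 4.0):
        print(theta, naive_secondary_error(theta))
```

Under these assumptions, an intermediate drift near 1.184 (chosen so that the interim boundary minus the interim drift equals 1.96) should give an estimate of roughly 0.04, clearly above 0.025, while a drift of 0 or a very large drift should give estimates close to 0.025. The sketch only illustrates why the worst case over the unknown primary effect matters; it does not reproduce the sharp upper bound derived in the paper.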