Hierarchical testing of multiple endpoints in group sequential trials
Novartis Pharma AG, Statistical Methodology, Novartis Campus, CH-4056 Basel, Switzerland. Statistics in Medicine (Impact Factor: 1.83). 01/2009; 29(2):219-28. DOI: 10.1002/sim.3748
We consider the situation of hierarchically testing a (key) secondary endpoint in a group-sequential clinical trial that is mainly driven by a primary endpoint. By 'mainly driven', we mean that the interim analyses are planned at points in time where a certain number of patients or events have accrued on the primary endpoint, and the trial will run either until statistical significance of the primary endpoint is achieved at one of the interim analyses or to the final analysis. We consider both the situation where the trial is stopped as soon as the primary endpoint is significant and the situation where it is continued after primary endpoint significance to further investigate the secondary endpoint. In addition, we investigate how to achieve strong control of the familywise error rate (FWER) at a pre-specified significance level alpha for both the primary and the secondary hypotheses. We systematically explore various multiplicity adjustment methods. The starting point is a naive strategy of testing the secondary endpoint at level alpha whenever the primary endpoint is significant. Hung et al. (J. Biopharm. Stat. 2007; 17:1201-1210) have already shown that this naive strategy does not maintain the FWER at level alpha. We derive a sharp upper bound for the rejection probability of the secondary endpoint in the naive strategy. This suggests a number of multiple test strategies and also provides a benchmark for deciding whether a method is conservative or might be improved while maintaining the FWER at alpha. We use a numerical example based on a real case study to illustrate the results of different hierarchical test strategies.
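The inflation caused by the naive strategy can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a two-stage design with equally spaced looks, one-sided alpha of 0.025, an O'Brien-Fleming boundary for the primary endpoint (constant 1.977 from standard tables), normally distributed stage-wise test statistics with correlation `rho` between endpoints, a true secondary null hypothesis, and the hypothetical function name `naive_secondary_error`. The trial stops at the first look where the primary endpoint is significant, and the secondary endpoint is then tested once at the unadjusted critical value 1.960.

```python
import math
import random


def naive_secondary_error(delta, rho, n_sim=100_000, seed=1):
    """Estimate the probability of falsely rejecting the secondary null
    under the naive strategy (illustrative assumptions, see lead-in).

    delta: mean of the primary z-statistic at the final analysis
    rho:   correlation between primary and secondary stage-wise statistics
    """
    rng = random.Random(seed)
    c_final = 1.977                       # assumed O'Brien-Fleming constant, 2 looks
    c_interim = c_final * math.sqrt(2.0)  # O'Brien-Fleming interim boundary
    z_alpha = 1.960                       # unadjusted one-sided 0.025 critical value
    stage_drift = delta / math.sqrt(2.0)  # mean of each stage-wise primary statistic
    w = math.sqrt(1.0 - rho ** 2)
    root2 = math.sqrt(2.0)

    rejections = 0
    for _ in range(n_sim):
        # Stage-wise noise for the primary (e) and independent part of secondary (f).
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        f1, f2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)

        # Cumulative primary z-statistics (drift present).
        zp_interim = e1 + stage_drift
        zp_final = (e1 + e2 + 2.0 * stage_drift) / root2

        # Cumulative secondary z-statistics (no drift: the null is true).
        s1 = rho * e1 + w * f1
        s2 = rho * e2 + w * f2
        zs_interim = s1
        zs_final = (s1 + s2) / root2

        if zp_interim >= c_interim:
            # Primary significant at the interim: stop and test secondary there.
            rejections += zs_interim >= z_alpha
        elif zp_final >= c_final:
            # Primary significant only at the final analysis.
            rejections += zs_final >= z_alpha
    return rejections / n_sim


if __name__ == "__main__":
    # A moderate primary effect chosen so that both looks contribute.
    moderate_delta = (1.977 * math.sqrt(2.0) - 1.960) * math.sqrt(2.0)
    print("no primary effect:      ", naive_secondary_error(0.0, 1.0))
    print("moderate primary effect:", naive_secondary_error(moderate_delta, 1.0))
```

Under these assumptions, the estimate with no primary effect stays near 0.025, while the estimate for the moderate primary effect and `rho = 1` comes out clearly above 0.025. This is consistent with the statement that the naive strategy does not maintain the FWER at level alpha.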
- "We assume that each sampled unit i contributes to the total cost of an experiment regardless of how many components X_ij (such as vital signs of patients or electronic measurements of manufactured parts) are recorded on unit i. This is quite common in many experiments (e.g.,    ). For example, in clinical trials, certain amount is budgeted for each participating patient, covering the cost of a treatment, service, insurance, incentive, and possibly, accommodation and transportation."
ABSTRACT: Sequential methods are developed for conducting a large number of simultaneous tests while controlling the Type I and Type II generalized familywise error rates. Namely, for chosen error counts and error levels, we derive simultaneous tests of individual hypotheses, based on sequentially collected data, that keep the probability of at least a chosen number of Type I errors from exceeding one chosen level and the probability of at least a chosen number of Type II errors from exceeding another. This generalization of the classical notions of familywise error rates allows substantial reduction of the expected sample size of the multiple testing procedure.
- ABSTRACT: We consider a clinical trial with a primary and a secondary endpoint where the secondary endpoint is tested only if the primary endpoint is significant. The trial uses a group sequential procedure with two stages. The familywise error rate (FWER) of falsely concluding significance on either endpoint is to be controlled at a nominal level α. The type I error rate for the primary endpoint is controlled by choosing any α-level stopping boundary, e.g., the standard O'Brien-Fleming or the Pocock boundary. Given any particular α-level boundary for the primary endpoint, we study the problem of determining the boundary for the secondary endpoint to control the FWER. We study this FWER analytically and numerically and find that it is maximized when the correlation coefficient ρ between the two endpoints equals 1. For the four combinations consisting of O'Brien-Fleming and Pocock boundaries for the primary and secondary endpoints, the critical constants required to control the FWER are computed for different values of ρ. An ad hoc boundary is proposed for the secondary endpoint to address a practical concern that may be at issue in some applications. Numerical studies indicate that the O'Brien-Fleming boundary for the primary endpoint and the Pocock boundary for the secondary endpoint generally give the best primary as well as secondary power performance. The Pocock boundary may be replaced by the ad hoc boundary for the secondary endpoint with very little loss of secondary power if the practical concern is at issue. A clinical trial example is given to illustrate the methods.
- ABSTRACT: In the midst of gaining more experience in pursuing scientifically sound approaches to adaptive designs in clinical trials, a panel discussion with international representatives from industry, academia, and regulatory agencies was held at the Basel Biometric Society Spring Conference, March 12, 2010. The goal was to develop some consensus among industry, government, and academic statisticians concerning requirements and methods for adaptive designs in clinical trials. In this paper, we summarize the panelists' perspectives given at that session.