Clinical Trials: Discerning Hype From Substance

Article in Annals of Internal Medicine 153(6):400-6 · September 2010
DOI: 10.1059/0003-4819-153-6-201009210-00008 · Source: PubMed
The interest in being able to interpret and report results in clinical trials as being favorable is pervasive throughout health care research. This important source of bias needs to be recognized, and approaches need to be implemented to effectively address it. The prespecified primary analyses of the primary and secondary end points of a clinical trial should be clearly specified when disseminating results in press releases and journal publications. There should be a focus on these analyses when interpreting the results. A substantial risk for biased conclusions is produced by conducting exploratory analyses with an intention to establish that the benefit-to-risk profile of the experimental intervention is favorable, rather than to determine whether it is. In exploratory analyses, P values will be misleading when the actual sampling context is not presented to allow for proper interpretation, and the effect sizes of outcomes having particularly favorable estimates are probably overestimated because of "random high" bias. Performing exploratory analyses should be viewed as generating hypotheses that usually require reassessment in prospectively conducted confirmatory trials. Awareness of these issues will meaningfully improve our ability to be guided by substance, not hype, in making evidence-based decisions about medical care.
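The two statistical points in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not an analysis from the article: the function names, the choice of 20 exploratory outcomes, the unit standard error, and a true effect of zero are all hypothetical values picked for illustration. It shows (a) how often at least one of several independent null comparisons reaches P < 0.05, and (b) how the most favorable of many noisy estimates overstates the true effect ("random high" bias).

```python
import random
import statistics


def familywise_false_positive_rate(n_comparisons, alpha=0.05):
    """Chance that at least one of n independent null comparisons has P < alpha."""
    return 1.0 - (1.0 - alpha) ** n_comparisons


def simulate_random_high(n_outcomes=20, true_effect=0.0, se=1.0,
                         n_trials=5000, seed=0):
    """Simulate trials that each estimate n_outcomes effects with the same true value.

    Returns (mean of all estimates, mean of the single most favorable
    estimate per trial). All parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    all_estimates = []
    selected = []
    for _ in range(n_trials):
        estimates = [rng.gauss(true_effect, se) for _ in range(n_outcomes)]
        all_estimates.extend(estimates)
        # Reporting only the best-looking outcome is the selection step
        # that produces "random high" bias.
        selected.append(max(estimates))
    return statistics.fmean(all_estimates), statistics.fmean(selected)


fw_rate = familywise_false_positive_rate(20)
mean_all, mean_selected = simulate_random_high()
print(f"P(at least one P < 0.05 among 20 null comparisons): {fw_rate:.2f}")
print(f"average estimate over all outcomes: {mean_all:.2f}")
print(f"average of the most favorable outcome per trial: {mean_selected:.2f}")
```

Under these assumptions the unselected estimates average close to the true effect of zero, while the selected "best" estimate is well above it, even though no outcome has any real effect. This is why a nominal P value from an exploratory analysis is hard to interpret unless the number of comparisons examined is also reported.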
    • "Response adaptive trial design could decrease the risk of increased mortality only after a large sample size has been evaluated [27] by adjusting randomization to the intervention or control as the RCT progresses based on ongoing interim analyses. Some believe that Phase III mortality RCTs that stop early for efficacy overestimate treatment efficacy [27]. Most RCTs had secondary results that might explain higher mortality rates of intervention compared to control groups such as increased cardiovascular complications (tachyarrhythmias possibly caused by the interventions in two RCTs (dobutamine [10]; intravenous salbutamol infusion [9]), decreased cardiac output by excessive vasoconstriction due to NOS inhibition [11] or by decreased venous return secondary to high-frequency oscillation ventilation [16]), renal toxicity (hetastarch increased risk of acute kidney injury [12, 18]) and hypoglycemia (intensive insulin had significantly increased severe hypoglycemia [8])."
    ABSTRACT: Several promising therapies assessed in the adult critically ill in large, multicenter randomized controlled trials (RCTs) were associated with significantly increased mortality in the intervention arms. Our hypothesis was that, among RCTs that found increased mortality rates in the intervention compared to control groups, there would be wide ranges in sponsorship (industry or not), type(s) of intervention(s), use of DSMBs, presence of interim analyses and early stopping rules, absolute risk increase (ARI), and whether or not adequate prior proof-of-principle Phase II studies were done. We reviewed RCTs that showed a statistically significant increased mortality rate in the intervention compared to control group(s). We recorded source of sponsorship, sample sizes, types of interventions, mortality rates, ARI (as well as odds ratios, relative risks and number needed to harm), whether there were pre-specified interim analyses and early stopping rules, and whether or not there were prior proof-of-principle (also known as Phase II) RCTs. Ten RCTs (four industry sponsored) of many interventions (high oxygen delivery, diaspirin cross-linked hemoglobin, growth hormone, methylprednisolone, hetastarch, high-frequency oscillation ventilation, intensive insulin, NOS inhibition, beta-2 adrenergic agonist, and TNF-α receptor) included 19,126 patients and were associated with wide ranges of intervention versus control group mortality rates (25.7-59%, mean 29.9%, vs 17-49%, mean 25%, respectively) yielding ARIs of 2.6-29% (mean 5%). All but two RCTs had pre-specified interim analyses, and seven RCTs were stopped early. All RCTs were preceded by published proof-of-principle RCT(s), two by the same group. Seven interventions (except diaspirin cross-linked hemoglobin and the NOS inhibitor) were available for use clinically at the time of the pivotal RCT. Common, clinically available interventions used in the critically ill were associated with increased mortality in large, pivotal RCTs even though safety was often addressed by interim analyses and early stopping rules.
    Full-text · Article · Dec 2016
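The harm measures named in the abstract above (ARI, relative risk, odds ratio, number needed to harm) follow from two mortality proportions. The sketch below is illustrative only: `harm_measures` is a hypothetical helper, and the inputs are the mean mortality rates quoted in the abstract (29.9% vs 25%), used as if they came from a single trial. The mean of per-trial ARIs need not equal the difference of the mean rates, so the result is not a reanalysis of the review.

```python
def harm_measures(risk_intervention, risk_control):
    """Compute standard harm measures from two event proportions (0 < p < 1)."""
    ari = risk_intervention - risk_control            # absolute risk increase
    relative_risk = risk_intervention / risk_control
    odds_intervention = risk_intervention / (1.0 - risk_intervention)
    odds_control = risk_control / (1.0 - risk_control)
    odds_ratio = odds_intervention / odds_control
    # Number needed to harm is only defined when the intervention raises risk.
    nnh = 1.0 / ari if ari > 0 else float("inf")
    return {
        "ari": ari,
        "relative_risk": relative_risk,
        "odds_ratio": odds_ratio,
        "nnh": nnh,
    }


# Illustrative inputs: the mean mortality rates quoted in the abstract.
measures = harm_measures(0.299, 0.25)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With these inputs the ARI is about 4.9 percentage points, which corresponds to roughly one additional death for every 20 patients treated.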
    • "Such selectively reported treatment effect estimates will be biased because one may select the extreme estimates. This is an instance of "random high" bias, a phenomenon closely related to regression to the mean (Fleming, 2010). It is obvious that this practice can lead to overestimation of the difference between treatments within the subgroups reported."
    ABSTRACT: Treatment effect heterogeneity is a well-recognized phenomenon in randomized controlled clinical trials. In this paper, we discuss subgroup analyses with prespecified subgroups of clinical or biological importance. We explore various alternatives to the naïve (the traditional univariate) subgroup analyses to address the issues of multiplicity and confounding. Specifically, we consider a model-based Bayesian shrinkage (Bayes-DS) and a nonparametric, empirical Bayes shrinkage approach (Emp-Bayes) to temper the optimism of traditional univariate subgroup analyses; a standardization approach (standardization) that accounts for correlation between baseline covariates; and a model-based maximum likelihood estimation (MLE) approach. The Bayes-DS and Emp-Bayes methods model the variation in subgroup-specific treatment effect rather than testing the null hypothesis of no difference between subgroups. The standardization approach addresses the issue of confounding in subgroup analyses. The MLE approach is considered only for comparison in simulation studies as the "truth" since the data were generated from the same model. Using the characteristics of a hypothetical large outcome trial, we perform simulation studies and articulate the utilities and potential limitations of these estimators. Simulation results indicate that Bayes-DS and Emp-Bayes can protect against optimism present in the naïve approach. Due to its simplicity, the naïve approach should be the reference for reporting univariate subgroup-specific treatment effect estimates from exploratory subgroup analyses. Standardization, although it tends to have a larger variance, is suggested when it is important to address the confounding of univariate subgroup effects due to correlation between baseline covariates. The Bayes-DS approach is available as an R package (DSBayes).
    Full-text · Article · Oct 2015
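The general idea of shrinking subgroup estimates, mentioned in the abstract above, can be sketched with a generic normal-normal empirical Bayes estimator fitted by the method of moments. This is not the Bayes-DS or Emp-Bayes method of the cited paper and does not use the DSBayes package; the function name, the subgroup estimates, and the standard errors are all hypothetical.

```python
import statistics


def shrink_subgroup_estimates(estimates, std_errors):
    """Pull each subgroup estimate toward the mean of all subgroup estimates.

    Generic method-of-moments empirical Bayes sketch; assumes every
    standard error is positive and at least two subgroups are supplied.
    """
    grand_mean = statistics.fmean(estimates)
    observed_var = statistics.variance(estimates)          # spread across subgroups
    sampling_var = statistics.fmean([se ** 2 for se in std_errors])
    # Estimated between-subgroup variance: spread beyond what noise explains.
    tau2 = max(0.0, observed_var - sampling_var)
    shrunk = []
    for est, se in zip(estimates, std_errors):
        weight = tau2 / (tau2 + se ** 2)   # 0 = shrink fully, 1 = no shrinkage
        shrunk.append(grand_mean + weight * (est - grand_mean))
    return shrunk


# Hypothetical subgroup treatment-effect estimates and standard errors.
raw = [0.10, 0.45, -0.05, 0.20]
ses = [0.15, 0.15, 0.15, 0.15]
shrunk = shrink_subgroup_estimates(raw, ses)
for r, s in zip(raw, shrunk):
    print(f"raw {r:+.2f} -> shrunk {s:+.2f}")
```

The most extreme raw estimates move furthest toward the overall mean, which is the sense in which shrinkage tempers the optimism of reporting the best-looking subgroup.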
    • "whether reported as grade or as pass/fail. With the usual call for caution that conclusions based on subset analyses should be made tenuously (Fleming 2010), we did find consistent results (for both grade and pass/fail) that males benefited from DE more than females. We also found evidence that White students were more likely to benefit from DE than multiracial students (the estimate of benefit, .25, "
    ABSTRACT: Annually, American colleges and universities provide developmental education (DE) to millions of underprepared students; however, evaluation estimates of DE benefits have been mixed. Using a prototypic exemplar of DE, our primary objective was to investigate the utility of a replicative evaluative framework for assessing program effectiveness. Within the context of the regression discontinuity (RD) design, this research examined the effectiveness of a DE program for five sequential cohorts of first-time college students. Discontinuity estimates were generated for individual terms and cumulatively, across terms. Participants were 3,589 first-time community college students. DE program effects were measured by contrasting both college-level English grades and a dichotomous measure of pass/fail, for DE and non-DE students. Parametric and nonparametric estimates of overall effect were positive for continuous and dichotomous measures of achievement (grade and pass/fail). The variability of program effects over time was determined by tracking results within individual terms and cumulatively, across terms. Applying this replication strategy, DE's overall impact was modest (an effect size of approximately .20) but quite consistent, based on parametric and nonparametric estimation approaches. A meta-analysis of five RD results yielded virtually the same estimate as the overall, parametric findings. Subset analysis, though tentative, suggested that males benefited more than females, while academic gains were comparable for different ethnicities. The cumulative, within-study comparison, replication approach offers considerable potential for the evaluation of new and existing policies, particularly when effects are relatively small, as is often the case in applied settings.
    Full-text · Article · Mar 2014