Article

Design Archetypes for Phase 2 Clinical Trials in Central Nervous System Disorders

Abstract

An overarching framework is proposed to guide the design of phase 2 studies in central nervous system disorders. Archetypes are considered for scenarios where dose response is highly relevant in clinical practice, as in the symptomatic treatment of acute disorders. Archetypes for scenarios where dose response is less relevant, as in disease modification for neurodegenerative disorders, are beyond the scope of this article. Primary design archetypes are determined by axes of development that are defined by optimism for success (probability of efficacy) and signal detection (magnitude of the anticipated effect size). The fast-to-registration primary archetype uses a dose-response study as the first efficacy, that is, proof of concept (PoC), study and is appropriate when the prospects for signal detection and the optimism for efficacy are higher. These conditions may exist when the anticipated effect size is large and either when testing a drug with a proven mechanism of action or when a favorable biomarker result was obtained in phase 1. The fast-to-PoC primary archetype tests one dose arm to establish PoC before assessing dose response and is appropriate when the optimism for efficacy and the prospects for signal detection are lower. These conditions may exist when testing a drug with a novel mechanism and/or when the anticipated effect size is smaller. Secondary archetypes are used to mitigate the trade-offs between the quick-kill fast-to-PoC approach and the quick-win fast-to-registration approach, and are key areas where adaptive designs can be beneficial.
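The two axes in the abstract can be read as a simple decision rule. The sketch below is illustrative only: the article gives no numeric cut-offs, so `HIGH_P_EFFICACY`, `LARGE_EFFECT_SIZE`, and the function name `primary_archetype` are hypothetical, and routing mixed profiles toward a secondary archetype is an assumption based on the statement that secondary archetypes mitigate the trade-offs between the two primary approaches.

```python
# Hypothetical cut-offs: the article does not specify numeric thresholds.
HIGH_P_EFFICACY = 0.5    # optimism for success (probability of efficacy)
LARGE_EFFECT_SIZE = 0.5  # signal detection (anticipated standardized effect size)


def primary_archetype(p_efficacy: float, effect_size: float) -> str:
    """Map the two axes of development to a design archetype (illustrative only)."""
    optimistic = p_efficacy >= HIGH_P_EFFICACY
    detectable = effect_size >= LARGE_EFFECT_SIZE
    if optimistic and detectable:
        # e.g. proven mechanism or favorable phase 1 biomarker, large effect expected:
        # a dose-response study serves as the first efficacy (PoC) study.
        return "fast-to-registration"
    if not optimistic and not detectable:
        # e.g. novel mechanism and/or small expected effect:
        # one dose arm establishes PoC before dose response is studied.
        return "fast-to-PoC"
    # Mixed profile (assumption): weigh the trade-off, possibly with an adaptive design.
    return "secondary archetype"


print(primary_archetype(0.7, 0.8))  # fast-to-registration
print(primary_archetype(0.2, 0.3))  # fast-to-PoC
```

In practice both inputs are uncertain judgments rather than known quantities, so a rule like this is best treated as a way to make the team's assumptions explicit, not as an automated choice.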

... Therefore, it is not surprising that many articles and textbooks have been devoted to these topics; see, for example, Piantadosi (2005). However, developing a drug involves a series of studies, and optimizing each individual trial in that series does not necessarily optimize the series (Mallinckrodt et al., 2010). Moreover, most drug approvals come from large companies that have many compounds in development (Munos, 2009). ...
... For this exercise, industry average costs of $40 million for Phase II and $150 million for Phase III were taken from Paul et al. (2010). Two development paradigms were considered: the so-called fast to PoC and the fast to registration paradigms discussed by Mallinckrodt et al. (2010). In the fast to PoC approach, investments are minimized prior to establishing proof of concept. ...
... Nevertheless, these examples illustrated that the optimum choice for α and β can vary depending on compound attributes, such as the probability the drug is effective, and on the development approach (fast to PoC vs. fast to registration). The results also illustrate how each of the two development paradigms, fast to PoC and fast to registration, may be optimal given differing compound attributes, as pointed out by Mallinckrodt et al. (2010). ...
Article
Improving proof-of-concept (PoC) studies is a primary lever for improving drug development. Since drug development is often done by institutions that work on multiple drugs simultaneously, the present work focused on optimum choices for rates of false positive (α) and false negative (β) results across a portfolio of PoC studies. Simple examples and a newly derived equation provided conceptual understanding of basic principles regarding optimum choices of α and β in PoC trials. In examples that incorporated realistic development costs and constraints, the levels of α and β that maximized the number of approved drugs and portfolio value varied by scenario. Optimum choices were sensitive to the probability the drug was effective and to the proportion of total investment cost prior to establishing PoC. Results of the present investigation agree with previous research in that it is important to assess optimum levels of α and β. However, the present work also highlighted the need to consider cost structure using realistic input parameters relevant to the question of interest.
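To make the portfolio trade-off concrete, the sketch below computes per-compound expected cost and probability of approval for a single PoC study followed by Phase III. It is a simplified model, not the equation derived in the cited work: the default costs reuse the $40 million (Phase II, treated here as the PoC study) and $150 million (Phase III) industry averages quoted earlier from Paul et al. (2010), while `power_ph3`, `alpha_ph3`, and the function name are hypothetical.

```python
def poc_portfolio_metrics(p_effective, alpha, beta,
                          cost_poc=40.0, cost_ph3=150.0,
                          power_ph3=0.9, alpha_ph3=0.025):
    """Expected cost ($M), P(approval), and expected cost per approval for one compound.

    Assumed path: every compound runs one PoC study; only PoC "positives" run Phase III.
    """
    # P(PoC positive) = true positives + false positives
    p_advance = p_effective * (1 - beta) + (1 - p_effective) * alpha
    # Every compound pays for PoC; only those that advance pay for Phase III
    expected_cost = cost_poc + p_advance * cost_ph3
    # Effective drugs succeed at Phase III power; ineffective ones only by chance
    p_approve = (p_effective * (1 - beta) * power_ph3
                 + (1 - p_effective) * alpha * alpha_ph3)
    return expected_cost, p_approve, expected_cost / p_approve


for a in (0.05, 0.10, 0.20):
    cost, p_ok, per_approval = poc_portfolio_metrics(p_effective=0.3, alpha=a, beta=0.2)
    print(f"alpha={a:.2f}: cost={cost:.1f}M, P(approval)={p_ok:.3f}, "
          f"cost/approval={per_approval:.0f}M")
```

Under a fixed budget, the number of compounds that can be tested is roughly the budget divided by the expected cost per compound, so looser α (or β) buys cheaper, noisier PoC decisions at the price of more Phase III spending on false positives (or more missed effective drugs). Where the balance falls depends on the probability the drug is effective and on how much of the cost is incurred before PoC.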
Article
The construction of a depression rating scale designed to be particularly sensitive to treatment effects is described. Ratings of 54 English and 52 Swedish patients on a 65 item comprehensive psychopathology scale were used to identify the 17 most commonly occurring symptoms in primary depressive illness in the combined sample. Ratings on these 17 items for 64 patients participating in studies of four different antidepressant drugs were used to create a depression scale consisting of the 10 items which showed the largest changes with treatment and the highest correlation to overall change. The inter-rater reliability of the new depression scale was high. Scores on the scale correlated significantly with scores on a standard rating scale for depression, the Hamilton Rating Scale (HRS), indicating its validity as a general severity estimate. Its capacity to differentiate between responders and non-responders to antidepressant treatment was better than the HRS, indicating greater sensitivity to change. The practical and ethical implications in terms of smaller sample sizes in clinical trials are discussed.
Article
The authors examined which, if any, research design features and patient characteristics would significantly differ between successful and unsuccessful antidepressant trials. Clinical trial data were reviewed for nine antidepressants approved by the Food and Drug Administration between 1985 and 2000. From the antidepressant research programs on these medications, 52 clinical trials were included in the study. The authors evaluated trial design features, patient characteristics, and difference in response between placebo and antidepressant. Nine trial design features and patient characteristics were present in the research programs for all nine of the antidepressants. The severity of depressive symptoms before patient randomization, the dosing schedule (flexible versus fixed), the number of treatment arms, and the percentage of female patients were significantly associated with the difference in response to antidepressant and placebo. The duration of the antidepressant trial, number of patients per treatment arm, number of sites, and mean age of the patients were similar in successful trials (with a greater antidepressant-placebo difference) and less successful trials (with a smaller antidepressant-placebo difference). These findings may help in the design of future antidepressant trials.
Article
The development of new antidepressant drugs has reached a plateau. There is an unmet need for faster, better, and safer medications, but as placebo-response rates rise, effect sizes shrink, and more studies fail or are negative, pharmaceutical companies are increasingly reluctant to invest in new drug development because of the risk of failure. In the absence of an identifiable human pathophysiology that can be modeled in preclinical studies, the principal point of leverage to move beyond the present dilemma may be improving the information gleaned from well-designed proof-of-concept (POC) studies of new antidepressant drugs with novel central nervous system effects. With this in mind, a group of experts was convened under the auspices of the University of Arizona Department of Psychiatry and Best Practice Project Management, Inc. Forty-five experts in the study of antidepressant drugs from academia, government (U.S. Food and Drug Administration and National Institute of Mental Health), and industry participated. EVIDENCE/CONSENSUS PROCESS: To define the state of clinical trials methodology in the antidepressant area, and to chart a way forward, a 2-day consensus conference was held June 21-22, 2007, in Bethesda, Md., at which careful reviews of the literature were presented for discussion. Following the presentations, participants were divided into 3 workgroups and asked to address a series of separate questions related to methodology in POC studies. The goals were to review the history of antidepressant drug trials, discuss ways to improve study design and data analysis, and plan more informative POC studies. The participants concluded that the federal government, academic centers, and the pharmaceutical industry need to collaborate on establishing a network of sites at which small POC studies can be conducted and resulting data can be shared. 
New technologies to analyze and measure the major affective, cognitive, and behavioral components of depression in relationship to potential biomarkers of response should be incorporated. Standard assessment instruments should be employed across studies to allow for future meta-analyses, but new instruments should be developed to differentiate subtypes and symptom clusters within the disorder that might respond differently to treatment. Better early-stage POC studies are needed and should be able to amplify the signal strength of drug efficacy and enhance the quality of information in clinical trials of new medications with novel pharmacologic profiles.
Article
On September 18, 2007, a collaborative session between the International Society for CNS Clinical Trials and Methodology and the International Society for CNS Drug Development was held in Brussels, Belgium. Both groups, with membership from industry, academia, and governmental and nongovernmental agencies, have been formed to address scientific, clinical, regulatory, and methodological challenges in the development of central nervous system therapeutic agents. The focus of this joint session was the apparent diminution of drug-placebo differences in recent multicenter trials of antipsychotic medications for schizophrenia. To characterize the nature of the problem, some presenters reported data from several recent trials that indicated higher rates of placebo response and lower rates of drug response (even to previously established, comparator drugs), when compared with earlier trials. As a means to identify the possible causes of the problem, discussions covered a range of methodological factors such as participant characteristics, trial designs, site characteristics, clinical setting (inpatient vs outpatient), inclusion/exclusion criteria, and diagnostic specificity. Finally, possible solutions were discussed, such as improving precision of participant selection criteria, improving assessment instruments and/or assessment methodology to increase reliability of outcome measures, innovative methods to encourage greater subject adherence and investigator involvement, improved rater training and accountability metrics at clinical sites to increase quality assurance, and advanced methods of pharmacokinetic/pharmacodynamic modeling to optimize dosing prior to initiating large phase 3 trials. The session closed with a roundtable discussion and recommendations for data sharing to further explore potential causes and viable solutions to be applied in future trials.
Article
Some studies suggest that more severely ill patients with depression respond well to antidepressants and poorly to placebo, whereas those who are mildly ill respond equally well to antidepressants and placebo. This notion has implications for the design of clinical trials. To further assess and substantiate these putative predictors of antidepressant and placebo response, we assessed the Food and Drug Administration database of 45 phase II and III antidepressant clinical trials. The frequency of statistically significant differences between antidepressants and placebo was higher in the trials that included patients with more severe depression. In the antidepressant-treated groups, the magnitude of symptom reduction was significantly related to mean initial Hamilton Rating Scale for Depression (HAM-D) score; the higher the mean initial HAM-D score, the larger the change. With placebo treatment, however, the higher the mean initial HAM-D score, the smaller the change. Early discontinuation was more frequent among patients whose mean initial HAM-D scores were higher. These data may help inform the design of future antidepressant clinical trials.
Article
Although early antidepressant clinical trials simply relied on a clinician's judgment as to whether a depressed patient clinically improved or not, the Hamilton Depression (HAM-D) rating scale has become the 'gold standard' to assess the efficacy of new antidepressants. The alternative Montgomery-Asberg Depression Rating Scale (MADRS) has not achieved general acceptance. However, its ease of use warrants evaluation as to whether it is comparable to HAM-D in its sensitivity in detecting antidepressant-placebo differences in antidepressant clinical trials. A retrospective chart review was performed on the records of 208 depressed adult patients who participated in eight randomized, placebo-controlled, double-blind antidepressant clinical trials at the Northwest Clinical Research Center between 1996 and 2000. We compared the effect sizes of the HAM-D, MADRS and Clinical Global Impressions Rating Scale (CGI-S for severity and CGI-I for improvement) for patients assigned to placebo or an established antidepressant. The effect size (measured as the mean change in rating with antidepressants minus the mean change for placebo divided by the pooled SD of change, adjusted for age, gender and initial scores) was 0.49 with MADRS, 0.53 with HAM-D, 0.55 with CGI-S and 0.59 with CGI-I. The four rating scales had similar effect sizes regardless of the type of antidepressant evaluated. These data suggest that MADRS is as sensitive an instrument as HAM-D for detecting antidepressant efficacy in clinical trials. Thus, MADRS may be a desirable tool in large-scale, pivotal antidepressant clinical trials.
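The effect size defined in this abstract is a standardized mean difference. A minimal, unadjusted sketch follows; it omits the study's adjustment for age, gender, and initial scores, assumes change is coded as improvement (baseline minus endpoint) so that positive values favor the drug, and uses a hypothetical function name and toy data.

```python
import math
from statistics import mean, variance


def standardized_effect_size(drug_change, placebo_change):
    """(mean drug change - mean placebo change) / pooled SD of change (unadjusted)."""
    n1, n2 = len(drug_change), len(placebo_change)
    # Pooled variance weights each arm's sample variance by its degrees of freedom
    pooled_var = ((n1 - 1) * variance(drug_change)
                  + (n2 - 1) * variance(placebo_change)) / (n1 + n2 - 2)
    return (mean(drug_change) - mean(placebo_change)) / math.sqrt(pooled_var)


# Toy HAM-D improvements (points), not data from the study
print(standardized_effect_size([10, 12, 14], [6, 8, 10]))  # 2.0
```

Because each scale's difference is divided by that scale's own pooled SD, effect sizes from HAM-D, MADRS, and CGI can be compared directly even though the raw scales have different ranges.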
Article
Placebo response magnitude is suspected to affect the outcome of antidepressant clinical trials. To evaluate this, 52 randomized, double-blind, placebo-controlled clinical trials obtained from the FDA were examined to correlate placebo response magnitude with trial outcome. The magnitude of symptom reduction, measured as the percentage mean change from baseline in the Hamilton Depression Rating Scale (HAM-D), was assessed for patients assigned to placebo or an antidepressant. Correlation coefficients were assessed between symptom reduction with placebo and with antidepressants, and between symptom reduction with placebo and the magnitude of the advantage of antidepressants over placebo. Placebo response magnitude was significantly positively correlated with antidepressant response magnitude (r = .40, p < .001) and significantly negatively correlated with the advantage of antidepressants over placebo (r = -.592, p < .0001). Only 21.1% of antidepressant treatment arms in trials with a high placebo response (>30% mean change from baseline) showed statistical superiority over placebo, compared with 74.2% in trials with a low placebo response (≤30%). Placebo response magnitude varies and has an important effect on antidepressant clinical trials, illustrating the need for a placebo arm to determine whether the trial was sensitive to treatment differences and highlighting the dangers of cross-study comparisons.
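The high/low split in this abstract rests on the placebo arm's percentage mean change from baseline. The sketch below assumes that quantity is the change in the arm's mean HAM-D score expressed as a percentage of the baseline mean (the abstract does not say whether individual percentage changes were averaged instead); function names and toy scores are hypothetical.

```python
from statistics import mean


def percent_mean_change(baseline, endpoint):
    """Percentage reduction in the arm's mean HAM-D score (positive = improvement)."""
    return 100.0 * (mean(baseline) - mean(endpoint)) / mean(baseline)


def placebo_response_level(baseline, endpoint, cutoff=30.0):
    """Classify a placebo arm using the >30% cut-off described in the abstract."""
    return "high" if percent_mean_change(baseline, endpoint) > cutoff else "low"


print(placebo_response_level([24, 26, 22], [16, 17, 15]))  # high
print(placebo_response_level([24, 26, 22], [19, 20, 18]))  # low
```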
Article
The placebo response is a major issue in clinical trials for psychiatric disorders. Possible contributing factors to this problem include diagnostic misclassification, issues concerning inclusion/exclusion criteria, outcome measures' lack of sensitivity to change, measurement errors, poor quality of data entry and verification, the waxing and waning natural course of illness, regression toward the mean, patient and clinician expectations about the trial, study design issues, non-specific therapeutic effects, and high attrition. Over the past few decades, researchers have attempted to reduce the placebo effect in a variety of ways. Unfortunately, approaches with very little or no benefit have included restricting enrollment to selected populations, rater training, requiring the same rater, and placebo lead-in phases. Some benefits, although often marginal, have been derived from standardizing diagnostic procedures, managing clinicians' overestimation of change, simplifying study visits and assessments, minimizing non-specific therapeutic effects, extending trial duration, reducing the number of sites, increasing the sensitivity of outcome measures, and reducing the number of treatment arms. Thus far, there has been no attempt to develop new study designs aimed at reducing the placebo effect. We are proposing a novel study design, called the 'Sequential Parallel Comparison Design', suitable for double-blind, placebo-controlled trials in psychiatric disorders. This design is aimed at reducing both the overall placebo response rate and the sample size required for such trials. Its usefulness in clinical research needs to be tested empirically. If this study design were found to meet its stated goals, it could markedly facilitate the clinical development of new compounds for the treatment of psychiatric disorders.
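The abstract does not spell out the mechanics of the Sequential Parallel Comparison Design. As it is usually described, patients are randomized to drug or placebo in stage 1, placebo non-responders are re-randomized to drug or placebo in stage 2, and the two stage-wise drug-placebo differences are pooled with a pre-specified weight. The sketch below shows only that pooled point estimate for a binary response; the weight `w`, the function names, and the toy data are assumptions, and it is not a full SPCD significance test.

```python
from statistics import mean


def placebo_nonresponders(placebo_ids, placebo_responses):
    """IDs of stage 1 placebo patients who did not respond (eligible for stage 2)."""
    return [pid for pid, resp in zip(placebo_ids, placebo_responses) if resp == 0]


def spcd_pooled_difference(s1_drug, s1_placebo, s2_drug, s2_placebo, w=0.5):
    """Weighted average of stage-wise response-rate differences (1 = responder)."""
    delta1 = mean(s1_drug) - mean(s1_placebo)  # all stage 1 randomized patients
    delta2 = mean(s2_drug) - mean(s2_placebo)  # re-randomized placebo non-responders
    return w * delta1 + (1 - w) * delta2


print(placebo_nonresponders(["p1", "p2", "p3", "p4"], [1, 0, 0, 0]))  # ['p2', 'p3', 'p4']
print(spcd_pooled_difference([1, 1, 0, 0], [1, 0, 0, 0], [1, 0], [0]))  # 0.375
```

The intuition is that stage 2 compares drug and placebo only in patients who have already failed to respond to placebo, so its placebo response is expected to be lower; pooling both stages reuses those patients rather than enrolling new ones.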
Article
We assessed whether increasing the minimum prerandomization Hamilton Depression Rating Scale (HAM-D) score to enrich the severity of the depressed sample affects antidepressant trial outcome. Using the Food and Drug Administration Summary Basis of Approval reports, we examined outcome data from 51 clinical trials (11,270 depressed patients) evaluating 10 investigational antidepressants. Using four categories of trials with increasing minimum HAM-D entry criteria, we found no statistically significant relationship between prerandomization category and trial outcome overall. Although there were minor differences in trial outcome among the three categories with the lowest entry criteria (mean 49%; range 44.4%-50.0%), the antidepressant trials requiring the highest prerandomization HAM-D score (≥20 on the 17-item HAM-D) had the lowest frequency of positive outcomes (20%), χ² = 4.04, df = 1, p = .04. Paradoxically, high entry criteria failed to reliably increase actual mean total prerandomization HAM-D scores, although mean total prerandomization HAM-D scores and the use of flexible dosing were associated with higher rates of positive outcome. A greater placebo response was seen in trials requiring higher prerandomization depressive symptoms. In summary, requiring higher prerandomization depressive symptoms was not associated with an increased rate of favorable outcomes among these 51 antidepressant trials.
Article
Objective: Previous experience with antidepressant studies highlights the difficulties in discriminating an effective drug from placebo. In hopes of improving signal detection, three easy-to-implement methodologies were employed during the development of a recently approved antidepressant. Experimental design: Results from alternative and traditional methods could be compared directly because most studies employed both methods. This database included 11 double-blind, placebo-controlled trials (some with multiple dose arms and/or active comparators) yielding 22 treatment arms of antidepressants at or above the minimally effective dose noted in their U.S. labels. Principal observations: Results agreed with previous evidence showing that the performance of a likelihood-based, mixed-effects model repeated measures (MMRM) analysis was superior to that of analysis of covariance with missing values imputed using the last observation carried forward (LOCF) approach; MMRM correctly identified drug as superior to placebo in 14/22 (63.6%) comparisons versus 11/22 (50.0%) for LOCF. In agreement with previous studies, use of subscales of the Hamilton Depression Rating Scale (HAMD) improved signal detection compared with the HAMD total score. Using MMRM with HAMD subscales correctly identified drug as superior to placebo in up to 17/22 (77.3%) comparisons. Excluding double-blind, placebo lead-in responders did not increase the frequency of correctly identifying drug-versus-placebo differences. Conclusions: The 22 drug-versus-placebo comparisons in this report offer a small amount of evidence and therefore may not be convincing on their own, although results do agree with previous research. Researchers may be able to take advantage of these easy-to-implement methods while we wait for further improvements in other areas.
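For readers unfamiliar with the LOCF approach contrasted with MMRM here, the sketch below shows what the imputation does to one patient's visit-by-visit scores: each missing post-dropout visit inherits the last observed value. An MMRM analysis, by contrast, fits all observed visits in a likelihood-based repeated-measures model without imputing values. The function name and toy scores are illustrative.

```python
from typing import List, Optional


def locf(visits: List[Optional[float]]) -> List[Optional[float]]:
    """Last observation carried forward; None marks a missed visit."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for score in visits:
        if score is not None:
            last = score      # update with the most recent observed score
        filled.append(last)   # missing visits reuse the last observed score
    return filled


# HAM-D totals at baseline and weeks 1-4; patient drops out after week 1
print(locf([24.0, 20.0, None, None, None]))  # [24.0, 20.0, 20.0, 20.0, 20.0]
```

Because the endpoint is frozen at the week-1 score, LOCF treats this patient as if no further change would have occurred, which is one reason its conclusions can differ from those of MMRM.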
Article
This article reviews phase 2-3 clinical trial designs, including their genesis and the potential role of such designs in treatment evaluation. The paper begins with a discussion of the many scientific flaws in the conventional phase 2 → phase 3 treatment evaluation process that motivate phase 2-3 designs. This is followed by descriptions of some particular phase 2-3 designs that have been proposed, including two-stage designs to evaluate one experimental treatment, a design that accommodates both frontline and salvage therapy in oncology, two-stage select-and-test designs that evaluate several experimental treatments, dose-ranging designs, and a seamless phase 2-3 design based on both early response-toxicity outcomes and later event times. A general conclusion is that, in many circumstances, a properly designed phase 2-3 trial utilizes resources much more efficiently and provides much more reliable inferences than conventional methods.
Article
The true dose effect in flexible-dose clinical trials may be obscured and even reversed because dose and outcome are related. To evaluate the dose effect on response on primary efficacy scales in 2 randomized, double-blind, flexible-dose trials, one in patients with bipolar mania who received olanzapine (N = 234, 5-20 mg/day) and one in patients with schizophrenia who received olanzapine (N = 172, 10-20 mg/day), we used marginal structural models (MSM) with inverse probability of treatment weighting (IPTW). Dose profiles for mean changes from baseline were evaluated using a weighted MSM with a repeated measures model. To adjust for selection bias due to non-random dose assignment and dropouts, patient-specific time-dependent weights were determined as products of (i) stabilized weights based on the inverse probability of receiving the sequence of dose assignments that was actually received by a patient up to a given time and (ii) stabilized weights based on the inverse probability of the patient remaining on treatment by that time. Results were compared with those of unweighted analyses. While the observed difference in efficacy scores between dose groups in the unweighted analysis strongly favored lower doses, the weighted analyses showed no strong dose effects and, in some cases, reversed the apparent "negative dose effect." While naïve comparison of groups by last or modal dose in a flexible-dose trial may result in severely biased efficacy analyses, the MSM-IPTW approach may be a valuable method for removing these biases and evaluating potential dose effects, which may prove useful for planning confirmatory trials.
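The weighting step in this abstract can be sketched as follows. At each visit, a patient's running weight is multiplied by a stabilized dose-assignment ratio and a stabilized remaining-on-treatment ratio, and the cumulative product becomes that visit's weight in the weighted repeated measures model. The probabilities are assumed to come from already-fitted models (for example, logistic regressions); which covariates enter the numerator and denominator models is an assumption here, as are the function name and numbers.

```python
from typing import List


def cumulative_stabilized_weights(p_dose_num: List[float], p_dose_den: List[float],
                                  p_stay_num: List[float], p_stay_den: List[float]) -> List[float]:
    """Per-visit cumulative weights for one patient.

    Each list gives, for visits 1..t, the fitted probability of what actually happened:
      *_num: numerator (stabilizing) model, e.g. conditional on prior dose history only
      *_den: denominator model, additionally conditional on time-varying covariates
             (such as prior efficacy scores) that influence dose changes and dropout
    """
    weights: List[float] = []
    w = 1.0
    for a_num, a_den, c_num, c_den in zip(p_dose_num, p_dose_den, p_stay_num, p_stay_den):
        w *= (a_num / a_den) * (c_num / c_den)  # treatment weight x censoring weight
        weights.append(w)
    return weights


print(cumulative_stabilized_weights([0.5, 0.5], [0.25, 0.5], [0.9, 0.9], [0.9, 0.45]))
```

Stabilizing with a numerator model keeps the weights closer to 1 than raw inverse probabilities would, which usually reduces the variance of the weighted estimates.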
Article
There are significant unmet needs in the treatment of schizophrenia, especially for the treatment of cognitive impairment, negative syndrome, and cognitive function. Preclinical data suggest that agonists with selective affinity for acetylcholine muscarinic receptors provide a potentially new mechanism to treat schizophrenia. The authors studied xanomeline, a relatively selective muscarinic type 1 and type 4 (M(1) and M(4)) receptor agonist, to determine if this agent is effective in the treatment of schizophrenia. In this pilot study, the authors examined the efficacy of xanomeline on clinical outcomes in subjects with schizophrenia (N=20) utilizing a double-blind, placebo-controlled, 4-week treatment design. Outcome measures included the Positive and Negative Syndrome Scale (PANSS) for schizophrenia, the Brief Psychiatric Rating Scale (BPRS), the Clinical Global Impression (CGI) scale, and a test battery designed to measure cognitive function in patients with schizophrenia. Subjects treated with xanomeline did significantly better than subjects in the placebo group on total BPRS scores and total PANSS scores. In the cognitive test battery, subjects in the xanomeline group showed improvements most robustly in measures of verbal learning and short-term memory function. These results support further investigation of xanomeline as a novel approach to treating schizophrenia.
Article
Background: The placebo response rate has increased in several psychiatric disorders and is a major issue in the design and interpretation of clinical trials. The current investigation attempted to identify potential predictors of placebo response through examination of the placebo-controlled clinical trial database for escitalopram in 3 anxiety disorders and in major depressive disorder (MDD). Method: Raw data from placebo-controlled studies (conducted from 2002 through the end of 2004) of escitalopram in patients meeting DSM-IV criteria for MDD and anxiety disorders (generalized anxiety disorder [GAD], social anxiety disorder [SAD], panic disorder) were used. Potential predictors examined were type of disorder, location of study, dosing regimen, number of treatment arms, gender of patients, and duration and severity of disorder. Results: Placebo response (defined as the percent decrease from baseline in the reference scale) was higher in GAD and MDD studies conducted in Europe (p < .0001 and p = .0006, respectively) and was not associated with gender or duration of episode. In GAD, the placebo response rate was higher in a European fixed-dose study, which also had more treatment arms. In SAD and in U.S. specialist-treated MDD, a higher placebo response rate was predicted by decreased baseline disorder severity. Conclusion: Additional work is needed before definitive recommendations can be made about whether standard exclusion criteria in clinical trials of antidepressants, such as mild severity of illness, maximize medication-to-placebo differences. This analysis in a range of anxiety disorders and MDD suggests that there may be instances in which the predictors of placebo response rate themselves vary across different conditions.
Article
Analyses of dose response studies should separate the question of the existence of a dose response relationship from questions of functional form and finding the optimal dose. A well-chosen contrast among the estimated effects of the studied doses can make a powerful test for detecting the existence of a dose response relationship. A contrast-based test attains its greatest power when the pattern of the coefficients has the same shape as the true dose response relationship. However, it loses power when the contrast shape and the true dose response shape are not similar. Thus, a primary test based on a single contrast is often risky. Two (or more) appropriately chosen contrasts can assure sufficient power to justify the cost of a multiplicity adjustment. An example shows the success of a two-contrast procedure in detecting dose response, which had frustrated several standard procedures. Copyright (C) 2000 John Wiley & Sons, Ltd.
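A contrast test of the kind described here can be sketched as a t statistic for the weighted sum of group means, sum(c_i * mean_i), across placebo and dose groups, using a pooled within-group SD. The group means, sizes, SD, and the two contrast shapes below are hypothetical. Taking the larger of two contrast statistics requires a multiplicity adjustment; a Bonferroni critical value is a simple conservative choice, whereas methods that account for the correlation between contrasts are less conservative. Neither is claimed to be the cited paper's exact procedure.

```python
import math


def contrast_t(means, ns, pooled_sd, coefs):
    """t statistic for the contrast sum(c_i * mean_i); coefficients must sum to zero."""
    if abs(sum(coefs)) > 1e-9:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(c * m for c, m in zip(coefs, means))
    se = pooled_sd * math.sqrt(sum(c * c / n for c, n in zip(coefs, ns)))
    return estimate / se


# Placebo and three ascending doses (hypothetical mean improvements)
means, ns, sd = [0.0, 2.0, 4.0, 4.0], [50, 50, 50, 50], 8.0
linear = [-3, -1, 1, 3]   # steadily increasing response
plateau = [-3, 1, 1, 1]   # all active doses equally effective
t_lin = contrast_t(means, ns, sd, linear)
t_plat = contrast_t(means, ns, sd, plateau)
print(round(t_lin, 2), round(t_plat, 2), round(max(t_lin, t_plat), 2))
```

Changing `means` to a different true shape shows the point made in the abstract: whichever contrast pattern better matches the true dose response yields the larger statistic, so pairing two dissimilar shapes hedges against guessing the shape wrong.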
Article
Dose-response studies are frequently used to study the effects of an experimental compound on various responses. One characteristic of the compound that is frequently of interest is the minimum effective dose (MED). Generally, inference about the MED is made by comparing various dose groups with a control (Dunnett 1955; Williams 1971). This article examines two families of contrasts used for related problems, and proposes a new family of contrasts (called basin contrasts) designed specifically for identifying the MED. Monte Carlo studies are used to demonstrate the superiority of the contrast procedures and make recommendations for their use.
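As a point of reference for the comparisons-with-control approach mentioned here, the sketch below steps down from the highest dose and returns the lowest dose such that it and every higher dose beat control. It is not the basin-contrast procedure the article proposes. The t statistics and the critical value are taken as inputs (in practice the critical value would come from a multiplicity-adjusted procedure such as Dunnett's or Williams'), and all names and numbers are hypothetical.

```python
from typing import Optional, Sequence


def step_down_med(doses: Sequence[float], t_vs_control: Sequence[float],
                  critical_value: float) -> Optional[float]:
    """Lowest dose in an unbroken run of significant comparisons, starting from the top dose."""
    med: Optional[float] = None
    for dose, t in sorted(zip(doses, t_vs_control), reverse=True):
        if t > critical_value:
            med = dose  # significant: tentatively the MED, keep stepping down
        else:
            break       # first non-significant dose ends the step-down
    return med


print(step_down_med([10, 20, 40], [1.2, 2.6, 3.1], critical_value=2.3))  # 20
```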
Article
The goal of drug development must be to eliminate from further development those compounds that are not worthy of further investment and to eliminate them as soon as possible. Issues in dealing with early clinical trials in man and questions regarding their necessity are presented. The argument is made that early Phase I and Phase II trials, or at least the reasons for running them, must be standardized rigidly across all drugs in a portfolio and that early Phase II trials must be made definitive in the patient population of interest. An example of such a scheme is given. A list of intangible benefits of a standardized process is also offered.
Article
The present study replicates a previous study in which we found that the less frequently used Montgomery–Åsberg Depression Rating Scale (MADRS) is as sensitive an instrument in detecting antidepressant-placebo differences in antidepressant clinical trials as the more widely used Hamilton Depression (HAM-D) rating scale. The Clinical Global Impressions Rating Scale for Severity (CGI-S) was also similar to the other two scales. A retrospective chart review was performed on the records of 139 depressed adult patients who participated in six randomized, placebo-controlled, double-blind antidepressant clinical trials at the North-west Clinical Research Centre between 1996 and 2003. The effect size (measured as the mean change in rating with antidepressants minus the mean change for placebo divided by the pooled SD of change) was 0.68 with MADRS, 0.54 with CGI-S and 0.57 with HAM-D. A correlation analysis also revealed significant positive relationships between baseline MADRS and HAM-D and between final MADRS and HAM-D for the total sample, placebo group, and antidepressant group (P<0.01). Further research is needed to examine which scale is the most appropriate to use for each particular antidepressant clinical trial.
Article
The use of a placebo run-in phase, in which placebo responders are withdrawn from a study before random assignment to treatment condition, has been criticized as favoring the active treatment in clinical trials. We compared the effect size of randomized, placebo-controlled clinical trials (in the treatment of depression with selective serotonin reuptake inhibitors [SSRIs]) that include a placebo run-in phase with those that do not, using a meta-analytic approach. This study differed from earlier meta-analytic studies in that it considered only SSRIs and included only studies using continuous measures of depression, allowing for a more refined assessment of effect size. An extensive literature search identified 43 datasets published between 1980 and 2000 comparing placebo with SSRI and using a continuous measure of depression (usually the Hamilton Depression Rating Scale). We included only studies of at least 6 weeks' duration focusing on treatment for primary acute major depression in adults 18–65 years of age. Studies focusing on depression in specific medical illnesses were not included. Analysis of efficacy was based on 3,047 subjects treated with an SSRI antidepressant and 3,740 subjects treated with a placebo. There was no statistically significant difference in effect size between the clinical trials that had a placebo run-in phase followed by withdrawal of placebo responders and those trials that did not. Despite the lack of a statistically significant difference between studies of withdrawing early placebo responders and those not using this procedure, this approach is likely to continue to be used widely because it produces large absolute effect sizes. It is recommended that future studies clearly describe these procedures and report the number of subjects dropped from the study for early placebo response and other reasons. Depression and Anxiety 19:10–19, 2004. © 2004 Wiley-Liss, Inc.
Article
The construction of a depression rating scale designed to be particularly sensitive to treatment effects is described. Ratings of 54 English and 52 Swedish patients on a 65-item comprehensive psychopathology scale were used to identify the 17 most commonly occurring symptoms in primary depressive illness in the combined sample. Ratings on these 17 items for 64 patients participating in studies of four different antidepressant drugs were used to create a depression scale consisting of the 10 items that showed the largest changes with treatment and the highest correlation to overall change. The inter-rater reliability of the new depression scale was high. Scores on the scale correlated significantly with scores on a standard rating scale for depression, the Hamilton Rating Scale (HRS), indicating its validity as a general severity estimate. Its capacity to differentiate between responders and non-responders to antidepressant treatment was better than that of the HRS, indicating greater sensitivity to change. The practical and ethical implications in terms of smaller sample sizes in clinical trials are discussed.
Article
This randomized, placebo-controlled, double-blind study was the first to evaluate the antidepressant efficacy, safety, and tolerability of an NR2B subunit-selective N-methyl-D-aspartate receptor antagonist, CP-101,606. Subjects had major depression according to Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition criteria and a history of treatment refractoriness to at least 1 adequate trial of a selective serotonin reuptake inhibitor. The study had 2 treatment periods. In period 1, subjects first received a 6-week open-label trial of paroxetine and a single-blind, intravenous placebo infusion. Period 1 nonresponders (n = 30) then received a randomized, double-blind single infusion of CP-101,606 or placebo plus continued treatment with paroxetine for up to an additional 4 weeks (period 2). Depression severity was assessed using the Montgomery-Asberg Depression Rating Scale and the 17-item Hamilton Depression Rating Scale. On the prespecified main outcome measure (change from baseline in the Montgomery-Asberg Depression Rating Scale total score at day 5 of period 2), CP-101,606 produced a greater decrease than did placebo (mean difference, -8.6; 80% confidence interval, -12.3 to -4.5) (P < 0.10). The Hamilton Depression Rating Scale response rate was 60% for CP-101,606 versus 20% for placebo. Seventy-eight percent of CP-101,606-treated responders maintained response status for at least 1 week after the infusion. CP-101,606 was safe, generally well tolerated, and capable of producing an antidepressant response without also producing a dissociative reaction. Antagonism of the NR2B subtype of the N-methyl-D-aspartate receptor may be a fruitful target for the development of a new antidepressant with more robust effects and a faster onset than those currently available, capable of working when existing antidepressants do not.
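The trial reports an 80% confidence interval together with P < 0.10, a pairing common in proof-of-concept studies: a two-sided 80% interval that excludes zero corresponds to a one-sided test at α = 0.10. A minimal normal-approximation sketch follows; the standard error of 3.0 is hypothetical, and the study's own interval (which is not symmetric about the estimate) was not necessarily computed this way.

```python
from statistics import NormalDist


def normal_ci(estimate, se, level=0.80):
    """Two-sided normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * se, estimate + z * se


def one_sided_p(estimate, se):
    """One-sided p-value for H1: drug-minus-placebo difference < 0 (greater decrease on drug)."""
    return NormalDist().cdf(estimate / se)


# Hypothetical: difference in MADRS change of -8.6 with an assumed SE of 3.0
low, high = normal_ci(-8.6, 3.0)
print(round(low, 1), round(high, 1))  # -12.4 -4.8
print(high < 0, one_sided_p(-8.6, 3.0) < 0.10)  # True True
```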
Article
Substantial and highly variable placebo response rates represent a major obstacle to antidepressant development in major depressive disorder (MDD). However, whether the likelihood of receiving active treatment or placebo, a proxy for the degree of expectation of improvement, may itself influence clinical trial outcome is unclear. The goal of this work was to examine whether the probability of receiving placebo influences outcome in antidepressant trials for MDD. Medline/PubMed publication databases were searched for randomized, double-blind, placebo-controlled trials of antidepressants for adults with MDD. A total of 146 manuscripts involving 182 clinical trials were pooled (n = 36,385). Pooled response rates for drug and placebo were 53.8% and 37.3%. A random-effects meta-regression established that the probability of receiving placebo, year of publication, and baseline severity were independent predictors of the risk ratio of responding to antidepressants versus placebo. Specifically, a greater probability of receiving placebo, greater baseline severity and an earlier year of publication predicted greater antidepressant-placebo "efficacy separation". Fixed versus flexible dose design, trial duration and population age did not influence clinical trial outcome.
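The two quantities in this meta-regression can be illustrated with a short sketch: the probability of receiving placebo implied by a trial's allocation, and the ratio of response rates. Dividing the pooled rates quoted above (53.8% / 37.3%) gives a crude ratio of about 1.44; this is only arithmetic on the summary figures, not the random-effects estimate the authors fitted. Function names and the allocation example are hypothetical.

```python
def placebo_probability(allocation):
    """Probability that a randomized patient receives placebo, from allocation weights."""
    return allocation["placebo"] / sum(allocation.values())


def risk_ratio(drug_rate, placebo_rate):
    """Ratio of response proportions, drug over placebo."""
    return drug_rate / placebo_rate


# Hypothetical three-arm trial with 1:1:1 allocation
print(placebo_probability({"placebo": 1, "low_dose": 1, "high_dose": 1}))  # 0.333...
print(round(risk_ratio(0.538, 0.373), 2))  # 1.44 (crude ratio of the pooled rates)
```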
Article
The primary focus of this paper is to examine analysis strategies for parallel, randomized dose response studies, with particular emphasis on identifying the minimum effective dose. Such studies have become a standard for drug development in the pharmaceutical industry. Particular attention is paid to ANOVA followed by multiple comparison procedures, with some additional discussion of the utility of regression models. When there are three or fewer dose groups and a placebo in a study, ANOVA techniques are preferred; with a larger number of dose groups, regression analysis has greater utility and reliability. Analysis of factorial dose response studies is reviewed only briefly, as this is an emerging area of interest and further development is necessary.
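One widely used way to identify a minimum effective dose (MED) is a fixed-sequence step-down procedure: test the highest dose against placebo first, move to the next lower dose only while results stay significant, and take the lowest significant dose as the MED. Because the order is fixed in advance, each test can use the full α while the familywise error rate is still controlled. The sketch below is an assumption about how such a procedure could be coded (normal approximation, one-sided tests); it is not necessarily one of the specific procedures the paper reviews, and the names and numbers are hypothetical.

```python
from statistics import NormalDist


def step_down_med(doses, diffs, ses, alpha=0.05):
    """Fixed-sequence step-down search for the minimum effective dose.

    diffs are drug-minus-placebo improvements (positive favors drug) and ses their
    standard errors. Doses are tested from highest to lowest at one-sided level
    alpha; testing stops at the first non-significant dose.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha)
    med = None
    for dose, diff, se in sorted(zip(doses, diffs, ses), reverse=True):
        if diff / se > z_crit:
            med = dose  # lowest dose declared effective so far
        else:
            break
    return med


# Hypothetical estimates for three doses versus placebo
print(step_down_med([10, 20, 40], [1.0, 2.5, 3.5], [1.0, 1.0, 1.0]))  # 20
```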
Article
A critical aspect of biomedical research is the characterization of the dose response relationship of a compound. This is true in laboratory experiments and clinical trials and pertains to efficacy, safety, and the resulting benefit/risk ratio. Presented here is Part I of this article, which deals with some clinical trial design issues surrounding dose response studies. Some additional comments are made about trials for identifying the minimum effective dose, randomized concentration controlled trials, and the use of one-sided hypotheses in designing such trials. Part II is a separate paper reviewing some analysis strategies for dose response studies.
Article
During the last decade, there has been increasing use of a placebo run-in period prior to randomization to active treatment or placebo in randomized controlled trials aimed at establishing acute phase antidepressant drug efficacy in patients with major depression. This procedure is thought to reduce response rates to placebo treatment after randomization, thereby increasing the drug-placebo difference. Meta-analyses of 101 studies reveal that a placebo run-in does not (1) lower the placebo response rate, (2) increase the drug-placebo difference, or (3) affect the drug response rate post-randomization in either inpatients or outpatients for any antidepressant drug group. If there is a post-randomization placebo treatment cell, drug response rates for outpatients are unchanged or slightly lower than if there is no placebo treatment cell. These results suggest that a pill placebo run-in provides no advantage in acute phase efficacy trials.
Article
In the early stages of traditional drug development, the frequency of dosing (e.g., QD, BID, etc.) is typically determined by the pharmacokinetic properties of a compound. After an appropriate dose frequency is chosen, the magnitude of dose is then evaluated via parallel-group dose-response trials. For some drugs, however, blood levels at any given time may not be accurate predictors of clinical response, or the drug may not be absorbed systemically. In those instances, we propose the use of a factorial dose-response trial that simultaneously evaluates frequency of dosing and magnitude of dose. We consider this approach to selecting an appropriate dosing regimen to be more scientifically founded and more cost-effective than independent evaluation of dose and frequency through separate clinical trials. Some design considerations and statistical analysis strategies for these factorial trials are presented in this paper.
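The factorial layout described here crosses every dose level with every dosing frequency, and each factor's main effect is read from marginal means. A minimal sketch, assuming a balanced full factorial with hypothetical doses, frequencies and cell means (a placebo arm, which such trials would normally include, is omitted for brevity):

```python
from itertools import product
from statistics import mean


def factorial_arms(doses, frequencies):
    """All dose-by-frequency treatment combinations in a full factorial design."""
    return list(product(doses, frequencies))


def main_effects(cell_means):
    """Marginal means for each dose and each frequency from balanced cell means."""
    doses = sorted({d for d, _ in cell_means})
    freqs = sorted({f for _, f in cell_means})
    dose_marginals = {d: mean(cell_means[(d, f)] for f in freqs) for d in doses}
    freq_marginals = {f: mean(cell_means[(d, f)] for d in doses) for f in freqs}
    return dose_marginals, freq_marginals


# Hypothetical doses (mg), frequencies and mean improvements per cell
arms = factorial_arms([50, 100], ["QD", "BID"])
cells = dict(zip(arms, [2.0, 3.0, 4.0, 5.0]))
print(main_effects(cells))
```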
Article
In clinical studies of antidepressants, the Hamilton Depression Rating Scale (HAMD) total score has been the gold standard instrument for establishing and comparing the efficacy of new treatments. However, the HAMD is a multidimensional measure, which may reduce its ability to detect differences between treatments, in particular changes in core symptoms of depression. Two meta-analyses were conducted to compare the responsiveness of the HAMD total score with several published unidimensional subscale scores based on core symptoms of depression. The first compared the instruments' ability to detect differences between fluoxetine and placebo across eight studies involving over 1600 patients. The second involved four studies and over 1200 patients randomized to tricyclic antidepressants and placebo. In both meta-analyses, the unidimensional core subscales outperformed the HAMD total score at detecting treatment differences. Studies powered on the effect sizes observed with the core subscales would require approximately one-third fewer patients than studies based on the HAMD total score; the implications for sample size and power in clinical studies are discussed. Effect sizes from each individual HAMD item are also presented to help explain the differences in responsiveness between the scales.
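The link between effect size and sample size can be made concrete with the usual normal-approximation formula for a two-arm comparison, n per arm = 2(z₁₋α/₂ + z₁₋β)² / d². Because n scales with 1/d², an effect size about √1.5 ≈ 1.22 times larger requires one-third fewer patients. The sketch below assumes equal allocation and a two-sided α = 0.05; it is illustrative and not the authors' calculation, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(effect_size, alpha=0.05, power=0.80, round_up=True):
    """Patients per arm for a two-sided, two-sample comparison (normal approximation)."""
    z = NormalDist().inv_cdf
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 / effect_size**2
    return ceil(n) if round_up else n


print(n_per_arm(0.5))  # 63
# A subscale effect size sqrt(1.5) times larger needs two-thirds of the patients
ratio = n_per_arm(0.5 * 1.5**0.5, round_up=False) / n_per_arm(0.5, round_up=False)
print(round(ratio, 3))  # 0.667
```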
Article
Analyses of dose response studies should separate the question of the existence of a dose response relationship from questions of functional form and finding the optimal dose. A well-chosen contrast among the estimated effects of the studied doses can make a powerful test for detecting the existence of a dose response relationship. A contrast-based test attains its greatest power when the pattern of the coefficients has the same shape as the true dose response relationship. However, it loses power when the contrast shape and the true dose response shape are not similar. Thus, a primary test based on a single contrast is often risky. Two (or more) appropriately chosen contrasts can assure sufficient power to justify the cost of a multiplicity adjustment. An example shows the success of a two-contrast procedure in detecting dose response, which had frustrated several standard procedures.
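A minimal sketch of a two-contrast test, assuming a placebo arm plus three doses, equal group sizes, a known common SD (normal approximation), one-sided tests where larger means favor the drug, and a Bonferroni split of α across the contrasts. The contrast coefficients, the Bonferroni adjustment and the example means are assumptions; the paper's own multiplicity adjustment may differ. With a plateau-shaped true response, the linear contrast misses at the adjusted level while the step contrast detects it, which is the trade-off described above.

```python
from statistics import NormalDist


def contrast_z(means, ns, sd, coefs):
    """z statistic for a contrast among group means, treating the common SD as known."""
    if abs(sum(coefs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    estimate = sum(c * m for c, m in zip(coefs, means))
    se = sd * sum(c * c / n for c, n in zip(coefs, ns)) ** 0.5
    return estimate / se


def multiple_contrast_test(means, ns, sd, contrasts, alpha=0.05):
    """Reject 'no dose response' if any one-sided contrast exceeds a Bonferroni cutoff."""
    z_crit = NormalDist().inv_cdf(1 - alpha / len(contrasts))
    zs = [contrast_z(means, ns, sd, c) for c in contrasts]
    return max(zs) > z_crit, zs


# Placebo plus three doses; hypothetical plateau-shaped mean improvements
linear = [-3, -1, 1, 3]
step = [-3, 1, 1, 1]
reject, zs = multiple_contrast_test([0, 3, 3, 3], [20] * 4, 5.0, [linear, step])
print(reject, [round(z, 2) for z in zs])  # True [1.8, 2.32]
```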
Article
In the study of depression, most randomized clinical trials have design features that attempt to sample from a stable patient population. One commonly used design feature is to require patients to maintain some minimum baseline symptom severity score during a placebo lead-in period. One intent of this design feature is to evaluate the behavior of patients prior to administration of active medication. If, during the lead-in period, patients do not maintain minimum symptom severity, the patients are excluded from the remainder of the study, the theory being that the excluded patients are not part of a stable patient population and hence are not likely to demonstrate efficacy of a truly effective treatment. This presentation investigates the effectiveness of a restrictive entry criterion and proposes an alternative explanation for what is usually defined as placebo response.
Article
In recent years, several authors have argued that placebo-controlled trials are invariably unethical when known effective therapy is available for the condition being studied, regardless of the condition or the consequences of deferring treatment. Some have also disputed the value of placebo-controlled trials in such a setting, asserting that the comparison of new treatment with old treatment is sufficient to establish efficacy and is all that should be of interest. This article considers the ethical concerns about use of placebo controls and describes the limited ability of active-control equivalence (also known as noninferiority) trials to establish efficacy of new therapies in many medical contexts. The authors conclude that placebo-controlled trials are not uniformly unethical when known effective therapies are available; rather, their acceptability is determined by whether the patient will be harmed by deferral of therapy. If patients are not harmed, such trials can ethically be carried out. Furthermore, active-control trials, although valuable, informative, and appropriate in many circumstances, often cannot provide reliable evidence of the effectiveness of a new therapy.
Article
The 1-week single-blind placebo lead-in has long been a standard in double-blind psychopharmacology clinical trials. Although a lead-in period is often necessary (e.g., to receive laboratory results before randomization), some authors have demonstrated that the standard single-blind placebo lead-in's performance was similar to having a lead-in in which placebo was not administered. The single-blind placebo lead-in did not decrease postrandomization placebo response, nor did it increase drug-placebo differences. To eliminate a higher percentage of placebo responders before randomization and to reduce potential biases in baseline ratings, the authors designed and implemented two depression studies with a double-blind variable placebo lead-in period. In these designs, both the patients and personnel at the investigative sites were blinded to the length of the placebo lead-in period and the start of the active treatment period. Approximately 28% of the patients in the double-blind placebo lead-in studies met criteria to be placebo lead-in responders, as compared with fewer than 10% from two single-blind placebo lead-in studies conducted in a similar time frame. Although all patients continued in the study (including placebo lead-in responders), the primary efficacy analysis prospectively excluded double-blind placebo lead-in responders. Analysis of postrandomization changes revealed that double-blind placebo lead-in responders, even when continuing to receive placebo treatment, maintained their response. At the study endpoint, these placebo lead-in responders had significantly lower severity scores than their counterparts who were not lead-in responders. The prospective removal of lead-in responders thus resulted in an increase in mean endpoint placebo group severity scores. This resulted in an increased drug-placebo treatment difference in one of the two studies but had no effect on the treatment difference in the other study.
Article
Intense debate persists about the need for placebo-controlled groups in clinical trials of medications for major depressive disorder (MDD). There is continuing interest in the development of new medications, but because effective antidepressants are already available, ethical concerns have been raised about the need for placebo groups in new trials. The objective was to determine whether the characteristics of placebo control groups in antidepressant trials have changed over time. We searched MEDLINE and PsychLit for all controlled trials published in English between January 1981 and December 2000 in which adult outpatients with MDD were randomly assigned to receive medication or placebo. Seventy-five trials met our criteria for inclusion. Data were extracted from the articles by 2 of the authors, and discrepancies were resolved via discussion and additional review by a third author. The mean (SD) proportion of patients in the placebo group who responded was 29.7% (8.3%) (range, 12.5%-51.8%). Most studies examined more than a single active medication, and, in the active medication group with the greatest response, the mean (SD) proportion of patients responding was 50.1% (9.0%) (range, 31.6%-70.4%). Both the proportion of patients responding to placebo and the proportion responding to medication were significantly positively correlated with the year of publication (for placebo: n = 75; r = 0.45; 95% confidence interval [CI], 0.25-0.61; P<.001; for medication: n = 75; r = 0.26; 95% CI, 0.03-0.46; P =.02). The association between year of publication and response rate was more statistically robust for placebo than for medication. The response to placebo in published trials of antidepressant medication for MDD is highly variable and often substantial, and it has increased significantly in recent years, as has the response to medication. These observations support the view that the inclusion of a placebo group has major scientific importance in trials of new antidepressant medications and indicate that efforts should continue to minimize the risks of such studies so that they may be conducted in an ethically acceptable manner.
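The 95% confidence intervals for the correlations above are of the kind produced by the Fisher z transformation. A minimal sketch (an assumption about the method, since the abstract does not name it; the function name is hypothetical) approximately reproduces the interval reported for r = 0.45 with n = 75:

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def correlation_ci(r, n, level=0.95):
    """Confidence interval for a Pearson correlation via the Fisher z transformation."""
    z = atanh(r)
    se = 1 / sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return tanh(z - crit * se), tanh(z + crit * se)


low, high = correlation_ci(0.45, 75)
print(round(low, 2), round(high, 2))  # 0.25 0.61
```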
Article
Unidimensional subscales for assessment of major depression may be more sensitive to antidepressant drug effects than the Hamilton Depression Rating Scale (HAM-D). To further examine this possibility, we analyzed pooled data from eight comparable, well-controlled clinical trials of venlafaxine and compared such subscales with the 17-item HAM-D (HAM-D(17)) based on effect size and the number of patients required for 80% power. Symptoms of depression were assessed using the HAM-D among intent-to-treat patients (n = 2045) randomly assigned to receive venlafaxine (immediate release, n = 474; extended release, n = 377), one of several selective serotonin reuptake inhibitors (SSRIs) (n = 748), or placebo (n = 446) for up to 8 weeks. With SSRIs or venlafaxine vs. placebo, subscales yielded effect sizes (0.328-0.528) 16 to 76% larger than those of the HAM-D(17) (0.237 and 0.396, respectively), and required 31 to 64% fewer patients for 80% power. With venlafaxine vs. SSRIs, the subscales showed no advantage over the HAM-D(17); all instruments yielded comparable, positive effect sizes (0.183-0.195). Final subscale scores significantly predicted (all P < 0.05) whether patients met criteria for remission (e.g., HAM-D(17) score ≤ 7). These findings suggest that unidimensional subscales are more sensitive to antidepressant drug effects than the HAM-D(17), but only in active agent/placebo comparisons. Our data further suggest the subscales can predict the presence of remission. Given these findings, prudent use of these subscales may be appropriate, cost-effective, and informative.
Article
The assumption that the design of an antidepressant clinical trial affects its outcome is based on sparse data. We sought to examine whether the dosing schedule (fixed dose or flexible dose) in an antidepressant clinical trial affects the frequency with which antidepressants show statistical superiority over placebo. Randomized, placebo-controlled clinical trials of nine antidepressants approved by the Food and Drug Administration between 1985 and 2000 were reviewed. These trials comprised 9313 depressed patients who participated in 51 antidepressant clinical trials consisting of 92 treatment arms with eventually approved doses. In the flexible-dose trials, 59.6% (34/57) of the antidepressant treatment arms were statistically significant compared with placebo, whereas in the fixed-dose trials only 31.4% (11/35) were (χ² = 6.9, df = 1, p < 0.01). These data suggest that the dosing schedule may influence trial outcome, in part because symptom reduction with placebo was significantly smaller in flexible-dose trials than in fixed-dose trials (F = 4.08, df = 1, 48, p < 0.05). Symptom reduction with antidepressants was similar in flexible- and fixed-dose trials. Furthermore, the fixed-dose studies did not demonstrate a dose-response relationship, which is their primary purpose.
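The reported χ² can be checked from the counts in the abstract (34 of 57 flexible-dose arms and 11 of 35 fixed-dose arms significant). A Pearson χ² without continuity correction gives about 6.91, consistent with the reported 6.9; the abstract does not state which variant was used, and the function name is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: flexible-dose arms, fixed-dose arms; columns: significant, not significant
print(round(chi_square_2x2(34, 57 - 34, 11, 35 - 11), 2))  # 6.91
```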
Article
The pharmaceutical industry faces considerable challenges, both political and fiscal. Politically, governments around the world are trying to contain costs and, as health care budgets constitute a very significant part of governmental spending, these costs are the subject of intense scrutiny. In the United States, drug costs are also the subject of intense political discourse. This article deals with the fiscal pressures that face the industry from the perspective of R&D. What impinges on productivity? How can the currently reduced productivity of R&D be improved?
Article
Of all the therapeutic areas, diseases of the CNS provide the biggest challenges to translational research in this era of increased productivity and novel targets. Risk reduction by translational research incorporates the "learn" phase of the "learn and confirm" paradigm proposed over a decade ago. Like traditional drug discovery in vitro and in laboratory animals, it precedes the traditional phase 1-3 studies of drug development. The focus is on ameliorating the current failure rate in phase 2 and the delays resulting from suboptimal choices in four key areas: initial test subjects, dosing, sensitive and early detection of therapeutic effect, and recognition of differences between animal models and human disease. Implementation of new technologies is the key to success in this emerging endeavor.
Article
At effective doses, patients with major depressive disorder (MDD) treated with duloxetine have been found to experience significant symptom improvement as measured by HAMD(17) total score. In addition, duloxetine-treated patients have significantly higher remission and response rates compared with placebo. The objective of this analysis was to determine the optimal dose of duloxetine in MDD. Effect sizes for duloxetine 40 mg, 60 mg, 80 mg, and 120 mg per day were estimated using all 6 acute phase III clinical trials in patients with MDD. The tolerability of duloxetine 40 mg, 60 mg, 80 mg, and 120 mg was evaluated using pooled data from the 6 studies. The primary efficacy measure in all trials was the HAMD(17) total score, from which were determined the effect size for HAMD(17) change scores, response rates (50% reduction from baseline to endpoint), and remission rates (HAMD(17) total score ≤ 7). A total of 1619 randomized patients were included in these studies, of whom 632 were treated with placebo, 177 with duloxetine 40 mg/day, 251 with 60 mg/day, 363 with 80 mg/day, and 196 with 120 mg/day. An evaluation of increments in effect size between doses consistently showed that the most notable gain in effect size for efficacy occurred in the 40–60 mg/day range. All dosages from 60 to 120 mg/day were effective. The tolerability assessment indicated that duloxetine at 40–120 mg/day is well tolerated. Furthermore, initial doses of 40–80 mg/day were found to have comparable tolerability. The effect size analyses demonstrate that duloxetine 40 mg/day has minimal efficacy and that duloxetine 60–120 mg/day is effective in the treatment of patients with MDD. An initial dose below 60 mg/day might provide better tolerability for some patients diagnosed with MDD.
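The response and remission definitions used in these trials translate directly into code. A minimal sketch, assuming "50% reduction" means a reduction of at least 50% from baseline; the function name and scores are hypothetical.

```python
def classify(baseline, endpoint):
    """Return (response, remission) from HAMD17 total scores.

    Response: endpoint at least 50% below baseline (assumed reading of the definition).
    Remission: endpoint total of 7 or less.
    """
    response = endpoint <= 0.5 * baseline
    remission = endpoint <= 7
    return response, remission


# Hypothetical patients
print(classify(24, 12))  # (True, False)
print(classify(20, 6))   # (True, True)
print(classify(20, 15))  # (False, False)
```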
Article
The optimal dose for achieving the maximum antidepressive effect of selective serotonin reuptake inhibitors (SSRIs) or serotonin-noradrenaline reuptake inhibitors (SNRIs) remains controversial. The varying sensitivity of scales that measure the severity of depression is one of the many factors affecting the evaluation of the dose-response relationship with antidepressants. The objective was to determine whether the 6-item Hamilton rating scale for depression (HAM-D6) demonstrates a clearer association between dose and antidepressive effect than the 17-item Hamilton rating scale for depression (HAM-D17) for fixed doses of duloxetine hydrochloride (40, 60, 80, and 120 mg daily) in six double-blind, randomized, placebo-controlled clinical trials assessing safety and efficacy in the acute treatment of patients with DSM-IV-defined major depressive disorder (MDD). Mantel-Haenszel adjusted effect sizes were determined by dose for change from baseline to endpoint in HAM-D6 and HAM-D17 scores from the six studies. To confirm the findings, assessments were repeated on the subset of the population corresponding to the 70% of patients with the longest duration of treatment, regardless of study, treatment, dose, geography, or completion status. For the majority of assessments, HAM-D6 effect sizes were numerically larger than those estimated from the HAM-D17. The findings support 60 mg daily as the best effective dose of duloxetine. In this assessment of patients with MDD, the HAM-D6 was more sensitive than the HAM-D17 at detecting treatment effects. These findings are consistent with published results for other effective antidepressants.
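Deriving both totals from the same item-level data might look like the sketch below. The HAM-D6 item set used here (items 1, 2, 7, 8, 10 and 13: depressed mood, guilt, work and activities, retardation, psychic anxiety, general somatic symptoms) is the commonly cited Bech subscale and is not listed in the abstract, so it should be verified against the scale documentation before use; the names are hypothetical.

```python
# Assumed Bech HAM-D6 items within the HAM-D17 (verify against the scale manual)
HAMD6_ITEMS = (1, 2, 7, 8, 10, 13)


def hamd_totals(item_scores):
    """Return (HAM-D17 total, HAM-D6 total) from a dict of item number -> score."""
    if set(item_scores) != set(range(1, 18)):
        raise ValueError("expected scores for items 1 to 17")
    total17 = sum(item_scores.values())
    total6 = sum(item_scores[i] for i in HAMD6_ITEMS)
    return total17, total6


# Hypothetical patient scoring 1 on every item
print(hamd_totals({i: 1 for i in range(1, 18)}))  # (17, 6)
```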
Article
The placebo response rate has increased in several psychiatric disorders and is a major issue in the design and interpretation of clinical trials. The current investigation attempted to identify potential predictors of placebo response through examination of the placebo-controlled clinical trial database for escitalopram in 3 anxiety disorders and in major depressive disorder (MDD). Raw data from placebo-controlled studies (conducted from 2002 through the end of 2004) of escitalopram in patients meeting DSM-IV criteria for MDD and anxiety disorders (generalized anxiety disorder [GAD], social anxiety disorder [SAD], panic disorder) were used. Potential predictors examined were type of disorder, location of study, dosing regimen, number of treatment arms, gender of patients, and duration and severity of disorder. Placebo response (defined as the percent decrease from baseline in the reference scale) was higher in GAD and MDD studies conducted in Europe (p < .0001 and p = .0006, respectively) and was not associated with gender or duration of episode. In GAD, the placebo response rate was higher in a European fixed-dose study, which also had more treatment arms. In SAD and in U.S. specialist-treated MDD, a higher placebo response rate was predicted by decreased baseline disorder severity. Additional work is needed before definitive recommendations can be made about whether standard exclusion criteria in clinical trials of antidepressants, such as mild severity of illness, maximize medication-to-placebo differences. This analysis in a range of anxiety disorders and MDD suggests that there may be instances in which the predictors of placebo response rate themselves vary across different conditions.
Article
Dose-response studies with multiple endpoints can be formulated as closed testing or partition testing problems. When the endpoints are primary and secondary, whether the order in which the doses are tested is predetermined or determined by the sample leads to different partitionings of the parameter space corresponding to the null hypotheses to be tested. We use the case of two doses and two endpoints to illustrate how to apply the partitioning principle to construct multiple tests that control the appropriate error rate. Graphical representation can be useful in visualizing the decision process.
Article
Because the appropriate design and end points for phase II evaluation of targeted anticancer agents are unclear, we reviewed recent reports of phase II trials of targeted agents to determine the types of designs used, the planned end points, the outcomes, and the relationship between trial outcomes and regulatory approval. We retrieved reports of single-agent phase II trials in six solid tumors for 19 targeted drugs. For each, we abstracted data regarding planned design and actual results. Response rates were examined for any relationship to eventual success of the agents, as determined by US Food and Drug Administration approval for at least one indication. Eighty-nine trials were identified. Objective response was the primary or coprimary end point in the majority of trials (61 of 89). Fourteen reports described randomized studies, which generally evaluated different doses of an agent rather than serving as controlled experiments. Enrichment for target expression was uncommon. Objective responses were seen in 38 trials; in 19 trials, response rates were more than 10%, and in eight, they were more than 20%. Agents with high response rates tended to have high nonprogression rates; renal cell carcinoma was the exception. Higher overall response rates were predictive of regulatory approval in the tumor types reviewed (P = .005). In practice, phase II design for targeted agents is similar to that for cytotoxics. Objective response seems to be a useful end point for screening new targeted agents because, in our review, its observation predicted eventual success. Improvements in design are recommended, as is more frequent inclusion of biological questions in phase II trials.
Article
Data on the percentage of patients experiencing a relevant response (>50% reduction of the baseline Hamilton Depression Scale (HAMD) score), average baseline severity, and sample size were retrieved for all placebo-controlled studies in regulatory submissions of SSRIs and SNRIs between 1984 and 2003. Overall, there were 16 percentage points (95% CI: 12 to 20) more responders on active drug than on placebo. There was no evidence of a diminishing magnitude of effect with lower severity at baseline. With one exception, significant differences varying between 13.5 and 19.3 percentage points were demonstrated for the individual antidepressants. Statistically significant mean differences versus placebo in change in HAMD are not a proper basis for evaluating clinical relevance and are not sufficient for approval; differences in the percentage of patients experiencing a clinically relevant response should also be demonstrated. In this respect, the approved SSRIs and SNRIs were found superior to placebo, independent of severity of depression.
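A responder difference in percentage points with a 95% confidence interval can be sketched for a single trial using the Wald approximation. The abstract's overall estimate pools many submissions by a method it does not describe, so this is only an illustration; the function name and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_ci(resp_drug, n_drug, resp_pbo, n_pbo, level=0.95):
    """Responder difference and Wald CI, all in percentage points."""
    p1, p2 = resp_drug / n_drug, resp_pbo / n_pbo
    se = sqrt(p1 * (1 - p1) / n_drug + p2 * (1 - p2) / n_pbo)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = p1 - p2
    return 100 * diff, 100 * (diff - z * se), 100 * (diff + z * se)


# Hypothetical trial: 150/300 responders on drug, 100/300 on placebo
print([round(x, 1) for x in risk_difference_ci(150, 300, 100, 300)])  # [16.7, 8.9, 24.4]
```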
Placebo response in studies of major depression: variable, substantial, and growing
  • B T Walsh
  • S N Seidman
  • R Sysko
  • M Gould
Walsh BT, Seidman SN, Sysko R, Gould M. Placebo response in studies of major depression: variable, substantial, and growing. JAMA. 2002;287: