Rationale Psychedelic research continues to garner significant public and scientific interest with a growing number of clinical studies examining a wide range of conditions and disorders. However, expectancy effects and effective condition masking have been raised as critical limitations to the interpretability of the research.
Objective In this article, we review the many methodological challenges of conducting psychedelic clinical trials and provide recommendations for improving the rigor of future research.
Results Although some challenges are shared with psychotherapy and pharmacology trials more broadly, psychedelic clinical trials have to contend with several unique sources of potential bias. The subjective effects of a high-dose psychedelic are often so pronounced that it is difficult to mask participants to their treatment condition; the significant hype from positive media coverage on the clinical potential of psychedelics influences participants’ expectations for treatment benefit; and participant unmasking and treatment expectations can interact in such a way that makes psychedelic therapy highly susceptible to large placebo and nocebo effects. Specific recommendations to increase the success of masking procedures and reduce the influence of participant expectancies are discussed in the context of study development, participant recruitment and selection, incomplete disclosure of the study design, choice of active placebo condition, as well as the measurement of participant expectations and masking efficacy.
Conclusion Incorporating the recommended design elements is intended to reduce the risk of bias in psychedelic clinical trials and thereby increase the ability to discern treatment-specific effects of psychedelic therapy.
Great Expectations: recommendations for improving the methodological rigor of psychedelic clinical trials
Jacob S. Aday · Boris D. Heifets · Steven D. Pratscher · Ellen Bradley · Raymond Rosen · Joshua D. Woolley
Received: 18 October 2021 / Accepted: 14 March 2022
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2022
This article belongs to a Special Issue on Psychopharmacology on Psychedelic Drugs

Recent high-profile clinical trials with psychedelic drugs have highlighted challenges related to rigorous study design and condition masking that have simmered in both psychotherapy and pharmacology research for decades (e.g., Basoglu et al. 1997; Enck and Zipfel 2019). Interrelated methodological challenges regarding the selection of appropriate control conditions, masking (also known as blinding; see footnote 1), and expectancy effects have clouded our understanding of the source of clinical improvements in psychedelic
studies and, in fact, across medicine. Studies on psychedelic
therapy are particularly challenging as they must address
methodological issues inherent to both psychotherapy and
pharmacology research as well as issues that are distinctly
problematic to the field, such as “hype” and salient psycho-
active effects that compromise masking. In this paper, we
delineate how many of the methodological limitations that
have been raised as critiques of psychedelic science are com-
mon challenges across psychotherapy and pharmacology
research more broadly and are in need of addressing. This
review allows us to share lessons across disciplines and pro-
vide recommendations for improving future psychedelic and
non-psychedelic research. We conclude by highlighting that
psychedelic studies should not be held to a different stand-
ard than other forms of psychotherapy or pharmacology
research, and that the fields can leverage important lessons
from one another by recognizing their shared limitations. To
this end, we provide practical methodological recommen-
dations to measure and manage expectations as well as to
enhance masking in psychedelic studies. These recommen-
dations can be deployed more broadly across clinical trials
to improve the rigor and reproducibility of future research.
Treatment-nonspecific effects
To begin, we review the various reasons for including control
conditions in clinical studies and examine what exactly is
being controlled. In any clinical trial, changes in symptoms
can be observed because of treatment-specific or treatment-nonspecific
effects (Turner et al. 1994). Treatment-specific
effects are changes directly attributable to the independent
variable or intervention under study (e.g., drug dose or psy-
chotherapeutic approach). Treatment-nonspecific effects are
changes not related to the specific treatment arm (i.e., com-
mon to being in any clinical trial), as well as placebo and
nocebo effects related to treatment expectations (Table 1).
Including certain control conditions allows the trialist to fil-
ter out contributions of treatment-nonspecific effects from
treatment-specific effects to attribute clinical improvements
to the intervention under study (Fig. 1a).
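The filtering logic described above can be made concrete with a small arithmetic sketch. It assumes a three-arm trial (treatment, placebo, no treatment) and, as a simplification, that the components add linearly. The `ArmMeans` type, the `decompose` helper, and the arm means are hypothetical and chosen only for illustration; they are not data from any study.

```python
from dataclasses import dataclass


@dataclass
class ArmMeans:
    """Mean symptom improvement (baseline minus endpoint) in each arm."""
    treatment: float
    placebo: float
    no_treatment: float


def decompose(arms: ArmMeans) -> dict[str, float]:
    """Split the treatment-arm improvement into additive components.

    Assumes (hypothetically) that natural course, observation effects,
    expectancy effects, and the treatment itself contribute additively.
    """
    return {
        # Change seen even without any intervention (natural course,
        # regression to the mean, Hawthorne and measurement effects).
        "natural_course_and_observation": arms.no_treatment,
        # Extra change in the placebo arm: the placebo response.
        "placebo_response": arms.placebo - arms.no_treatment,
        # Extra change beyond placebo: the treatment-specific effect.
        "treatment_specific": arms.treatment - arms.placebo,
    }


# Hypothetical arm means, for illustration only.
hypothetical = ArmMeans(treatment=12.0, placebo=8.0, no_treatment=3.0)
parts = decompose(hypothetical)
for name, value in parts.items():
    print(f"{name}: {value}")
```

Under these assumptions, only the last component would be attributed to the intervention; without the placebo and no-treatment arms, all three would be indistinguishable.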
Table 1 Key terms and definitions
Key term | Definition
Confounding variable | A factor other than variables under study that influences the dependent variable
Hawthorne effect | Changing one’s behavior as a response to the interest, care, or attention received through observation and assessment
Regression to the mean | Tendency for extreme scores to return to average over time
Spontaneous remission | Symptom resolution independent of the study manipulation as a function of an unidentified biological or psychosocial change
Process expectancies | Expectations regarding what will happen during the treatment
Outcome expectancies | Expectations regarding the outcome of the treatment
Active placebo | A control condition that closely resembles the presentation and side effects of the experimental treatment without providing the therapeutic effects
Masking efficacy | Degree to which participants are unaware of their treatment arm assignment
Placebo effect | Treatment-nonspecific improvement in symptoms attributable to contextual factors, such as participants’ positive expectations regarding the treatment
Nocebo effect | Treatment-nonspecific worsening of symptoms as a result of contextual factors, such as negative expectations regarding the treatment
Michael Pollan Effect | Heightened positive expectations regarding the efficacy of psychedelics in recent years stemming from Michael Pollan’s book, How to Change Your Mind

1 In recent years, the term “masking” has been used in place of “blinding”; here, we have opted to use the term “masking” but consider the terms synonymous

The natural history or spontaneous variation of any given disease under study may be the least controllable source of treatment-nonspecific change that can confound clinical trial interpretation. Symptoms can change (e.g., spontaneous remission) independently of the study intervention as a function of an unidentified biological or psychosocial change in the individual’s life. Additionally, in most clinical trials, participants are screened and selected based on minimum criteria of symptom severity, and many individuals may be especially motivated to seek out research studies when their symptoms peak in severity (Whitney and Von Korff 1992). Subsequent measurements using the same scale may then show an apparent improvement simply because scores drift back toward the participant’s usual level. This “regression to the mean” rather than a true treatment-specific effect may lead researchers to erroneously conclude a treatment is effective when participants may have improved over time without any treatment (Hengartner 2020). Regression to the mean is a ubiquitous statistical phenomenon that results whenever cases are selected for follow-up based on abnormally high or low scores at baseline; it has been demonstrated in observational studies and clinical trials, and across multiple diseases (Bland and Altman 1994). Changes due to the natural course of the condition and regression to the mean are considered theoretically distinct but in practice are difficult to disentangle.
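Regression to the mean can be reproduced in a short simulation. The sketch below is illustrative only: it assumes each observed score is a stable "true" severity plus independent measurement noise, that no treatment is given, and that only people scoring at or above a cutoff at baseline are enrolled. The function name and all parameter values are hypothetical.

```python
import random
import statistics


def simulate_rtm(n=20000, true_mean=20.0, sd_between=4.0,
                 sd_within=3.0, cutoff=25.0, seed=0):
    """Return mean baseline and follow-up scores of the selected cases.

    No treatment is applied: any change between the two means arises
    purely from selecting on a noisy baseline measurement.
    """
    rng = random.Random(seed)
    baseline_selected = []
    followup_selected = []
    for _ in range(n):
        true_score = rng.gauss(true_mean, sd_between)       # stable severity
        baseline = true_score + rng.gauss(0.0, sd_within)   # noisy screening score
        followup = true_score + rng.gauss(0.0, sd_within)   # noisy later score
        if baseline >= cutoff:                               # severity-based enrollment
            baseline_selected.append(baseline)
            followup_selected.append(followup)
    return (statistics.mean(baseline_selected),
            statistics.mean(followup_selected))


base_mean, follow_mean = simulate_rtm()
apparent_improvement = base_mean - follow_mean
print(f"baseline {base_mean:.2f}, follow-up {follow_mean:.2f}, "
      f"apparent improvement {apparent_improvement:.2f}")
```

Under these assumptions the enrolled group appears to improve even though nothing was done, while its follow-up mean still sits above the population mean because the group is genuinely more severe on average.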
Participant behavior can also change simply as a conse-
quence of the interest, care, or attention received as part of
a study. This well-established psychological phenomenon is
known as the Hawthorne effect (Sedgwick and Greenwood
2015). This effect is associated with outcomes ranging from
workplace productivity to cognitive functioning and quality
of life in dementia patients (McCarney et al. 2007). Notably,
researchers and study personnel, not just participants,
can be susceptible to Hawthorne effects, thereby influencing
clinical outcomes (Sedgwick and Greenwood 2015). That
is, those caring for participants in an experimental trial are
under increased scrutiny and observation as compared to
those operating in an unobserved clinical setting, and this
difference may impact both the quality and quantity of
patient care. This bias can cause an overestimation of an
experimental treatment’s therapeutic effect due to clinical
improvements from treatment-nonspecific factors. A distinct
but related issue is that the simple act of repeated obser-
vation and measurement of behaviors and symptoms can
alter those same behaviors and symptoms. Repeated pain
assessments can increase pain chronicity (Ferrari and Rus-
sell 2010), asking about illicit drug use can decrease use
(D’Onofrio etal. 2012), and daily symptom assessments
can worsen or improve symptom severity in PTSD (Dewey
etal. 2015; Pedersen etal. 2014). Drawing extra attention
to an issue can lead to symptom amplification or may pro-
vide more opportunities to resolve it (Barsky and Klerman
1983). In either case, it is clear that simply enrolling in a
clinical trial can influence symptoms regardless of treatment
Fig. 1 Treatment-nonspecific effects in clinical trials. (a) Hypothetical results of a clinical trial to delineate the sources of treatment-specific and treatment-nonspecific effects. Including placebo and no treatment control conditions allows trialists to identify treatment-specific effects (figure inspired by Wampold et al. 2016). (b) In a clear illustration of expectancy effects, Bingel et al. (2011) measured participants’ pain intensities before (i.e., Baseline) and after receiving remifentanil while manipulating participant expectancies across three groups (i.e., No expectancy, Positive expectancy, or Negative expectancy). They found that priming positive treatment expectancy doubled the analgesic effect of remifentanil when compared to no expectancy. In contrast, inducing negative treatment expectancies eliminated the analgesic effect. (c) Gold et al. (2017) demonstrated that treatment effect sizes vary as a function of the type of control group
Taken together, issues related to the natural course
of the disease, regression to the mean, and observation-
related changes highlight that there are many mecha-
nisms by which symptoms may change in a clinical trial
irrespective of the treatment being tested. It is therefore
important to include, at a minimum, control arms that do
not receive the treatment, as treatment-nonspecific fac-
tors confound experimental and control arms to a simi-
lar extent. However, the simple inclusion of an untreated
comparison group may not be enough to isolate treatment-specific
effects (Gold et al. 2017; Enck and Zipfel
2019). Participants often have expectations regarding the
efficacy of the treatment under study. If participants have
knowledge about their treatment arm assignment (e.g.,
in an open-label study), or gain knowledge through their
subjective experience (e.g., having a psychedelic trip) or
somatic symptoms, their expectations about therapeutic
efficacy can affect their clinical outcomes. This problem
is common to most psychotropic trials (e.g., selective
serotonin reuptake inhibitors [SSRIs]; Hieronymus et al.
2018) and is particularly salient for high-dose psychedelic
trials in which subjective drug effects are especially
pronounced. Without effective condition masking, it is
virtually impossible to maintain the independence of the
main variable under study (i.e., the treatment), as it is
confounded by participant expectations. In addition to
influencing participant outcomes, baseline expectancies
about a treatment’s therapeutic effects can also impact
masking efficacy (i.e., whether participants are aware of
their treatment arm assignment), as those with notice-
able improvements in symptoms often assume they were
assigned to the active treatment group (Sackett 2007). We
now consider several specific expectations and how they
interact with masking and treatment outcomes.
Expectancies inpsychotherapy
andpharmacology research
Tambling (2012) differentiates between expectations about
the process of treatment and expectations about the out-
come of treatment. In the case of psychotherapy, process
expectations are expectations about what will happen dur-
ing therapy (e.g., patient’s thoughts about roles they and
their therapist will assume, characteristics of their thera-
pist, and what sessions will entail). In pharmacological
trials, process expectations can include expectations about
any acute drug effects, including psychoactive effects.
Process expectancies may be particularly pertinent with
psychedelic drug trials as expectations about the acute
effects of the drugs are shaped by hours of psychother-
apy, widespread representations in popular media, and a
highly ritualized process of drug administration. When
these expectations are matched by experience, a study par-
ticipant may be especially confident in unmasking their
treatment arm assignment.
Outcome expectations refer to whether the treatment
is anticipated to reduce symptoms. In the case of psycho-
therapy, studies suggest that outcome expectancies are
stronger predictors of therapeutic effects than are specific
psychotherapy techniques (Horvath et al. 2011; Webb et al.
2010). Positive outcome expectations are related to stronger
alliance with the therapist, which is associated with better
outcomes (Vîslă et al. 2018; Yoo et al. 2014). A recent,
well-powered meta-analysis (N = 12,722) compared patient
outcome expectancies and clinical outcomes across a vari-
ety of diagnoses and psychotherapy interventions, revealing
that greater positive outcome expectancy was consistently
associated with better treatment results (Cohen’s d = 0.36;
Constantino etal. 2018). Outcome expectancies also have
strong effects relative to the active effects of psychotropic
drugs (Rutherford and Roose 2013). In trials where patient-
reported outcomes are the primary efficacy measures, the
effects of outcome expectancies are particularly strong
(Atlas 2021). Fillingim and Price (2005) concluded that in
placebo analgesia studies outcome expectancies accounted
for up to 81% of variance in post-treatment pain ratings.
Thus, across clinical research contexts, participants’ out-
come expectations about the specific treatment being admin-
istered influence clinical outcomes.
Negative outcome expectations can also influence
clinical outcomes. When individuals are aware that they
have been assigned to a treatment that they believe is
unlikely to improve their symptoms, negative expectation
alone can worsen patient outcomes, which is known as
the nocebo effect (Gold et al. 2017; Planès et al. 2016).
This effect was elegantly demonstrated in a study with
remifentanil, an opioid analgesic, which found that prim-
ing negative expectations about the treatment completely
negated the analgesic effect of the drug (Bingel et al.
2011; Fig. 1b). Furthermore, if a participant has positive
expectations about the proposed experimental treatment
but comes to believe they have been assigned to a control
condition, outcomes may worsen as a result of disappoint-
ment or the belief that one will not improve without being
assigned to the active treatment (Furukawa et al. 2014).
Indeed, those put on a waitlist control condition typi-
cally have worse outcomes than those assigned to active
placebo, or even no treatment, as they have less reason
to expect an improvement in symptoms (Patterson et al.
2016). With waitlist control designs, those in the control
condition do not receive treatment until after a waiting
period, where they are compared with the active treatment
group. However, participants are generally aware that they
are in a control condition during their waiting period and
thus may not expect to see improvements, whereas the
active treatment group likely has the opposite expecta-
tion. Therefore, waitlist control designs may artificially
inflate intervention effect size estimates (Fig. 1c; Cunningham
et al. 2013; Zhipei et al. 2014). Possibly illustrating
this effect, in a waitlist control study of psilocybin
for the treatment of major depressive disorder, waitlisted
participants reported higher anxiety scores at the end of
the waitlist period compared to the beginning, enhancing
the apparent therapeutic effect of psilocybin (Davis et al.
2021). The crucial role of expectancies in treatment outcomes
across clinical contexts underscores the need for
trial designs that control for expectation-related improve-
ments, which we elaborate on in the following sections.
Importantly, outcome expectancies are rarely measured
in psychotherapy and pharmacology studies (Doering et al.
2014). Constantino et al. (2011) noted that expectancies have
often been thought of as nuisances to clinical research and
disregarded rather than being considered important ingredi-
ents of the therapeutic process. Furthermore, the few studies
that have included assessments of treatment expectations
have used brief and study-specific measures, meaning there
is surprisingly little overlap between studies in how expec-
tations are quantified (Tambling 2012). Moreover, there
is no manual or expert consensus for managing expectan-
cies despite the extensive evidence of the important role
of expectancy in treatment responses (Zilcha-Mano et al.
2019). Collectively, these findings highlight that challenges
related to participant expectations are common across psy-
chotherapy and pharmacology research, and that, to date,
there is no standard for addressing expectation-related
Psychedelic research andexpectations
Briefly, the typical structure of a modern psychedelic therapy clinical trial involves an arduous screening process, multiple preparation sessions, single or multiple drug dosing sessions, and integration sessions after drug administration (Fig. 2). The preparation sessions are used for several purposes, including to build rapport between the participant and the therapists or facilitators (see footnote 2), to inform the participant about common or possible psychedelic drug experiences, to reassure the participant about their safety during dosing day procedures, and to assist with establishing the patient’s intention(s) for their dosing session. The drug dosing session is highly structured with two therapists accompanying the participant throughout the 6–8-h session in a comfortable environment. During the dosing session, participants often remain reclined on a couch with eyeshades and headphones for music and are encouraged to focus on their inner experience throughout the drug session, exploring any content that arises with an open and accepting mindset. In the days following drug dosing, the participants work with the same clinical team in integration sessions to make meaning of their experiences and to incorporate any insights they may have had into their lives going forward. Given these fundamental elements, psychedelic therapy is best considered a complex, multicomponent intervention that includes aspects of both pharmacology and psychotherapy. Notably, throughout the course of a psychedelic therapy trial, a participant’s process expectations and outcome expectations are subject to change as they gather more information about possible drug effects, approach the sessions in a certain way (e.g., trust, let go, be open), and experience the actual drug effects. Hereafter, we refer to this package of procedures as psychedelic therapy and acknowledge that all of these aspects may determine treatment-specific effects.

Fig. 2 Stages of psychedelic therapy. Psychedelic therapy typically involves preparation, dosing, and integration sessions

2 Notably, there is significant debate about the proper terminology for the people who provide the preparation and integration and who monitor participants during the dosing session. “Guide,” “sitter,” “facilitator,” “therapist,” “monitor,” and other terms have been proposed and have their advocates and detractors. The intensity of these debates highlights the truth of the old joke that “Scientists would rather use each other’s toothbrushes than use each other’s terminology.” We use the term “facilitator” throughout this manuscript without taking a strong stance on which term is the most correct
Participants’ expectations as well as intentions (i.e., what
they desire from the psychedelic experience) are thought
to play a prominent role in the drugs’ acute and long-term
effects (Olson etal. 2020). Some have even termed psych-
edelics “placebo enhancers,” as they can enhance the percep-
tion of meaningfulness (Hartogsohn 2016, 2018) and induce
a state of suggestibility (Carhart-Harris etal. 2015). It has
been noted across popular culture that psychedelic expe-
riences are heavily influenced by one’s expectations, and
some have gone as far as to claim “no other class of drugs
are more suggestible in their effects” (Pollan 2018). Hartog-
sohn (2021) noted that the fundamental role of expectations
in psychedelic drug effects may reconcile the paradoxical
conceptions that have been held about the drugs—views
that are so varied, it at times sounds as though scientists are
discussing completely different drugs (e.g., they have been
used to both treat mental illness and to model psychosis).
Utilizing pre-dosing expectations as well as the acute state
of suggestibility induced by psychedelics in tandem may
be an important component of the therapeutic process with
psychedelic therapy, but this combination can also be co-
opted for nefarious purposes. Historically, psychedelics have
been used by cults as well as investigated for their alleged
potential in “mind control” by the US government during
MK Ultra (Cusack 2020; Kogo 2002; Ledford 2019). There
is even concern about psychedelics’ potential for changing
beliefs (e.g., political or metaphysical; de Wit et al. 2021;
Pace and Devenot 2021; Timmermann et al. 2021) and
memories, though that is beyond the scope of this review.
Therefore, it may be ethical to include an enhanced informed
consent process about possible belief changes induced by
psychedelic therapy prior to enrolling participants into a
clinical trial (Smith and Sisti 2021).
Although pre-dosing expectations have long been thought to be integral to the effects of psychedelics (Eisner 1997; Leary et al. 1963), very few studies have actually measured them. A recent “microdosing” (i.e., sub-hallucinogenic dosing) study found that positive expectations regarding psychedelics at baseline predicted subsequent increases in wellbeing irrespective of whether a participant received a psychedelic or an inert placebo (Kaertner et al. 2021). Similarly, a large-scale, placebo-controlled study of microdosing found that participants experienced comparable improvements in mood and cognition in the drug and placebo conditions (Szigeti et al. 2021). Another microdosing study found that after controlling for baseline expectancies, there was no difference between psilocybin and placebo on measures of awe (van Elk et al. 2021). However, to the best of our knowledge, only a single “macrodosing” (i.e., full hallucinogenic dosing) trial has recorded pre-treatment expectancies. An open-label ayahuasca study found that participants endorsing an expectancy of favorable change in neuroticism, extraversion, and conscientiousness in response to ayahuasca showed a greater decrease in neuroticism and greater increases in extraversion and conscientiousness following ayahuasca administration compared to participants with lower expectancies receiving the same treatment (Weiss et al. 2021). A recent systematic review found that those with a recreational intention tended to have less challenging experiences when they used a psychedelic (Aday et al. 2021; Haijen et al. 2018), again suggesting that what one desires and expects to experience with psychedelics influences the drugs’ effects. Thus, the few studies that have measured expectations and intentions to date support the prevalent assumption that pre-dosing expectations interact with psychedelic drug effects and outcomes. Whether these same considerations apply to other drug classes (e.g., psychostimulants) is unknown, further emphasizing the need to measure and report therapeutic expectations in a systematic way across areas of clinical research.
High-dose psychedelic trials may also be particularly susceptible
to a type of bias termed “hype” or the “Michael Pollan
effect” (Carpenter 2020; Table 1). Some have argued that
psychedelic therapy marks the most important innovation in
psychiatry since the introduction of SSRIs, or possibly ever,
and it is not uncommon to hear claims about the potential
for psychedelics to “change the world” from industry leaders
and enthusiasts (Dupuis 2021). This pervasive messaging
may lead to amplified positive expectations compared with
many other types of clinical interventions and perhaps moti-
vates participants to “not let the movement down” by failing
to clinically improve. This notion was illustrated in a recent
ayahuasca study (Aday 2021), where one of the participants
asked one of us (JSA) if they should stop participating in the study
because they did not have a mystical experience and did not
want to “ruin the research.” In our experience recruiting for
psychedelic studies, many potential participants explicitly
express a sense of pride and excitement in participating in a
psychedelic trial as well as strong confidence in the benefit
of psychedelics to their mental wellbeing. These motivations
for participation and heightened positive expectations, coupled
with the functional unmasking that often occurs, make
identification of a treatment-specific effect in high-dose psychedelic
trials particularly challenging and highlight the
need for study designs that properly mask participants to
conditions (Burke and Blumberger 2021).
Certain aspects of the study personnel, environmental
context, and measures included in psychedelic drug trials
may contribute to enhanced expectations as well. For exam-
ple, the use of two therapists at a time and rituals like placing
a fresh rose in the room on dosing day may serve to amplify
positive expectations and signal that the experience is of par-
ticular significance (Gukasyan and Nayak 2021). Addition-
ally, outcome expectancies of psychotherapists have been
shown to have a marked effect on treatment engagement and
clinical outcomes across therapeutic approaches (Doering
etal. 2014; Leake and King 1977), suggesting this may be a
treatment-nonspecific factor relevant to psychedelic studies
as well. Lastly, the specific measures used in psychedelic
trials can influence participant expectations; one study vol-
unteer noted “I long to see some of the stuff hinted at in the
questionnaire” in reference to questions they encountered
on the Mystical Experience Questionnaire (MEQ; MacLean
etal. 2012; Pollan 2018). Thus, in addition to preexisting
attitudes about psychedelics, certain expectations may be
engendered by characteristics of the trial.
Modern era clinical research design
Next, we will describe many of the study designs and meth-
ods that have been attempted to manage these issues across
psychotherapy and pharmacology trials to date. Open-label
study designs, in which both the patient and study person-
nel are aware of what specific treatment is administered,
most closely resemble how psychotherapy and psychotropic
drugs are administered in real-world, non-research settings.
Although high in ecological validity, this type of design does
not control for most of the confounding nonspecific factors
that can affect clinical outcomes (e.g., Hawthorne effect,
spontaneous variation of symptoms, regression to the mean).
Some treatment-nonspecific factors, such as regression to
the mean, can be controlled if sufficient data are available at
both the individual and group level, as a precise mathemati-
cal formula can be developed to predict the actual regression
effect in a given experimental setting (Barnett et al. 2005).
These authors have identified specific experimental strate-
gies to mitigate or manage expected regression to the mean
effects in a clinical trial. First, they recommend selecting
cases based on multiple baseline observations. Requiring
that eligible subjects have stable test scores over two or
more baseline assessments will predictably reduce, although
not necessarily eliminate, regression to the mean. Second,
the authors suggest correcting for regression to the mean
effects in the analyses by using either ANCOVA modeling
or application of a correction formula. Of note, neither of
these strategies has been systematically applied in studies
of psychedelic therapy. Third, investigators may consider a
waitlist control condition, although we refer the reader to
limitations to this approach noted previously.
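To illustrate how the expected regression effect can be predicted, the sketch below uses one standard closed form that follows from a simple normal model: each observation is a stable true score (between-person SD `sd_between`) plus independent measurement error (within-person SD `sd_within`), and cases are enrolled when their baseline exceeds `cutoff`. Whether this matches the exact formula the cited authors had in mind is an assumption, and the function name and parameter values are hypothetical. Averaging `n_baseline` screening measurements before selection shrinks the error variance, which is one way to see why multiple baseline observations reduce the regression effect.

```python
from math import sqrt
from statistics import NormalDist


def expected_rtm(mu, sd_between, sd_within, cutoff, n_baseline=1):
    """Expected fall from baseline to follow-up with no treatment.

    Model (an assumption): observed = true score + independent error,
    all normal; cases are selected when the mean of `n_baseline`
    screening measurements exceeds `cutoff`.
    """
    var_within = sd_within ** 2 / n_baseline        # error variance of the baseline mean
    sd_total = sqrt(sd_between ** 2 + var_within)   # SD of the selection variable
    z = (cutoff - mu) / sd_total                    # standardized cutoff
    std_normal = NormalDist()
    # Mean of a standard normal truncated below at z (inverse Mills ratio).
    c_z = std_normal.pdf(z) / (1.0 - std_normal.cdf(z))
    return var_within / sd_total * c_z


# Hypothetical scale parameters, for illustration only.
single = expected_rtm(mu=20.0, sd_between=4.0, sd_within=3.0, cutoff=25.0)
averaged = expected_rtm(mu=20.0, sd_between=4.0, sd_within=3.0,
                        cutoff=25.0, n_baseline=2)
print(f"one baseline: {single:.2f}; mean of two baselines: {averaged:.2f}")
```

Subtracting such a predicted value from the observed change is one possible correction; adjusting for baseline in an ANCOVA, as mentioned above, is another.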
The double-blind randomized controlled trial (RCT) is
considered the gold standard design for identifying a true
treatment-specific effect, under conditions where neither
investigator nor participant knows their treatment allocation.
An RCT entails randomly assigning participants to treatment
or control conditions and withholding knowledge of treat-
ment arm assignment from participants and study personnel
(i.e., masking). Effectively executing this design controls for
expectancies as it is unknown which treatment each partici-
pant received, and therefore treatment-nonspecific factors
can be ruled out as the source of treatment arm outcome dif-
ferences. Treatment arm masking in RCTs is best achieved
with active placebo comparators, in which the control con-
dition is structurally equivalent and closely resembles the
presentation and side effects of the experimental treatment
without providing the therapeutic effects (Doering et al.
2014). Inert but identical-looking pills that lack the side
effects of the treatment condition (i.e., inactive placebos)
are often used but may be easy for participants to detect, and
subsequent nocebo effects may confound analyses.
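How easily participants detect their allocation can be quantified from end-of-study guesses. The sketch below uses Bang's blinding index, a technique brought in here for illustration rather than one named in the text above. Computed per arm with "correct", "incorrect", and "don't know" responses, it reduces to (correct minus incorrect) divided by all responses: 0 is consistent with random guessing, 1 with complete unmasking, and negative values with guessing the opposite arm. The guess counts are hypothetical.

```python
def bang_blinding_index(n_correct: int, n_incorrect: int,
                        n_dont_know: int = 0) -> float:
    """Bang's blinding index for a single treatment arm.

    Ranges from -1 (everyone guesses the wrong arm) through 0
    (consistent with random guessing) to 1 (everyone guesses correctly).
    """
    total = n_correct + n_incorrect + n_dont_know
    if total == 0:
        raise ValueError("at least one response is required")
    return (n_correct - n_incorrect) / total


# Hypothetical end-of-study guesses, for illustration only.
hypothetical_guesses = {
    "active": {"n_correct": 27, "n_incorrect": 2, "n_dont_know": 1},
    "placebo": {"n_correct": 14, "n_incorrect": 10, "n_dont_know": 6},
}
indices = {arm: bang_blinding_index(**counts)
           for arm, counts in hypothetical_guesses.items()}
for arm, value in indices.items():
    print(f"{arm}: {value:.2f}")
```

In this made-up example the active arm is largely unmasked while the placebo arm is close to chance, the asymmetric pattern one might expect when only the experimental drug has noticeable effects.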
There has been considerable debate that continues today
about what constitutes a proper “inert” placebo for psycho-
therapy in the same sense as an “inert” placebo in pharma-
cology, as some have argued that “there is no such thing as
inert psychotherapy” (Rosenthal and Frank 1956; Wampold
etal. 2016). In the context of psychedelic trials, to date, the
psychotherapy component has been held constant across the
treatment and control conditions, making this issue less rel-
evant for the field for now. However, as researchers delineate
the nuances of what specific forms of psychotherapy are
most synergistic with psychedelics, this potential confound
will become an increasingly important issue to address
(Horton etal. 2021). A related challenge with psychedelic
studies is that unmasking may lead to differences in how
the psychotherapy component is administered and received,
given that the context of the therapy shifts once the partici-
pant and/or therapist becomes aware of the treatment arm
assignment. Therefore, improved masking procedures must
be implemented into psychedelic science for the field to meet
the assumptions of the current gold standard clinical trial
Crossover RCT designs have been used in many pharmacological studies as an efficient way to account for treatment-nonspecific confounds because participants act as their own control. In a crossover design, participants are randomly assigned to a sequence of treatments where they receive both the experimental and placebo treatments but at different timepoints (i.e., placebo then experimental treatment or vice versa). A major weakness of crossover designs, however, is the potential for carryover effects (i.e., the therapeutic benefits could “carry over” after the first treatment and misrepresent the true effect of the second treatment). Carryover effects are especially concerning in psychedelic trials because the effects of psychedelic therapy in some cases have been shown to be durable for over a year (Griffiths et al. 2008; Johnson et al. 2017; see Aday et al. 2020b for review). Thus, even a 12-month washout period is unlikely to achieve a return to pre-treatment levels on the variable of interest, which biases within-person analyses and threatens the validity of conclusions that can be drawn. Moreover, masking is likely to be compromised in crossover designs that involve a psychoactive drug (Wilsey et al. 2016). For example, almost all participants accurately identified their treatment condition in a crossover study that used psilocybin and niacin as a placebo control (Grob et al. 2011). Thus, simple crossover designs may be more confounded than a parallel (between-subjects) RCT design for psychedelic trials.
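The arithmetic behind the carryover concern can be sketched with a deliberately simple model. The example below is illustrative only: the additive carryover model, the function name `crossover_estimate`, the `carryover_fraction` parameter, and the effect size of 10 units are all assumptions introduced here, not quantities taken from any of the cited trials.

```python
def crossover_estimate(true_effect, carryover_fraction):
    """Expected within-person drug-minus-placebo difference in a
    two-sequence crossover, under an ASSUMED additive carryover model.

    true_effect: benefit of the drug relative to placebo (arbitrary units).
    carryover_fraction: share of the drug benefit still present during the
        following period (0 = complete washout, 1 = fully durable effect).
    """
    # Sequence 1: drug first, placebo second.
    # The placebo period still carries part of the earlier drug benefit,
    # so the observed drug-minus-placebo difference shrinks.
    drug_first = true_effect - carryover_fraction * true_effect
    # Sequence 2: placebo first, drug second (nothing to carry over).
    placebo_first = true_effect
    # Equal allocation to the two sequences is assumed.
    return (drug_first + placebo_first) / 2


# Hypothetical true effect of 10 units at three levels of carryover.
for fraction in (0.0, 0.5, 1.0):
    print(fraction, crossover_estimate(10.0, fraction))
```

Under these assumptions, a fully durable effect (`carryover_fraction = 1.0`) halves the apparent benefit, because the placebo period in the drug-first sequence still reflects the earlier treatment.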
We have repeatedly noted the importance of adequate masking in double-blind RCTs, and emphasize that it is impossible to know whether double-blinding or masking was achieved without testing masking efficacy. Surprisingly, however, masking efficacy typically goes unmeasured or unreported in psychotherapy and pharmacology trials (Doering et al. 2014). Many researchers report their studies as being “double-blind” without testing such claims (Basoglu et al. 1997). A systematic review on methods of masking in randomized controlled trials with pharmacologic treatments concluded that reporting of condition masking is generally “quite poor,” and based on trials that have tested the success of masked methods, a high proportion of studies are effectively unmasked (Boutron et al. 2006; Rabkin et al. 1986). This is consistent with a recent systematic review of studies published in top psychiatry journals in 2017 and 2018, which found that only 59% of the trial reports included adequate reporting of masking outcomes (Juul et al. 2020), as well as a meta-analysis that indicated a large majority of antidepressant RCTs do not assess masking efficacy, and when measured, masking often fails (Scott et al. 2022). Similarly, a comprehensive literature search found that masking was not maintained in 20/23 “double-blind” studies examining psychotropic drugs (Fisher and Greenberg 1993). The authors noted that improvements in patient symptomology and side effects from the active drug were the major cause of unmasking. Long-term masking can be difficult, if not impossible, to achieve with highly efficacious treatments because it is clear to the patient that they experienced an improvement in symptoms (Muthukumaraswamy et al. 2021). Thus, many argue that end-of-trial assessments for masking cannot be done with validity, as they cannot disentangle masking from guesses based on efficacy (Mataix-Cols and Andersson 2021; Sackett 2007), although it should be noted that some researchers argue that it is not considered unmasking at the end of the trial if people guess their condition based on efficacy (Katz 2021).
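As a hedged illustration of what testing masking efficacy can look like in practice, the sketch below computes a per-arm blinding index in the style proposed by Bang and colleagues. This index is brought in here as one possible summary; it is not a method the text above attributes to any of the reviewed studies, and the guess counts are hypothetical.

```python
def bang_blinding_index(n_correct, n_incorrect, n_dont_know=0):
    """Bang-style blinding index for a single treatment arm.

    Ranges from -1 (every participant guesses the wrong arm), through 0
    (guesses consistent with chance), to +1 (every participant guesses
    the correct arm). "Don't know" answers pull the index toward 0.
    """
    total = n_correct + n_incorrect + n_dont_know
    if total == 0:
        raise ValueError("no responses recorded for this arm")
    return (n_correct - n_incorrect) / total


# Hypothetical end-of-session guesses, 20 participants per arm.
arms = {
    "psychedelic": {"n_correct": 18, "n_incorrect": 1, "n_dont_know": 1},
    "active placebo": {"n_correct": 12, "n_incorrect": 5, "n_dont_know": 3},
}
for arm, counts in arms.items():
    print(arm, round(bang_blinding_index(**counts), 2))
```

With these made-up counts the index is 0.85 in the psychedelic arm and 0.35 in the active placebo arm, which would indicate substantial unmasking in both arms. As noted above, an index measured at the end of a trial cannot by itself separate unmasking from guesses based on perceived efficacy.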
Masking attempts inpsychedelic studies
Multiple approaches have been attempted to address these methodological challenges specifically as they relate to psychedelic trials. First, active placebos have been used in an attempt to mask participants and therapists to treatment conditions, albeit generally unsuccessfully. This difficulty was infamously demonstrated in the “Good Friday Experiment,” where divinity school students were assigned to receive psilocybin or niacin, a B vitamin with mild physiological effects, in a group setting at a chapel (Pahnke 1963). Despite some initial confusion because of niacin’s fast-acting effects on vasodilation and general relaxation, before long, it became clear which participants had been assigned to which condition, as those in the psilocybin group had intense subjective reactions and often spiritual experiences, whereas the niacin group “twiddled their thumbs” while watching on (Prideaux 2021). By the end of the day, all participants correctly ascertained whether they were in the treatment or control group (Doblin 1991). Despite the clear masking failure, after more than 50 years, many researchers today still use niacin as the active placebo in clinical trials with psychedelics, perhaps for a lack of better alternatives (Grob et al. 2011; Ross et al. 2016; Siegel et al. 2021). Nevertheless, participants are now dosed individually rather than in a group to reduce potential unmasking from witnessing others’ experiences. Modern psilocybin trials have also employed methylphenidate (Griffiths et al. 2006) and dextromethorphan (DXM; Carbonaro et al. 2018) as active placebos, although the success of masking was typically less than 25% or unreported in these studies (Bershad et al. 2019; Carbonaro et al. 2018; Griffiths et al. 2006). Uthaug et al. (2021) tested an innovative masking strategy by mimicking the aesthetic and somatic features of the psychedelic brew, ayahuasca. The investigators used a mixture of cocoa powder, vitamins (unspecified), turmeric powder, quinoa, traces of coffee, and potato flour as a placebo to mimic the texture as well as gastrointestinal side effects of the drug. Despite effectively masking the profound effects of ayahuasca in several experienced users, a majority of participants were still able to accurately identify their treatment assignment (Uthaug et al. 2021). A review of ongoing clinical trials revealed that researchers are currently experimenting with a number of other potential control conditions in psychedelic studies, including mannitol, lactose, ketamine, microcrystalline cellulose, and nicotinamide (Siegel et al. 2021), but the effectiveness of these attempts remains to be seen.
Low doses of psychedelics have also been tried as a potential control condition to improve participant masking (Griffiths et al. 2016). One study combined a low dose of psilocybin with incomplete disclosure (see below) such that participants and study staff were unaware of the number of treatment arms in the study. Specifically, participants were informed that they could receive anywhere from 0.5 to 30 mg of psilocybin in the trial when in fact they could only receive 0.5 mg if they were in the control condition or 25 mg if they were in the treatment condition (Griffiths et al. 2016). An advantage of including the low dose of psilocybin is that all participants are truthfully told they will receive psilocybin, which presumably helps balance treatment expectations across both conditions. However, participants and therapists are still at risk for unmasking with this design because it is typically easy to ascertain whether the participant has an intense psychedelic experience or not. Schenberg (2021) also noted that this design may be limited by ethical considerations, given that 3,4-methylenedioxymethamphetamine (MDMA) research has shown that low-dose control conditions can be stressful and trying for patients, leading to dropouts and dissatisfaction (Oehen et al. 2013), and anecdotal lore in the underground psychedelic therapy community suggests that medium doses of psychedelics can agitate people without allowing them to “break through” (JDW, personal communication, 2021). On the other hand, low doses of classic psychedelics (i.e., microdosing) have been purported to be therapeutic (Fadiman 2011; Kuypers et al. 2019), which could also confound study results, although the therapeutic benefit of single microdoses seems unlikely to be durable or significant. Thus, including a low-dose psychedelic as part of an active control condition is a promising starting point.
Incomplete disclosure of certain aspects of the study design is a strategy that has been employed to enhance masking success and balance treatment expectations among conditions. For example, some studies incompletely disclose the number of treatment arms to participants in an attempt to obscure the study design and reduce the participants’ confidence in their treatment group allocation (Bershad et al. 2019; Carbonaro et al. 2018; Griffiths et al. 2006; Reissig et al. 2012). Another compelling approach (in healthy subjects) involves consenting participants to possibly receiving one of several substances in order to reduce their certainty of treatment allocation. For example, in some experiments, participants consent to receive MDMA, methamphetamine, tetrahydrocannabinol (THC), benzodiazepine, and/or placebo (Bedi et al. 2010; Bershad et al. 2019), but in fact only receive one or two of these drugs in any particular study. Although this design is possible to implement in psychedelic studies of healthy individuals who are not seeking treatment, there are limitations to this approach, including reduced generalizability because a large proportion of the population may not be comfortable with receiving any one of the listed substances. Moreover, this design has not proven to be particularly effective to date, as participants accurately identify the experimental condition (e.g., MDMA and psilocybin) ~70–85% of the time (Bershad et al. 2019; Carbonaro et al. 2018). Thus, even with these more rigorous approaches, adequate masking remains a challenge. Taken together, there is a pressing need for methodological innovations that adequately address the problem of masking in psychedelic studies.
Muthukumaraswamy etal. (2021) made several recom-
mendations for addressing masking in psychedelic clinical
trials. The authors suggested that active placebos may need
to be combined with alternative trial designs (e.g., dose-
response parallel-groups design) as well as some vagueness
about the acute effects of psychedelics when consenting
participants. Dose-response parallel-groups designs com-
pare the full dose of the active treatment drug with a low
dose; the advantages and disadvantages of such an approach
are discussed previously. Vagueness regarding the acute
effects of psychedelics has tradeoffs as well: although it
may improve masking, there are clear ethical concerns as
participants need to be able to give fully informed consent
(Smith and Sisti 2021). This consideration is especially true
with psychedelic studies, as psychedelic experiences have
been described as “life changing” and have the potential to
affect one’s social relationships (Ross etal. 2016), spiritu-
ality (Griffiths etal. 2006), and worldview (Timmermann
etal. 2021). Another recommendation provided was the 2
× 2 balanced placebo design (Rohsenow and Marlatt 1981),
or 2 × 2 factorial design, in which the intervention factor
(psychedelic drug, placebo) and instructional set provided
to each participant (receiving psychedelic drug, receiving
placebo) are systematically crossed with each other. This
design offers a potentially rigorous experimental means for
separating pharmacological effects of the drug from partici-
pant expectations but is most suitable for mechanistic studies
of acute drug effects, rather than clinical trials examining
treatment efficacy. To date, there are no published reports
of this design being used in psychedelic drug research, pos-
sibly because of its high costs (Schenberg 2021). Although
researchers have begun to address the methodological chal-
lenges associated with masking, treatment expectations, and
their combined impact that can bias study results, there is a
need to advance the rigor of future research. We build upon
this work in the next section by elaborating on recommenda-
tions for improving psychedelic clinical trials.
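The logic of the 2 × 2 balanced placebo design can be made concrete with a small worked example. The sketch below is a minimal illustration under stated assumptions: the function name `balanced_placebo_effects`, the cell labels, and the cell means are hypothetical and chosen only to show how the four cells separate a pharmacological effect from an expectancy effect.

```python
def balanced_placebo_effects(cell_means):
    """Decompose the four cell means of a 2 x 2 balanced placebo design.

    cell_means maps (received, told) -> mean outcome, where each element
    of the key is either "psychedelic" or "placebo".
    """
    drug_told_drug = cell_means[("psychedelic", "psychedelic")]
    drug_told_placebo = cell_means[("psychedelic", "placebo")]
    placebo_told_drug = cell_means[("placebo", "psychedelic")]
    placebo_told_placebo = cell_means[("placebo", "placebo")]

    # Main effect of what was actually received (pharmacology).
    pharmacological = ((drug_told_drug + drug_told_placebo)
                       - (placebo_told_drug + placebo_told_placebo)) / 2
    # Main effect of what participants were told (expectancy).
    expectancy = ((drug_told_drug + placebo_told_drug)
                  - (drug_told_placebo + placebo_told_placebo)) / 2
    # Interaction: does the drug effect depend on the instruction given?
    interaction = ((drug_told_drug - placebo_told_drug)
                   - (drug_told_placebo - placebo_told_placebo))
    return {
        "pharmacological": pharmacological,
        "expectancy": expectancy,
        "interaction": interaction,
    }


# Hypothetical cell means (symptom improvement, arbitrary units).
example = {
    ("psychedelic", "psychedelic"): 12.0,
    ("psychedelic", "placebo"): 8.0,
    ("placebo", "psychedelic"): 6.0,
    ("placebo", "placebo"): 2.0,
}
print(balanced_placebo_effects(example))
```

In this made-up example the pharmacological main effect is 6.0, the expectancy main effect is 4.0, and the interaction is 0.0. The decomposition only works if participants believe the instruction they are given, which is exactly what the pronounced subjective effects of a psychedelic make difficult.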
Novel recommendations to improve future research
Experimental confounds related to expectancies and placebo effects in psychedelic studies largely stem from inadequate masking. Therefore, our recommendations are primarily focused on how to improve masking in psychedelic trials through a combination of procedures intended to decrease participants’ confidence in their assigned treatment arm (Fig. 3). As our review of others’ pioneering work makes clear, adequate masking involves critical decision points at every step in the lifecycle of a clinical study. Our suggestions follow suit, noting elements for consideration in study development and design, participant recruitment and selection, outcomes and endpoints, study procedures, and analysis plans. It should be noted that masking is not an all-or-nothing phenomenon; incorporating a portion of these suggestions can incrementally reduce participants’ confidence in their treatment arm assignment and thereby attenuate the influence of treatment-nonspecific factors in interpretations of clinical trials.
Study development anddesign
The choice of a control condition, the number of study arms, and overall design should be determined by the specific purpose of the study (Freedland 2020; Gold et al. 2017). For example, although an open-label study design does not mask participants or control for treatment-nonspecific factors, it may be appropriate when the purpose of the study is to examine safety, feasibility, or proof-of-concept. If the purpose is to examine treatment efficacy, inactive control conditions (e.g., treatment-as-usual, waitlist controls) should be included at the minimum to control for some treatment-nonspecific factors, such as natural history or regression to the mean. A stronger study design to test for efficacy would include an active control condition, such as an active placebo that mimics some of the acute effects of a psychedelic. Including both an active and inactive control condition (i.e., 3-arm design) is a promising way to disentangle placebo effects (Fillingim and Price 2005; Smith et al. 2020; Vase and Wartolowska 2019), because 3-arm trial designs allow for comparisons of both the treatment and the active placebo conditions with the inactive control condition to delineate treatment-specific effects from placebo effects (see Fig. 1a). There are also alternative study designs that may be especially useful because of psychedelic trials’ vulnerability to large placebo effects. Sequential parallel designs with a placebo run-in period can reduce the size of placebo effects by excluding “placebo responders” from the subsequent treatment phase (Campbell et al. 2019; Dworkin et al. 2010; Ivanova et al. 2016; Tamura and Huang 2007). This alternative design can be implemented in psychedelic trials by giving all participants an active placebo in the first phase and then randomly assigning only the participants who did not respond to the initial treatment (i.e., placebo nonresponders) to the psychedelic or placebo in the second phase. This placebo run-in period creates a subgroup for analysis that increases the sensitivity to detect a treatment-specific effect (Dworkin et al. 2010; Ivanova et al. 2016); however, a recent systematic review challenges the notion that this design actually reduces the measured placebo response (Scott et al. 2021).
We also recommend designing studies with a single psychedelic administration when possible, given our current understanding regarding the efficacy of psychedelic therapy. There are compelling reasons to believe that multiple psychedelic dosing sessions may have therapeutic advantages (Bouso et al. 2013; Leger and Unterwald 2021; Mithoefer et al. 2019), and this treatment model is very likely to be adopted in clinical practice if these therapies become FDA-approved. On the other hand, the current controversies surrounding psychedelic therapy are focused on whether there is any drug-specific benefit of the complex therapeutic intervention. The answer to this basic question is very likely to inform regulatory decisions, cost-effectiveness models, and coverage by insurers, and is dependent on adequately masked trials. To that end, studies with only a single dosing session are likely to be superior in supporting adequate masking compared to studies with multiple dosing sessions. That is, once participants have experienced the subjective effects of a substance, they are more likely to identify that substance if it is readministered or recognize that a different substance has been given, compromising the conclusions that can be drawn from the trial (Wilsey et al. 2016). Therefore, we recommend between-subjects designs with a single dosing session when evaluating treatment efficacy.
Fig. 3 Recommendations for improving methodology in psychedelic trials. Overview of our recommendations for improving experimental methodology in future clinical trials with psychedelics
Several trials have included an open-label crossover component, wherein patients assigned to the inactive control arm are offered the opportunity to receive open-label psychedelic therapy after completing the final post-treatment assessment (Wolfson et al. 2020). Some have argued that this design feature is ethically mandatory in order to provide the patient with the best possible chance of therapeutic response. We disagree with the idea that the standard of care, or optimal care, involves offering unregulated and unapproved psychedelic therapy, particularly when the goal of these trials is to establish the efficacy of these same interventions. We recommend incorporating well-established strategies to minimize harm to participants that may arise if an experimental therapy is either harmful, or conversely highly effective, rendering placebo treatment unethical. “Stopping rules” are predefined time points where an interim analysis for efficacy can be performed to identify these situations and minimize harm. Alternatively, adaptive randomization based on outcome (see below) can achieve a similar goal while maintaining statistical power (Dragalin 2011)³. We also emphasize the importance of including robust psychotherapeutic support in any treatment arm when dealing with high-risk populations selected for treatment resistance, both to maximize patient safety and monitoring and to better assess drug-specific enhancement of psychotherapy as discussed above.
Participant recruitment andselection
We recommend recruiting psychedelic-naive participants when possible for clinical trials. Masking an individual’s treatment condition is much more feasible if they have no prior experience with that substance and are less certain about what effects to expect (i.e., process expectations; Tambling 2012; Wilsey et al. 2016). On that basis, participants should be naive to the active placebo as well. Ostensibly, psychedelic-naive individuals would have less confidence as to whether they received the treatment or active placebo, particularly if the active placebo had hallucinogenic effects. Carbonaro et al. (2018) demonstrated that experienced hallucinogen users are highly accurate at identifying whether they received psilocybin or DXM, but those without prior hallucinogen use may be easier to convince, especially if this strategy is combined with other recommendations given here (e.g., incomplete disclosure of study design, between-subjects designs with a single drug administration). It should be noted, however, that a challenge with this design is that several psychoactive substances (e.g., cannabis, opioids) are known to elicit different subjective and behavioral responses in drug-naive individuals compared to those with past experience (Solowij et al. 2019). This appears to be the case with psychedelics too, as demonstrated by a negative relationship between number of previous psychedelic uses and the intensity of acute effects (Aday et al. 2021). Thus, the phenomenological experience and intensity of drug effects may differ in first-time users, which could limit generalizability. If recruiting only psychedelic-naive participants is not feasible given the increasing number of recreational users (Yockey et al. 2020), then clear exclusion criteria, such as restrictions on number of lifetime uses or use within the past 12 months, should be imposed.
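Exclusion criteria of this kind are easiest to apply consistently when they are written down as an explicit rule before recruitment begins. The sketch below shows one way to encode such a rule; the `UseHistory` record, the `is_eligible` function, and the threshold of five lifetime uses are hypothetical illustrations, not criteria recommended by this article beyond the general idea of limiting lifetime and recent use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UseHistory:
    lifetime_uses: int                      # self-reported psychedelic uses
    months_since_last_use: Optional[float]  # None if never used


def is_eligible(history, max_lifetime_uses, min_months_since_use):
    """Return True if the participant passes the pre-specified screen.

    Both thresholds are passed in explicitly so that they must be chosen
    (and documented) by the study team rather than silently defaulted.
    """
    if history.lifetime_uses > max_lifetime_uses:
        return False
    if (history.months_since_last_use is not None
            and history.months_since_last_use < min_months_since_use):
        return False
    return True


# Hypothetical screen: at most 5 lifetime uses, none in the past 12 months.
candidates = {
    "naive": UseHistory(0, None),
    "remote_use": UseHistory(3, 24),
    "recent_use": UseHistory(3, 6),
    "frequent_use": UseHistory(10, 36),
}
for label, history in candidates.items():
    print(label, is_eligible(history, max_lifetime_uses=5, min_months_since_use=12))
```

Recording the raw use history alongside the eligibility decision also allows prior experience to be examined later as a moderator of masking success.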
Outcomes, assessments, andendpoints
The choice of outcomes, assessments, and endpoints can have a large impact on the evaluation of treatment benefit and overall methodological rigor of psychedelic clinical trials. The primary endpoint for a trial should be well-defined, reliable, and represent a clinically meaningful outcome of how a patient feels, functions, or survives (e.g., Fleming and Powers 2012; US FDA 2009). Outcome measures should be consistent with expert recommendations or consensus statements for a given disease or condition under study when available (e.g., Deyo et al. 2014), and the minimal clinically important difference in the primary outcome measure that represents a treatment benefit should be set a priori (e.g., Dworkin et al. 2008, 2009). There are unresolved questions regarding the long-term efficacy of psychedelic therapy. Lasting, clinically significant improvements following psychedelic therapy, regardless of any placebo group difference, are likely more important to patients, providers, and stakeholders than an acute improvement that is not maintained. However, given the current level of evidence and controversy regarding the drug-specific efficacy of the treatment, we emphasize that the primary importance of rigorous, well-controlled trials is to define clear evidence of benefit that outlasts the acute drug effect. The specific timing of outcomes will depend heavily on the indication under consideration. Although long-term follow-ups provide a more complete understanding of treatment effects, especially in trials on chronic conditions, they are still susceptible to placebo effects and selection bias affecting trials from the outset. For example, a well-designed, masked RCT showed that arthroscopic knee surgery was never better than placebo surgery across 2 years of assessments (Moseley et al. 2002).
³ These strategies are complementary to existing mechanisms for patients to try unapproved therapies, instituted as the Right to Try Act in the USA, as well as expanded access clinical programs (Holbein et al. 2015).
We recommend using multiple methods of measurement to comprehensively examine the effects of psychedelic therapy in clinical trials. Patient-reported outcomes (PROs) assess the status of a patient’s health condition (e.g., disease symptoms, functioning) directly from the patient and are commonly used as endpoints in clinical trials (Mercieca-Bebber et al. 2018; US FDA 2009). Including valid, reliable, and clinically informative PRO measures is valuable because they capture patient-centered perceptions of meaningful change and have downstream influence on clinical decision-making, drug labeling claims, and health policy (Calvert et al. 2018; Doward et al. 2010). Clinician-administered assessments or observer reports can also be useful in psychedelic trials as they avoid potential self-report biases of PROs; however, these types of assessments are also vulnerable to methodological issues, such as low interrater reliability and rater bias (Kobak et al. 2007). Therefore, when feasible, trials should also include objective and reliable measures, such as biomarkers and/or behavioral tasks that reflect component processes related to the index pathology. Two categories of biomarkers recognized by the FDA (Smith et al. 2017; US FDA 2020) that may be particularly relevant for psychedelic clinical trials are predictive biomarkers and surrogate endpoints. Predictive biomarkers indicate whether certain participants respond differentially to the treatment or placebo and can be used to stratify randomization on variables of interest that may maximize the efficiency of a trial and minimize the risk of exposing additional patients to an unproven treatment (Strimbu and Tavel 2010). Surrogate endpoint biomarkers include accurate and well-validated lab measures or physical signs that reliably predict or stand in for a clinically meaningful endpoint (e.g., biomarkers of abstinence; Johnson et al. 2014; Fleming and Powers 2012). Not all diseases or health conditions have biomarkers that predict treatment benefit or represent clinical endpoints, but when available, inclusion of these types of biomarkers may lead to more efficient trials with less bias (Fleming and Powers 2012). Because psychedelic clinical trials are particularly expensive, one must weigh the tradeoffs between trial costs and participant burden with the addition of biomarkers, long-term follow-ups, and lengthy assessments.
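Stratifying randomization on a predictive biomarker, as mentioned above, can be sketched with permuted blocks within each stratum. The example is a minimal illustration under stated assumptions: the function name `stratified_block_randomize`, the stratum labels, the block size (equal to the number of arms), and the participant IDs are hypothetical, and no particular biomarker is implied.

```python
import random
from collections import defaultdict


def stratified_block_randomize(participants, arms=("psychedelic", "placebo"), seed=0):
    """Permuted-block randomization within strata.

    participants: list of (participant_id, stratum) tuples in enrollment
    order. Within each stratum, every consecutive block of len(arms)
    participants receives each arm exactly once, in random order.
    """
    rng = random.Random(seed)     # fixed seed so the sketch is reproducible
    pending = defaultdict(list)   # per-stratum slots left in the open block
    allocation = {}
    for pid, stratum in participants:
        if not pending[stratum]:
            # Open a new block for this stratum and shuffle its order.
            block = list(arms)
            rng.shuffle(block)
            pending[stratum] = block
        allocation[pid] = pending[stratum].pop()
    return allocation


# Hypothetical enrollment: 8 participants alternating between two strata.
enrolled = [
    (f"P{i:02d}", "biomarker_high" if i % 2 else "biomarker_low")
    for i in range(1, 9)
]
assignment = stratified_block_randomize(enrolled)
for pid, stratum in enrolled:
    print(pid, stratum, assignment[pid])
```

Because each stratum fills complete blocks, the arms stay balanced on the biomarker even in small samples, which is the efficiency gain the stratification is meant to provide.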
Study procedures: managing andmeasuring
treatment expectations
Several pragmatic steps can be taken at the beginning stages of a study to manage participants’ expectation bias. We do not currently have sufficient data to claim that psychedelic therapy is an effective treatment; therefore, investigators should emphasize the uncertainty regarding the treatment efficacy, rather than insinuating that the treatment will improve participants’ symptoms (Erpelding et al. 2020; Evans et al. 2021; Gewandter et al. 2020; Smith et al. 2020). This communication on the uncertainty of treatment efficacy should be consistent across recruitment materials, initial contact with potential participants, consent forms, and any interactions with participants. Moreover, in trials comparing psychedelic therapy to placebo, drug effects should be explained neutrally (Smart et al. 1966). For example, participants can truthfully be informed about possible drug effects while also noting that there is significant variability between people: some people have strong reactions to a psychedelic while others have very mild reactions (Griffiths et al. 2016). Similarly, in studies in which both treatment arms receive psychotherapy, the investigator can honestly describe psychotherapy as an effective treatment whether or not it is paired with a psychedelic. To ensure this clinical equipoise and manage participants’ expectations, all study staff should be masked to treatment arm assignment and trained to present the study and arms of the trial neutrally.
In addition to managing expectations, it is important to measure participants’ treatment expectations. We and others (e.g., Muthukumaraswamy et al. 2021) recommend the use of established measures of expectancy, such as the Stanford Expectations of Treatment Scale (Younger et al. 2012), which is a valid and reliable measure of participants’ positive and negative treatment expectancies. The scale includes six items that can easily be adapted across research contexts to identify differences in expectancies between treatment groups as well as relationships between treatment expectancies and outcomes. The Credibility and Expectancy Questionnaire (Devilly and Borkovec 2000) can also be used to measure the degree to which a participant thinks and feels the treatment will improve their symptoms or functioning. Furthermore, several face-valid questions, such as “how helpful do you believe the treatment will be for improving your [primary symptom]?”, have been used successfully to measure treatment expectations in previous research (e.g., Sherman et al. 2010). Another option is to conduct semi-structured interviews, possibly during participant preparation and integration sessions, and use qualitative analyses to assess participants’ positive and negative treatment expectations (e.g., Eaves et al. 2015). Because of the aforementioned issues with unmasking following a psychedelic session, and the interaction between masking and expectations, it may be useful to measure treatment expectations after the drug dosing session in addition to those at baseline. Arguably, expectations at baseline may be predictive of subjective effects during the psychedelic session, and expectations at post-session may be predictive of changes in clinical outcomes. This speculation remains to be tested, but it is worthwhile to systematically evaluate the natural dynamics of expectations during psychedelic trials and examine whether expectations change after the dosing session.
Study procedures: incomplete disclosure
We have reviewed studies where incomplete disclosure has been used to reduce participants’ certainty regarding their treatment assignment (Bershad et al. 2019; Carbonaro et al. 2018; Griffiths et al. 2006; Reissig et al. 2012). In designing a trial, it is critically important to distinguish “incomplete disclosure” from “deception.” Most institutional review boards have internally defined these respective procedures; however, “deception” is generally agreed to mean that the investigators provide false information to a participant whereas “incomplete disclosure” indicates that the subject is not fully informed about the purpose or design of the study. These strategies are controversial: the ethics of omitting important information about a study and misleading participants is an area of ongoing debate (Miller et al. 2005; Roulet et al. 2017). Implementing any deceptive practice requires thorough scientific justification and authorization by institutional review boards. Empirical evidence in healthy adults suggests that research participants may not be adversely affected by deception (Mundt et al. 2017); however, in the context of clinical trials in which therapeutic alliance is critical for patient safety and treatment efficacy, deception may be particularly ill-advised. If it is considered ethically appropriate, though, withholding information from participants as well as study staff about the number of study arms and the exact doses administered may be particularly effective for enhancing masking success. Providing a vague, incomplete description of the study structure and a range of possible dosages may be best suited for standard, two-armed RCT designs (and avoids the need to use an alternative study design that requires a significantly larger sample size for adequate statistical power). Without the cues of knowing that it is only possible to receive the experimental treatment or placebo (e.g., a high dose or an ultra-low dose of a psychedelic), it may be difficult for both the participant and staff to develop a firm belief about the participant’s treatment condition. Similarly, listing the side effects of all of the potential study drugs together, instead of listing effects specific to each substance, may be an ancillary strategy to reduce participants’ confidence in their treatment arm assignment while still fully informing them of all the drug effects they may be exposed to (Boutron et al. 2006). In a recent study with 5-MeO-DMT, researchers withheld the identity of the study drug but informed participants that they would be receiving a tryptamine psychedelic (Reckweg et al. 2021); this may be a useful method for managing expectations in cases where participants could have distinct expectations regarding specific psychedelic substances. A related recommendation to improve methodological rigor in the field is for researchers to report what drug effects participants were informed about prior to the study.
Incomplete disclosure to participants and study person-
nel regarding key elements of a study’s design may help to
meet a central objective of masking: establishing “a state of
ambivalence” about treatment allocation to minimize the
impact of beliefs on study outcomes (Mathieu etal. 2014).
Ensuring that study staff receive the same information as
participants and remain unaware of the true design through-
out the study is critical, as feedback from observers is known
to influence participants’ clinical outcomes (Colagiuri and
Boakes 2010; Hróbjartsson etal. 2012). It is important to
acknowledge that undertaking this effort—concealing funda-
mentals of study design from staff as well as participants—is
challenging from a practical standpoint, requiring careful
management of access to information about the study (e.g.,
a “cone of silence”). Using incomplete disclosure or decep-
tion also necessitates appropriate debriefing protocols, as
well as development of masking assessments that avoid
revealing the true study design. Most assessment tools in the
clinical trial literature measure perceived treatment assign-
ment as nominal data and implicitly indicate study design
(e.g., “Do you think you received the active treatment or
placebo?”). Probing participants’ and staff members’ beliefs
using ordinal/parametric scales may not only allow inves-
tigators to maintain uncertainty about the design, but also
has the advantage of increasing statistical power (Laferton
etal. 2017).
Study procedures: active placebo
Use of an active placebo has a clear rationale for psychop-
harmacology studies. However, as reviewed above, efforts
to mask the unique subjective effects of psychedelics have
had limited success. Our choices are largely constrained by
a limited understanding of how psychedelics produce thera-
peutic benefits. For example, a drug that mimics psychedelic
effects but provides no therapeutic benefit could potentially
be an excellent active placebo. However, the internal con-
tradiction in this strategy becomes apparent if, as several
researchers argue (Yaden and Griffiths 2020), the subjec-
tive effects produced by psychedelics (particularly mystical
states) themselves drive therapeutic benefit. Although intui-
tive, this hypothesis is nonetheless unproven and a thorough
evaluation is beyond the scope of this review; we instead
refer the reader to an excellent summary of arguments for
and against this idea (Olson 2020; Yaden and Griffiths
2020). We anticipate that future research will clarify whether
mystical states induced by means other than psychedelics
such as hypnosis (Lynn and Evans 2017), holotropic breath-
work (Puente 2014), meditation (Russ and Elliott 2017),
virtual reality (Glowacki et al. 2020), or non-psychedelic
psychoactive drugs (Earleywine et al. 2021) are sufficient
for therapeutic effects observed in psychedelic therapy tri-
als, such as smoking cessation and symptomatic relief from
depression in appropriate target populations.
A deeper understanding of the neural systems and neu-
rochemistry required for psychedelics’ therapeutic effects
may lead to highly effective comparators for use in clinical
trials. A recent clinical study investigating the antidepressant
mechanism of ketamine illustrates that the acute subjective
effects of a psychedelic-class drug may be separable from its
therapeutic effects. Williams et al. (2018, 2019) found that
a high dose of an opioid antagonist, naltrexone, effectively
blocked ketamine’s antidepressant and anti-suicidal effects
but had a minimal impact on ratings of ketamine-induced
dissociation. This small study was met with some controversy
(Heifets et al. 2019; Marton et al. 2019; Yoon et al.
2019) and requires replication in a larger independent sam-
ple. Also, notably, the authors did not formally assess mask-
ing efficacy in the respective treatment conditions. None-
theless, these findings suggest a powerful active placebo
comparator for future studies of ketamine, and potentially
other psychedelics. Similarly, for classical psychedelics like
psilocybin, pharmacological agents may be discovered that
interrupt neuroplastic processes triggered by psilocybin, but
do not interfere with its acute psychedelic effects. Another
highly innovative approach in development (NCT04842045)
pairs psilocybin with an amnestic drug (midazolam, a ben-
zodiazepine). This study is focused on safety. The broader
hypothesis, yet to be tested, is that psychedelic and mystical
states evoked in participants who do not form memories of
the experience are not therapeutic, likely because partici-
pants’ amnesia prevents subsequent therapeutic integration
of the psychedelic experience. An alternate outcome may
be that participants do experience therapeutic benefit, but
are effectively masked to their assigned treatment condi-
tion by virtue of midazolam-induced amnesia. In this case,
a near-perfectly controlled, masked study design is achieved,
with an easily interpretable finding for psilocybin’s efficacy,
uncomplicated by differential placebo or nocebo effects
in patients receiving midazolam alone versus midazolam
plus psilocybin. We eagerly anticipate results from this
pioneering line of inquiry and note several challenges. In
addition to the ethical considerations of using amnestic
agents in psychiatric populations, there are technical consid-
erations that may confound this approach, including uncer-
tainty as to whether midazolam retains its amnestic property
when paired with a psychedelic, whether amnestic doses of
midazolam produce a degree of sedation that precludes entry
into a mystical state, or whether midazolam directly blocks
therapeutic psychological or neural mechanisms induced by
psychedelic medications.
Psychedelic therapy may be an uninterruptible whole,
requiring the drug, psychedelic experience, and associated
psychotherapy to achieve any therapeutic benefits (Sessa
2014). In this case, which should be assumed true until
proven otherwise, there is still a pragmatic need to iden-
tify pharmacological and somatic placebo treatments that
adequately mask psychedelic effects. Although we have no
evidentiary basis to recommend specific active placebos
beyond those that have been attempted, substances with
hallucinatory effects (e.g., ketamine, DXM, and high doses
of tetrahydrocannabinol) may be compelling options, espe-
cially when combined with drug-naive participants. We
strongly support studies specifically devoted to developing
and testing active placebos for use in therapeutic clinical
trials. The need to develop active placebos for participants
with past psychedelic use is particularly important given the
likely decrease in psychedelic-naive participants that can be
recruited for clinical therapeutic studies in the coming years.
Design of an active placebo ought to be considered in
concert with other study design elements described above,
with the overarching goal of reducing a prospective study
participant’s certainty of their treatment condition. For
example, if testing psilocybin’s efficacy for major depres-
sive disorder, investigators may combine active placebo and
incomplete disclosure to balance expectancy effects across
treatment arms. For simplicity, the study could be designed
as a two-arm comparison of high-dose psilocybin versus
ultra-low-dose (ineffective) psilocybin plus an active pla-
cebo. During the informed consent process, participants
would truthfully be informed that they will receive a range
of psilocybin doses and may also receive an active placebo,
with full disclosure that the purpose of the active placebo
is to reduce their certainty of treatment assignment. The
number of study arms (two, in fact) and the likelihood that
their assigned psilocybin dose would be effectively non-
therapeutic would not be disclosed. Furthermore, informed
consent could include information that subthreshold (but not
ultra-low) psilocybin may have therapeutic value, although,
again, it would not be disclosed that no participants would
be assigned to a subthreshold dose group. In this case, the
specific goal of an active placebo might be to mimic aspects
of a high psilocybin dose, which could be achieved
with DXM or perhaps a combination of a benzodiazepine
and a mild stimulant. Taken together, participants would be
informed of all the possible treatment conditions and may
be reasonably uncertain as to whether they received a high
therapeutic dose of psilocybin versus an ultra-low dose plus
active placebo.
Analysis: assessing andreporting outcomes related
totrial design
The set of treatment-nonspecific effects, collectively termed
“the placebo effect,” and effective masking are key consid-
erations for designing an interpretable study involving psy-
choactive drugs. Anticipating the placebo effect, measuring
the contribution of expectancies, assessing the effectiveness
of masking, and systematically reporting these data will set
standards and lead to iterative improvements in trial design.
These factors ought to be considered at every step in the
lifecycle of a clinical study. We specifically recommend
calculating statistical power based on known placebo effect
sizes, obtaining repeat baseline measures of the primary
outcome(s), measuring expectancies and masking success,
and analyzing primary outcomes using expectancy and per-
ceived (rather than actual) treatment arm as covariates.
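As a rough illustration of how a known placebo effect size could enter a power calculation, the following Python sketch applies the standard normal-approximation formula for a two-sided, two-arm comparison of means. The function name, the effect sizes, and the shortcut of subtracting one standardized change from another (which assumes both are expressed in the same standard deviation units) are illustrative assumptions, not values or procedures from any trial; a formal calculation would typically use exact methods and a statistician's input.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(effect_size_d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants per arm (normal approximation, equal allocation)."""
    if effect_size_d <= 0:
        raise ValueError("effect_size_d must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size_d ** 2)


# Hypothetical inputs: anticipated standardized improvement in each arm.
anticipated_treatment_change = 1.60   # assumption for illustration only
anticipated_placebo_change = 1.05     # assumption; e.g., taken from a benchmark
incremental_effect = anticipated_treatment_change - anticipated_placebo_change

print("Participants per arm:", n_per_arm(incremental_effect))
```

The point of the sketch is that the sample size is driven by the increment over placebo, not by the raw pre-to-post change in the active arm; a large expected placebo response shrinks that increment and raises the required sample size sharply.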
Estimating the size of the placebo effect informs statisti-
cal power calculations, which, if resources are limited, may
impact the feasible number of treatment arms. A common
method of estimating the size of the placebo effect in a trial
is to compare outcomes in the placebo arm to a “no treat-
ment” arm (Hróbjartsson and Gøtzsche 2010; Wampold
etal. 2016). However, given the previously discussed “hype”
around psychedelics, participants randomly assigned to the
“no treatment” arm would likely experience disappointment
and nocebo effects from their knowledge of not being in the
active treatment. An alternative method of partitioning the
placebo effect from the treatment effect may be to compare
against a “placebo benchmark” (Jones et al. 2021). Jones and
colleagues found that the size of the placebo effect was
uniform across different treatment approaches for depression
(pooled Hedges’ g = 1.05). In areas where the size of the
placebo effect has been well-established, researchers may
be able to compare their anticipated effect size against a
criterion. Investigators can also take simple steps to mini-
mize some components of the placebo effect, such as regres-
sion to the mean. We recommend that investigators perform
repeat baseline assessment of their outcome of interest and
only enroll participants with stable response characteristics.
This procedure may be more cost-effective than including
an untreated control condition to estimate regression to the
mean.
We strongly recommend measuring the factors that
make up the placebo effect. Prior to conducting any study
procedures (e.g., preparation sessions), participants’ treat-
ment expectations should be measured as described above.
Measuring masking efficacy is similarly important and
should be appropriately timed. In many cases, the clinical
benefits of psychedelics may be rapid (Majić et al. 2015;
Murphy-Beiner and Soar 2020). We recommend measur-
ing participant- and therapist-perceived treatment alloca-
tion, certainty of treatment allocation, and the reason for
their guess both immediately after the psychedelic dosing
session(s) and at the end of the study.
Including two measurement occasions may help deter-
mine whether participants and therapists guessed the treat-
ment allocation based on the subjective effects during the
treatment session or from changes in clinical symptoms over
time (Katz 2021; Kolahi et al. 2009). We agree with Katz
(2021) that accurate guesses of treatment allocation due to
treatment efficacy should not be considered unmasking. To
further reduce the influence of unmasking, we suggest using
clinical assessors who are unaware of the study design and
participant treatment allocation to collect all relevant meas-
ures. Clinical assessors should also be asked about perceived
participant treatment allocation at the end of the study (Katz
2021). We again emphasize that investigators should create
protocols and adherence plans for all relevant study staff to
maximize the chances that masking is maintained through-
out the study.
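The masking measurements recommended above could be recorded and summarized along the following lines. This is a minimal Python sketch under stated assumptions: the record fields, the 0 to 100 certainty scale, the neutral arm labels "A" and "B", and the toy data are all hypothetical, and the comparison against a chance level of 0.5 presumes two arms with equal allocation.

```python
from dataclasses import dataclass
from math import comb


@dataclass
class MaskingProbe:
    actual_arm: str    # known only to unmasked analysts
    guessed_arm: str   # rater's best guess, recorded with neutral labels
    certainty: int     # hypothetical 0-100 rating of confidence in the guess
    reason: str        # e.g., "subjective effects" or "symptom change"
    occasion: str      # "post_dosing" or "end_of_study"


def p_at_least(k: int, n: int, chance: float = 0.5) -> float:
    """One-sided exact binomial probability of k or more correct guesses out of n."""
    return sum(comb(n, i) * chance**i * (1 - chance) ** (n - i) for i in range(k, n + 1))


def summarize(probes, occasion):
    """Summarize guess accuracy and certainty for one measurement occasion."""
    subset = [p for p in probes if p.occasion == occasion]
    n = len(subset)
    correct = sum(p.guessed_arm == p.actual_arm for p in subset)
    return {
        "n": n,
        "proportion_correct": correct / n,
        "p_vs_chance": p_at_least(correct, n),
        "mean_certainty": sum(p.certainty for p in subset) / n,
    }


# Fabricated toy data for illustration only (not study results).
probes = [MaskingProbe("A", "A", 90, "subjective effects", "post_dosing") for _ in range(4)]
probes += [MaskingProbe("B", "B", 70, "subjective effects", "post_dosing") for _ in range(3)]
probes += [MaskingProbe("B", "A", 40, "symptom change", "post_dosing")]

print(summarize(probes, "post_dosing"))
```

Running the same summary for the "end_of_study" occasion, and tabulating the recorded reasons, would allow guesses driven by acute subjective effects to be distinguished from guesses driven by later symptom change.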
Participant expectations and functional unmasking may
be unavoidable sources of bias that impact internal valid-
ity and the inferences that can be drawn from study results
(Higgins etal., 2011; Kolahi etal., 2009). However, modern
adaptive trial designs can help investigators at least achieve
an even distribution of these biases across conditions. A
thorough discussion of adaptive designs is beyond the scope
of this review, and we refer the reader to two useful sum-
maries, including draft guidance from the FDA on adaptive
trial design for industry (FDA 2019; Pallmann et al. 2018).
In short, investigators may consider using expectancy and
participant-assessed treatment conditions to create balanced
randomization blocks (i.e., covariate-adaptive treatment
assignment) just as other clinical trials stratify recruitment
on the prevalence of comorbidities, sex, and other factors
that may differentially impact treatment outcomes. For small
exploratory trials, it may not be possible to balance on mul-
tiple pre-treatment variables; therefore, the decision to bal-
ance recruitment on treatment outcome expectations must
be weighed against other recruitment priorities.
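To make covariate-adaptive assignment concrete, the following Python sketch implements a simplified minimization procedure, one common form of this approach. Each incoming participant is assigned, with high probability, to the arm that would leave the arms most balanced on their own covariate levels. The factor names ("expectancy", "prior_use"), the neutral arm labels, and the probability `p_best` are hypothetical choices for illustration; a real trial would prespecify these with a statistician and use validated randomization software.

```python
import random
from collections import defaultdict

ARMS = ("A", "B")  # neutral labels so the code itself does not reveal the design


def imbalance_if_assigned(counts, factors, arm):
    """Total imbalance across the participant's factor levels if placed in `arm`."""
    total = 0
    for factor, level in factors.items():
        hypothetical = [counts[(factor, level, a)] + (1 if a == arm else 0) for a in ARMS]
        total += max(hypothetical) - min(hypothetical)
    return total


def assign(counts, factors, rng, p_best=0.8):
    """Assign one participant and update the running counts."""
    scores = {arm: imbalance_if_assigned(counts, factors, arm) for arm in ARMS}
    best = min(scores.values())
    preferred = [a for a in ARMS if scores[a] == best]
    others = [a for a in ARMS if a not in preferred]
    # Favor the balancing arm(s) with probability p_best; ties are broken at random.
    pool = preferred if (not others or rng.random() < p_best) else others
    arm = rng.choice(pool)
    for factor, level in factors.items():
        counts[(factor, level, arm)] += 1
    return arm


# Simulated enrollment with made-up covariate levels.
rng = random.Random(2022)
counts = defaultdict(int)
for _ in range(24):
    participant = {
        "expectancy": rng.choice(["low", "medium", "high"]),
        "prior_use": rng.choice(["naive", "experienced"]),
    }
    assign(counts, participant, rng)

for level in ("low", "medium", "high"):
    print(level, {arm: counts[("expectancy", level, arm)] for arm in ARMS})
```

Keeping `p_best` below 1 preserves an element of chance in every assignment, which limits the predictability of the next allocation; the cost is slightly looser balance.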
A major benefit of measuring expectancies and mask-
ing efficacy is that these factors can be used as covariates
in the analysis of primary study outcomes, and the specific
effects of expectancy and treatment arm guess on outcome
can be evaluated. In the previously discussed microdosing
study by van Elk et al. (2021), researchers initially found that
microdoses of psilocybin led to greater ratings of awe than
placebo; however, after adding baseline expectations as a
covariate to the analyses, the difference between conditions
was non-significant. In a study that employs an effective
active placebo, outcomes can be analyzed according to the
drug that participants think they received compared to the
drug they actually received. In a study measuring pleasantness
of affective touch, Bershad et al. (2019) found a significant
effect of MDMA compared to an active placebo,
methamphetamine. A substantial number of participants
who received methamphetamine believed they had received
MDMA (38.9%). Analyzing outcomes using a participant’s
guess as a covariate showed no effect in this latter group.
This comparison strongly reinforced the authors’ conclu-
sion that the effect of MDMA on affective touch was drug-
specific and not a product of participants’ expectations.
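The logic of a covariate adjustment of this kind can be sketched in a few lines of Python. The example below computes a raw between-arm difference and an expectancy-adjusted difference using the standard ANCOVA identity with a pooled within-arm slope. The field names, arm labels, and numbers are fabricated for illustration and were constructed so that the outcome depends only on expectancy; they are not data from any of the studies cited. The same function could take perceived treatment arm, coded 0 or 1, as the covariate.

```python
from statistics import mean


def ancova_adjusted_difference(records, covariate):
    """Return (raw difference, pooled within-arm slope, covariate-adjusted difference)."""
    groups = {arm: [r for r in records if r["arm"] == arm] for arm in ("active", "control")}
    sxy = sxx = 0.0
    means = {}
    for arm, rows in groups.items():
        mx = mean(r[covariate] for r in rows)
        my = mean(r["outcome"] for r in rows)
        means[arm] = (mx, my)
        # Accumulate within-arm sums of cross-products and squares.
        sxy += sum((r[covariate] - mx) * (r["outcome"] - my) for r in rows)
        sxx += sum((r[covariate] - mx) ** 2 for r in rows)
    slope = sxy / sxx
    raw = means["active"][1] - means["control"][1]
    adjusted = raw - slope * (means["active"][0] - means["control"][0])
    return raw, slope, adjusted


# Toy data: outcome = 2 * expectancy in both arms, so there is no true drug effect,
# but the active arm happens to have higher baseline expectancy.
records = (
    [{"arm": "control", "expectancy": x, "outcome": 2 * x} for x in (1, 2, 3)]
    + [{"arm": "active", "expectancy": x, "outcome": 2 * x} for x in (3, 4, 5)]
)

raw, slope, adjusted = ancova_adjusted_difference(records, "expectancy")
print(f"raw difference = {raw}, adjusted difference = {adjusted}")
```

In this constructed case the apparent benefit of the active arm disappears once expectancy is accounted for. In a real analysis the adjusted estimate would be accompanied by standard errors and a check of the common-slope assumption, which this sketch omits.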
Beyond thescope ofRCTs
One notion to consider is embracing expectancy and placebo
effects. The important role of expectancies in psychedelic
therapy blurs the line between treatment-specific and treat-
ment-nonspecific effects and raises the broader question:
rather than eliminating treatment-nonspecific effects, should
trialists be looking for ways to optimize and synergize them
with treatment interventions to enhance clinical outcomes
(Colloca and Barsky 2020; Enck et al. 2013)? Although no
formalized manual exists on how to boost expectancy in psy-
chotherapy, inducing positive expectations has been shown
to enhance the effectiveness of a variety of health interventions
(Bingel et al. 2011; Flowers et al. 2018; Kaptchuk et al.
2020), a strategy which could seemingly be tailored to—and
be particularly synergistic with—psychedelic treatments as
well. As discussed previously, placebo and drug-specific
effects are likely to be interactive rather than additive (Kube
and Rief 2017). Thus, it may be the case that the “therapeu-
tic window” opened by psychedelics is an emergent property
of a complex system comprising expectations, drug effects,
setting, and therapeutic alliance. It may be impossible to
isolate an individual component of this complex package
in an RCT. Critically, this does not condemn psychedelic
therapy as being no more effective than placebo, but means
that the current gold standard clinical trial design may not be
sensitive to detecting the therapeutic effect of an individual
treatment element.
A potential solution to this dilemma may be to shift focus
from efficacy trials and the use of explanatory or confirma-
tory RCT designs towards pragmatic clinical trial designs
(PCTs) that have an alternative goal of assessing treatment
effectiveness. Whereas internal validity (i.e., objective com-
parison of drug vs placebo in tightly controlled settings with
homogeneous groups) is the major objective of an explanatory
or confirmatory trial, external validity and the generalizability
of treatment effectiveness are the primary focus
of a well-designed PCT. Consequently, PCTs offer potential
“real-world” tests of clinical effectiveness and the generaliz-
ability of outcome data, rather than isolation of the active
ingredient for change. To achieve these goals, PCTs typically
include one or more alternative therapies to the treatment
under study, rather than active or inactive placebos, and
participants are normally recruited from a broad “real-life”
clinical population, with few exclusions or restrictions on
participation. Although pragmatic trials are normally conducted
in the fourth, post-marketing phase of drug development,
Carhart-Harris et al. (2021) have argued cogently for
the potential benefits of pragmatic designs being used earlier
to broadly assess the clinical effectiveness of current psy-
chedelic treatments, either as an alternative or complement
to the much narrower focus of current RCTs.
Lastly, a closely related approach to consider when testing
the effectiveness of psychedelic therapy is to evaluate large-
scale population data using so-called “natural experiments.”
Natural experiments provide an alternative to RCTs by tak-
ing advantage of circumstances whereby naturally occur-
ring events can be linked to variables of interest (Thapar
and Rutter 2019). This type of design is necessary when
randomly assigning individuals to masked conditions is not
possible because of ethical or logistical constraints, such as
when studying maltreatment or child neglect (Rutter 2007).
If the challenges related to expectations and masking with
psychedelics preclude rigorous RCTs, natural experiments
may be another method of evaluating the treatment’s effects.
With the recent legalization of psilocybin therapy in Oregon
as well as successful decriminalization movements across
the USA (Aday et al. 2020a; Marks and Cohen 2021), it
is possible that objective indices related to mental health
(e.g., suicide rates, emergency room visits for psychiatric
issues) could precipitously decrease at the population level
if psychedelics are indeed an effective treatment for a variety
of psychiatric conditions. Although it is unclear what the
initial accessibility of these treatments will be to individu-
als in states such as Oregon (Williams and Labate 2020), if
positive trends in mental health are observed at the popula-
tion level after the introduction of legal psychedelic therapy,
the role of expectations may be considered immaterial to the
broader benefits to society.
Accurate detection of treatment-specific effects in clinical
trials is an intrinsically complex task across areas of research
as study personnel and participant expectations interact
dynamically with masking and therapeutic outcomes. Psy-
chedelic studies are particularly challenging as they must
address additional confounds related to “hype” and salient
psychoactive effects that hinder treatment arm masking to
an extensive degree. On one hand, to characterize clinical
efficacy and safety, it is an essential challenge for the field
to separate pharmacological effects from multiple, interac-
tive socio-psychological influences in psychedelic medicine.
Innovative, disruptive experimental designs may be needed
to this end. On the other hand, at a practical level, it is
important from a public health standpoint to identify meth-
ods of optimizing psychedelic treatment outcomes, perhaps
by utilizing expectancies. These results could potentially
guide clinical decision-making.
Traditional placebo masking with inert comparators
is insufficient for high-dose psychedelic studies, and this
review highlights that this issue often extends to psycho-
therapy and pharmacology research more broadly. Here,
recommendations are presented for improving the methodological
rigor of future psychedelic studies that address
issues related to expectations and participant masking.
Specifically, we provide guidelines on study design (e.g.,
incomplete disclosure of treatment arms, neutral explana-
tion of drug effects), participant recruitment and selection
(e.g., include psychedelic- and active placebo-naive partici-
pants), outcomes and endpoints (e.g., include biomarkers
and behavioral measures), control conditions (e.g., use active
comparators), and analyses (e.g., test masking efficacy, con-
trol for pre-treatment expectations, compare against placebo
benchmark). Although these recommendations are tailored
to psychedelic studies, they can be incorporated into psycho-
therapy and pharmacology research more broadly to increase
precision in identifying treatment-specific effects. Doing so
may improve methodological rigor and identification of
effective interventions across areas of medicine.
