Psychological scientists have recently started to reconsider the importance of close replications in building a cumulative knowledge base; however, there is no consensus about what constitutes a convincing close replication study. To facilitate convincing close replication attempts we have developed a Replication Recipe, outlining standard criteria for a convincing close replication. Our Replication Recipe can be used by researchers, teachers, and students to conduct meaningful replication studies and integrate replications into their scholarly habits.
The Replication Recipe: What makes for a convincing replication?
Mark J. Brandt, Hans IJzerman, Ap Dijksterhuis, Jason Geller, Roger Giner-Sorolla, James A. Grange, Marco Perugini, Jeffrey R. Spies, Anna van 't Veer
Tilburg University, Netherlands
Radboud University Nijmegen, Netherlands
University of Washington, USA
Iowa State University, USA
University of Kent, UK
Keele University, UK
University of Milano-Bicocca, Italy
Center for Open Science, USA
TIBER (Tilburg Institute of Behavioral Economics), Netherlands
Close replications are an important part of cumulative science.
Yet, little agreement exists about what makes a replication convincing.
We develop a Replication Recipe to facilitate close replication attempts.
This includes the faithful recreation of a study with high statistical power.
We discuss evaluating replication results and limitations of replications.
Statistical power
Research method
Solid Science
Replicability in research is an important component of cumulative
science (Asendorpf et al., 2013; Jasny, Chin, Chong, & Vignieri, 2011;
Nosek, Spies, & Motyl, 2012; Rosenthal, 1990; Schmidt, 2009), yet
relatively few close replication attempts are reported in psychology
(Makel, Plucker, & Hegarty, 2012). Only recently have researchers
systematically reported replications online and experimented with special
issues to incorporate replications into academic publications (e.g., Nosek &
Lakens, 2013; Zwaan & Zeelenberg, 2013). Moreover, some prestigious
psychology journals (e.g., Journal of Experimental Social Psychology,
Journal of Personality and Social Psychology, Psychological Science) have
recently become willing to publish both failed and successful replication
attempts (e.g., Brandt, 2013; Chabris et al., 2012; LeBel & Campbell, in press;
Matthews, 2012; Pashler, Rohrer, & Harris, in press) and even
devote ongoing sections to replications (see the new section in Perspectives
on Psychological Science; Registered replication reports, 2013).
From initial conclusions drawn from replication attempts of
important findings in the empirical literature, it is clear that replication
studies can be quite controversial. For example, the failure of recent
attempts to replicate social priming effects (e.g., Doyen, Klein,
Pichon, & Cleeremans, 2012; Pashler et al., in press) has prompted
psychologists and science journalists to raise questions about the entire
phenomenon (e.g., Bartlett, 2013). Failed replications have
sometimes been interpreted as 1) casting doubt on the veracity of
an entire subfield (e.g., candidate gene studies for general intelligence,
Chabris et al., 2012); 2) suggesting that an important component
of a popular theory is potentially incorrect (e.g., the status-legitimacy
hypothesis of System Justification Theory, Brandt, 2013); or
3) suggesting that a new finding is less robust than when first introduced
(e.g., incidental values affecting judgments of time; Matthews,
2012). Of course, there are other valid reasons for replication failures:
chance, misinterpretation of methods, and so forth.
Nevertheless, not all replication attempts reported so far have been
unsuccessful. Burger (2009) successfully replicated Milgram's famous
obedience experiments (e.g., Milgram, 1963), suggesting that when
well-conducted replications are successful they can provide us with
greater confidence about the veracity of the predicted effect. Moreover,
replication attempts help estimate the effect size of a particular effect
and can serve as a starting point for replication-extension studies that
further illuminate the psychological processes that underlie an effect
and that can help to identify its boundary conditions (e.g., Lakens,
2012; Proctor & Chen, 2012). Replications are therefore essential for
theoretical development through confirmation and disconfirmation of
results. Yet there seems to be little agreement as to what constitutes an
appropriate or convincing replication, what we should infer from replication
"failures" or "successes," and what close replications mean for psychological
theories (see e.g., the commentary by Dijksterhuis, 2013 and
the reply by Shanks & Newell, 2013). In this paper, we provide our
Replication Recipe for conducting and evaluating close replication attempts.
Close replication attempts
In general, how can one define close replication attempts? The most
concrete goals are to test the assumed underlying theoretical process,
assess the average effect size of an effect, and test the robustness of an
effect outside of the lab of the original researchers by recreating the
methods of a study as faithfully as possible. This information helps
psychology build a cumulative knowledge base. This not only aids the
construction of new, but also the refinement of old, psychological theories.
In the definition of our Replication Recipe, close replications refer to
those replications that are based on methods and procedures as close as
possible to the original study. We use the term "close replications"
because it highlights that no replications in psychology can be absolutely
"direct" or "exact" recreations of the original study (for the basis of
this claim see Rosenthal, 1991; Tsang & Kwan, 1999). By definition
then, close replication studies aim to recreate a study as closely as
possible, so that ideally the only differences between the two are the
inevitable ones (e.g., different participants; for more on the benefits of
close replications see e.g., Schmidt, 2009; Tsang & Kwan, 1999).
The Replication Recipe
What constitutes a convincing close replication attempt, and how
does one evaluate such an attempt? This is what the Replication Recipe
seeks to address. The Replication Recipe is informed by the goals of a
close replication attempt: accurately replicating methods, estimating
effect sizes, and evaluating the robustness of the effect outside the
lab of origin. Our discussion is based on a synthesis of our own trials
and errors in conducting replications and guidelines recently developed
for special issues and sections of psychology journals (Nosek & Lakens,
2013; Open Science Collaboration, 2012; Registered replication
reports, 2013; Zwaan & Zeelenberg, 2013). In this synthesis, we make
explicit the expectations and necessary qualities of a convincing replication
that can be used by researchers, teachers, and students when designing
and carrying out replication studies.
A convincing close replication par excellence is executed rigorously
by independent researchers or labs and includes the following five
additional ingredients:
1. Carefully defining the effects and methods that the researcher
intends to replicate;
2. Following as exactly as possible the methods of the original study
(including participant recruitment, instructions, stimuli, measures,
procedures, and analyses);
3. Having high statistical power;
4. Making complete details about the replication available, so that
interested experts can fully evaluate the replication attempt (or
attempt another replication themselves);
5. Evaluating replication results, and comparing them critically to the
results of the original study.
Each of these criteria is described and justified below. We present
and explain 36 questions that need to be addressed in a solid replication
(see Table 1). This list of questions can be used as a checklist to guide
the planning and communication of a study and will help readers and
reviewers to evaluate the replication, by understanding the decisions
that a replicator has made when designing, conducting, and reporting
their replication. These questions are intended to help replicators follow
the Replication Recipe and determine when and why they have deviated
from the five Replication Recipe ingredients.
Ingredient #1: Carefully defining the effects and methods that the researcher intends to replicate
Prior to conducting a replication study, researchers need to carefully
consider the precise effect they intend to replicate (Questions 1–9),
including the size of the original effect (Question 3), the effect size's
confidence intervals (Question 4) and the methods used to uncover it
(Questions 5–9). Although this can be a straightforward task, in many
studies the effect of interest may be a specific aspect of a more complicated
set of results. For example, in a 2 × 2 design where the original
effect was a complete cross-over interaction, such that an effect was
positive in one condition and negative in the other, the effect of interest
may be the interaction, the positive and negative simple effects, or perhaps
just one of the simple effects. On other occasions, the information
about the methods used to obtain the effect will be unclear (e.g., the
original country the study was completed in, Question 7); in these
cases, it may be necessary to ask the original authors to provide the
missing information or to make an informed guess. It is important to
know the precise effect of interest from the beginning of the design phase
of the replication because it determines nearly all of the decisions
that follow. A related consideration, especially when resources are
limited, is the importance and necessity of replicating a particular effect
(Question 2). Such decisions to replicate or not should be based on
either the effect's theoretical importance to a particular field or its direct
or indirect value to society. Another consideration is existing confidence
in the reliability of the effect; an effect with a number of existing close
replications in the literature may be less urgent to replicate than one
without any such support (see discussion of the Replication value
project, 2012–2013). In other words, not every study is worth replicating.
By considering the theoretical and practical importance of a finding,
the best allocation of resources can be made.
Ingredient #2: Following exactly the methods of the original study
Once a study has been chosen for replication, and the precise effect
of interest has been identified, the design of the replication study can
commence. In an ideal world, the methods of the original study
(including participant recruitment, instructions, stimuli, measures,
procedures, and analyses) will be followed exactly; however, our preference
for the term "close replication" reflects the fact that this
ingredient is impossible to achieve perfectly, given the inevitable temporal
and geographical differences in the participants available to an
independent lab (for a similar point see Rosenthal, 1991; Tsang & Kwan,
1999). Nonetheless, the ideal of an "exact" replication should be the
starting point of all close replication attempts and deviations from an
exact replication of the original study should be minimized (Questions
10–14), documented, and justified (Questions 17–25). Below we make
recommendations for how to best achieve this goal and what can be
done when roadblocks emerge.
To facilitate Ingredient #2 of the replication, researchers should start
by contacting the original authors of the study to try to obtain the
original materials (Question 10). If the original authors are not
cooperative or if they are unavailable (e.g., have left academia and cannot
be contacted, or have passed away), the necessary methods
should be recreated to the best of the replicating researchers' ability,
based on the methods section of the original article and under the
assumption that the original authors conducted a highly rigorous study.
For example, if replication authors are unable to obtain the reaction
time windows or stimuli used in a lexical decision task, they should follow
the methods of the original article as closely as possible and fill in
the gaps by adopting best practices from research on lexical decision
tasks. In these cases, the replication researchers should then also seek
the opinion of expert colleagues in the relevant area to provide feedback
as to whether the replication study accurately recreates the original
article's study as described.
In other cases, the original materials may not be relevant for the rep-
lication study. For example, studies about Occupy Wall Street protests,
the World Series in baseball, or other historically- and culturally-
bound events are not easily closely replicated in different times and
places. In these cases the original materials should be modified to try
and capture the same psychological situation as the original experiment
(e.g., replicate the 2012 elections with the 2016 elections, or present
(Footnote: Except, perhaps, when the data of a single experiment are randomly divided into two equal parts.)
Table 1
A 36-question guide to the Replication Recipe.
The Nature of the Effect
1. Verbal description of the effect I am trying to replicate:
2. It is important to replicate this effect because:
3. The effect size of the effect I am trying to replicate is:
4. The confidence interval of the original effect is:
5. The sample size of the original effect is:
6. Where was the original study conducted? (e.g., lab, in the field, online)
7. What country/region was the original study conducted in?
8. What kind of sample did the original study use? (e.g., student, Mturk, representative)
9. Was the original study conducted with paper-and-pencil surveys, on a computer, or something else?
Designing the Replication Study
10. Are the original materials for the study available from the author?
a. If not, are the original materials for the study available elsewhere (e.g., previously published scales)?
b. If the original materials are not available from the author or elsewhere, how were the materials created
for the replication attempt?
11. I know that assumptions (e.g., about the meaning of the stimuli) in the original study
will also hold in my replication because:
12. Location of the experimenter during data collection:
13. Experimenter knowledge of participant experimental condition:
14. Experimenter knowledge of overall hypotheses:
15. My target sample size is:
16. The rationale for my sample size is:
Documenting Differences between the Original and Replication Study
For each part of the study indicate whether the replication study is Exact, Close, or Conceptually Different
compared to the original study. Then, justify the rating.
17. The similarities/differences in the instructions are: [Exact | Close | Different]
18. The similarities/differences in the measures are: [Exact | Close | Different]
19. The similarities/differences in the stimuli are: [Exact | Close | Different]
20. The similarities/differences in the procedure are: [Exact | Close | Different]
21. The similarities/differences in the location (e.g., lab vs. online; alone vs. in groups) are: [Exact | Close | Different]
22. The similarities/differences in remuneration are: [Exact | Close | Different]
23. The similarities/differences between participant populations are: [Exact | Close | Different]
24. What differences between the original study and your study might be expected to influence the size and/or
direction of the effect?:
25. I have taken the following steps to test whether the differences listed in #24 will influence the outcome of my
replication attempt:
Analysis and Replication Evaluation
26. My exclusion criteria are (e.g., handling outliers, removing participants from analysis):
27. My analysis plan is (justify differences from the original):
28. A successful replication is defined as:
Registering the Replication Attempt
29. The finalized materials, procedures, analysis plan, etc. of the replication are registered here:
Reporting the Replication
30. The effect size of the replication is:
31. The confidence interval of the replication effect size is:
32. The replication effect size [is/is not] (circle one) significantly different from the original effect size?
33. I judge the replication to be a(n) [success/informative failure to replicate/practical failure to
replicate/inconclusive] (circle one) because:
34. Interested experts can obtain my data and syntax here:
35. All of the analyses were reported in the report or are available here:
36. The limitations of my replication study are:
M.J. Brandt et al. / Journal of Experimental Social Psychology 50 (2014) 217–224
British participants with a cricket rather than baseball championship).
In such cases, the most valid replication attempt may actually entail
changing the stimulus materials to ensure that they are functionally
equivalent. To ensure that the modified materials effectively capture
the same constructs as the original study they can (when possible) be
developed in collaboration with the original authors and the research
community can be polled for their input (via e.g., professional discussion
forums and e-mail lists). In some cases, depending on the severity
of the change, it will be necessary to conduct a pilot study, testing the
equivalence of manipulations and measures to constructs tested in the
original research prior to the actual replication attempt. The justifications
or steps taken to ensure that the assumptions about the meaning
of the stimuli hold in the replication attempt should be clearly specified
(Question 11).
Although there is no single conclusive replication (or original study
for that matter), and no such burden should be put on an individual
replication study, the replication researcher should do his or her best to
minimize the differences between the replication and the original
study and identify what these differences are. Questions 17–23 ask
replicators to categorize which parts of the study are exactly the same
as, close to, or conceptually different from the original study and to
then justify the differences. All of these are imperfect categories that
exist along a continuum, but this categorization task yields at least
three benefits. First, reviewers, readers, and editors can judge for
themselves whether or not they think that the deviation from the original
study was justified. In some cases, a deviation will be clearly justified
(e.g., using a different, but demographically similar, sample of
participants), whereas in other cases it may be less clear-cut (e.g., replicating
a non-internet computer-based lab study done in cubicles on the
internet). Second, by identifying differences between replication and original
studies (sample, culture, lab context, etc.) researchers and readers can
identify where the replication is on the continuum from "close" to
"conceptual." Third, after multiple replication attempts have been
recorded, these deviations can be used to determine relevant boundary
conditions on a particular effect (for more elaboration on this point
see Greenwald, Pratkanis, Leippe, & Baumgardner, 1986; IJzerman,
Brandt, & van Wolferen, 2013).
In the process of identifying and justifying deviations from the original
study, replicators should anticipate differences between the original
and replication study that may influence the size and direction of the
effect and test these possibilities (Question 24). For example, studies have
revealed that people of varying social classes have different psychological
processes related to the perception of threat, self-control, and
perspective taking (among other things; e.g., Henry, 2009; Johnson,
Richeson, & Finkel, 2011; Kraus, Piff, Mendoza-Denton, Rheinschmidt,
& Keltner, 2012). Similarly, people process a variety of information
differently when they are in a positive or negative mood (for reviews see
Forgas, 1995; Rusting, 1998). Conducting a replication at a university
(or online) drawing students from different socioeconomic strata
(SES) than the original population or in circumstances where participants
tend to be in a different mood than the participants in the original
study (e.g., immediately prior to mid-term exams compared to the
week after exams) may affect the outcome of the replication. In this
case, an individual difference measure of SES or mood could be included
at the end of the replication study so as to not interfere with the close
replication of the original study. Then, a statistical moderator test within
the replication study's sample could help understand the degree to which
differences in effects between samples can be explained by individual
differences in SES or mood. This way it is possible to test if the differences
identified in Question 24 impact the replication result (Question 25).
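One way such a moderator test could be set up is sketched below. This is only an illustration under stated assumptions: the function names (`solve_linear`, `interaction_estimate`) are hypothetical, the model is an ordinary least squares regression of the outcome on condition, the measured moderator (e.g., SES or mood), and their product, and the code uses only the Python standard library. The coefficient on the product term estimates how much the experimental effect changes per unit of the moderator; a full analysis would normally rely on an established statistics package.

```python
from math import sqrt


def solve_linear(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    # Build an augmented matrix on copies so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def interaction_estimate(condition, moderator, outcome):
    """Return (coefficient, standard error) of the condition x moderator term.

    Hypothetical helper: fits outcome = b0 + b1*condition + b2*moderator
    + b3*condition*moderator by ordinary least squares and reports b3.
    """
    rows = [[1.0, c, m, c * m] for c, m in zip(condition, moderator)]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    residuals = [y - sum(b * x for b, x in zip(beta, r))
                 for r, y in zip(rows, outcome)]
    sigma2 = sum(e * e for e in residuals) / (len(outcome) - k)
    # Last column of (X'X)^-1 gives the sampling variance of b3 (up to sigma2).
    inv_last_col = solve_linear(xtx, [0.0, 0.0, 0.0, 1.0])
    return beta[3], sqrt(sigma2 * inv_last_col[3])
```

Here `condition` would be coded 0/1 and `moderator` would hold each participant's score; centering the moderator before fitting usually makes the remaining coefficients easier to interpret. Dividing the coefficient by its standard error gives a test statistic for the moderation effect.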
Ingredient #3: Having high statistical power
It is crucial that a planned replication has sufficient statistical power,
giving it a strong chance of detecting as significant an effect of the size
reported in the original publication (see Simonsohn, 2013). Underpowered
replication attempts may incorrectly suggest original effects are false
positives, impeding genuine scientific progress. Recommendations for
sufficient statistical power range from at least
.80 (Cohen, 1992) up to .95 (Open Science Collaboration, 2012). Because
effect sizes in the published literature are likely to be overestimates of
the true effect size (Greenwald, 1975), researchers should err conservatively,
toward higher levels of power.
Power calculations are one potential rationale for determining sample
size in the replication attempt (Questions 15 & 16). Calculating the
power for a close replication study can be very straightforward for some
study designs (e.g., a t-test). For other study designs, power analyses can
be more complicated, and we encourage researchers to consult the
appropriate literature on statistical power and sample size planning when
designing replication attempts (see, e.g., Aberson, 2010; Cohen, 1992;
Faul, Erdfelder, Lang, & Buchner, 2007; Maxwell, Kelley, & Rausch,
2008; Scherbaum & Ferreter, 2009; Shieh, 2009; Zhang & Wang, 2009
for useful information on power analysis). It has also been suggested
that an alternative for determining sample sizes is to take 2.5 times
the original sample size (Simonsohn, 2013).
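For the simple case of a two-group comparison, the sample size planning described above can be sketched as follows. This is a rough illustration, not a substitute for dedicated power software: it assumes a two-sided test, equal group sizes, and a normal approximation to the t distribution (which slightly understates the required n), and the function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, power=0.80, alpha=0.05):
    """Approximate participants per group for a two-group comparison.

    d is the standardized mean difference (Cohen's d) taken from the
    original study. Uses the normal approximation
    n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


def n_from_original_sample(original_n, multiplier=2.5):
    """Alternative rule of thumb: a multiple of the original sample size."""
    return ceil(original_n * multiplier)
```

For instance, `n_per_group(0.5)` returns 63 under this approximation, and `n_per_group(0.5, power=0.95)` returns 104; an exact calculation based on the noncentral t distribution would give slightly larger values. Because published effect sizes tend to be overestimates, planning for a smaller d than the one originally reported is the more conservative choice.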
Ingredient #4: Making complete details about the replication available
Close replication attempts may be seen as a thorny issue; openness in
the replication process can help ameliorate this issue. As a rule, in order to
evaluate close replication attempts as well as possible, complete details
about the methods, analyses, and outcomes of a replication should be
available to reviewers, editors, and ultimately to the readers of the
resulting article. One way to achieve this is to pre-register replication
attempts (Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit,
2012; for a pre-registration example see LeBel & Campbell, in press),
including the methods of the replication study (Questions 10–16, 25),
differences between the original and replication study (Questions 17–24),
and the planned analysis and evaluation of the replication attempt
(Questions 26–28). Following the completion of the replication attempt,
the data, analysis syntax, and all analyses should be made available so
that the replication attempt can be fully evaluated and alternative
explanations for any effects can be explored (Questions 34 & 35).
Pre-registering and conducting replications with as much openness as
ethically possible inoculates against post hoc adjustment of replication
success criteria, provides more transparency when readers evaluate the
replication, gives people less reason to suspect ulterior motives of the
replicator, and makes it more difficult to exercise liberty in choosing an
analytic method to exploit the chances of declaring the findings in favor of
(or against) the hypothesis (Simmons, Nelson, & Simonsohn, 2011;
Wagenmakers et al., 2012). Sharing the information we recommend, including
the replication pre-registration and data, can be accomplished with the
Open Science Framework.
Ingredient #5: Evaluating replication results and comparing them critically
to the results of the original study
Replication studies are not studies in isolation and so the statistical
results need to be critically compared to the results of the original
study. The meaning of this comparison needs to be carefully considered
in the discussion section of a replication article. It is not enough to deliver
a judgment of "successful/failed replication" depending solely on
whether or not the replication study yields a significant result. Replication
effect size estimates (Question 30) and confidence intervals (when
possible, Question 31) need to be calculated and the effect size estimate
should be statistically compared to the original effect size (Question 32).
Evaluating the replication should involve reporting two tests: 1) the
size, direction and confidence interval of the effect, which tell us whether
the replication effect is significantly different from the null; 2) an
additional test of whether it is significantly different from the original
effect. This helps determine whether the replication was a success
(different from the null, and similar to or larger than the original and
in the same direction), an informative failure to replicate (either not
different from null, or in the opposite direction from the original, and
significantly different from original), a practical failure to replicate
(both significantly different from the null and from the original), or
inconclusive (neither significantly different from null nor the original)
(Question 33; for the criteria for these decisions see Simonsohn,
2013; for additional discussion about evaluating replication results
see Asendorpf et al., 2013; Valentine et al., 2011). It may also be generally
informative for any replication report to produce a meta-analytic
aggregation of the replication study's effect with the original
and with any other close replications existing in the literature. It is
important that a discussion of replication results and their conclusions
take into account the limitations of the replication attempt
and the original study and possibilities of Type I and Type II errors
and random variation in the true size of the effect from study to
study (e.g., Borenstein, Hedges, & Rothstein, 2007; Question 36). In
evaluating the replication results, one should carefully consider the
total weight of the evidence bearing on an effect.
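The four-way evaluation described above can be written down as a small decision rule. The sketch below is an illustration only: it assumes each study is summarized by an effect estimate and its standard error on the same scale, it uses simple two-sided z-tests for both comparisons (an assumption made here for brevity, not the specific procedure of Simonsohn, 2013), and the function name and the "unclassified" fallback are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def classify_replication(orig_est, orig_se, rep_est, rep_se, alpha=0.05):
    """Label a replication result using two normal-approximation tests."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    # Test 1: is the replication effect different from the null (zero)?
    differs_from_null = abs(rep_est / rep_se) > z_crit

    # Test 2: is the replication effect different from the original effect?
    se_diff = sqrt(orig_se ** 2 + rep_se ** 2)
    differs_from_original = abs(rep_est - orig_est) / se_diff > z_crit

    same_direction = (rep_est > 0) == (orig_est > 0)
    similar_or_larger = (not differs_from_original
                         or abs(rep_est) > abs(orig_est))

    if differs_from_null and same_direction and similar_or_larger:
        return "success"
    if differs_from_original and (not differs_from_null or not same_direction):
        return "informative failure to replicate"
    if differs_from_null and differs_from_original:
        return "practical failure to replicate"
    if not differs_from_null and not differs_from_original:
        return "inconclusive"
    # Combination not covered by the four categories in the text.
    return "unclassified"
```

For example, an original estimate of 0.60 (SE 0.10) followed by a replication estimate of 0.15 (SE 0.04) is different from both zero and the original, so this rule labels it a practical failure to replicate. Such a label should complement, not replace, reporting the effect sizes and confidence intervals themselves.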
One testable consideration for explaining differences between the results of
a replication study and an original study is the set of features of the
study context that could influence the outcomes of a replication
attempt. Some of these contextual variations are due to specific theoretical
considerations. These may be as obvious as SES or religiosity in a
sample, but may also be as basic and nonobvious as variations in room
temperature (cf. IJzerman & Semin, 2009). In other cases, there may
be methodological considerations, which may mean the manipulation
or the measurement of the dependent variable is less accurate, such as
when changing the type of computer monitor (e.g., CRT vs. LCD; Plant
& Turner, 2009) or input device used (e.g., keyboard vs. response button
box; Li, Liang, Kleiner, & Lu, 2010). For example, it is quite possible that
the same stimulus presentation times using computer monitors of
different brands or even the same brand but with different settings
will be subliminal in one case, but supraliminal in another. Therefore,
directly adopting the programming code used in the original study will
not necessarily be enough to replicate the experience of the stimuli by
the participants in the original study. To be clear, these possible variations
should not be used defensively as untested post-hoc justifications
for why an effect failed to replicate. Rather, our suggestion is that
researchers should carefully consider and test whether a specific contextual
feature actually does systematically and reliably affect some specific
results and whether this feature was the critical feature affecting
the discrepancy in results between the original and the replication study.
By conducting several replications of the same phenomenon in multiple
labs it may be possible to identify the differences between studies
that affect the effect size, and design follow-up studies to confirm their
influence. Multiple replication attempts have the added bonus of more
accurately estimating the effect size. The accumulation of studies helps
firmly establish an effect, accurately estimate its size, and acquire
knowledge about the factors that influence its presence and strength.
This accumulation might take the form of multiple demonstrations of
the effect in the original empirical paper, as well as in subsequent
replication studies.
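As a minimal illustration of how accumulating studies sharpens the effect size estimate, the sketch below pools several estimates with inverse-variance weights. It assumes a fixed-effect model, in which all studies estimate one common true effect; if the true effect is expected to vary from study to study, a random-effects model would be the more appropriate choice. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def fixed_effect_pool(estimates, std_errors, confidence=0.95):
    """Inverse-variance weighted average of study effect estimates.

    Returns (pooled estimate, pooled standard error, confidence interval).
    All estimates must be on the same effect size scale.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return pooled, pooled_se, (pooled - z * pooled_se, pooled + z * pooled_se)
```

With an original estimate of 0.60 (SE 0.20) and a replication estimate of 0.20 (SE 0.10), the pooled estimate is 0.28: the more precise study receives four times the weight, and the pooled standard error is smaller than that of either study alone.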
The Replication Recipe can be implemented formally by completing
the 36 Questions in Table 1 and using this information when pre-
registering and reporting the replication attempt. To facilitate the
formal use of the Replication Recipe we have integrated Table 1 into
the Open Science Framework as a replication template (see Fig. 1).
Researchers can choose the Replication Recipe Pre-Registration template
and then complete the questions in Table 1 that should be completed
when pre-registering the study. This information is then saved
with the read-only time-stamped pre-registration file on the Open
Science Framework and a link to this pre-registration can be included
in a submitted paper. When researchers have completed the replication
attempt, they can choose a Replication Recipe Post-Completion registration
and then complete the remaining questions in Table 1 (see Fig. 2).
Again, researchers can include a link to this information in their submit-
ted paper. This will help standardize the registration and reporting of
replication attempts across lab groups and help consolidate the infor-
mation available about a replication study.
Limitations of the Replication Recipe
There are several limitations to the Replication Recipe. First, it is not
always feasible to collaborate with the original author on a replication
study. Much of the Recipe is easier to accomplish with the help of a
cooperative original author, and we encourage these types of collaborations.
However, we are aware that there are times when the replicator
and the original author may have principled disagreements or it is not
possible to work with the original author. When collaboration with
the original author is not feasible, the replicator should design and
conduct the study under the assumption that the original study was
conducted in the best way possible. Therefore, while we encourage
both replicators and original authors to seek a cooperative and even
collaborative relationship, when this does not occur replication studies
should still move forward.
Second, some readers will note that the Replication Recipe has more
stringent criteria than original studies and may object that if it was
good enough for the original, it is good enough for the replication.
We believe that this reasoning highlights some of the broader method-
ological problems in science and is not a limitation of the Replication
Recipe, but rather of the modal research practices in some areas of
research (LeBel & Peters, 2011; Murayama, Pekrun, & Fiedler, in press;
Simmons et al., 2011; SPSP Task Force on Research & Publication Practices,
in press). Original studies would also benefit from following
many of the ingredients of the Replication Recipe. For example, just as
replication studies may be affected by highly specific contexts, original
results may also simply be due to the specific contexts in which they
were originally tested. Consequently, keeping track of our precise
methods will help researchers more efficiently identify the specific
conditions necessary (or not) for effects to occur. A simple implication is
that, both for replication and original studies, (more) modesty is called
for in drawing conclusions from results.
Third, the very notion of single replication attempts may unintentionally
prime people with a competitive, score-keeping mentality
(e.g., 2 failures vs. 1 success) rather than encouraging a broader
meta-analytic point of view on any given phenomenon. The Replication
Recipe is not intended to aid score keeping in the game of
science, but rather to enable replications that serve as building
blocks of a cumulative science. Our intention is that the Replication
Recipe helps the abstract scientific goal of "getting it right" (cf. Nosek
et al., 2012), which is why we advocate conducting multiple close
replications of important findings rather than relying on a single original
demonstration.
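To illustrate the meta-analytic point of view described above, the following minimal Python sketch pools the effect sizes of an original study and two replications with inverse-variance (fixed-effect) weighting, instead of counting successes and failures. The effect sizes and standard errors are hypothetical values chosen only for illustration, and the function name `pool_fixed_effect` is an assumption of this sketch. A random-effects model may be more appropriate when studies differ in context.

```python
import math
from statistics import NormalDist


def pool_fixed_effect(effects, std_errors, confidence=0.95):
    """Inverse-variance weighted (fixed-effect) pooled estimate with a CI."""
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Two-sided normal critical value for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return pooled, pooled_se, (pooled - z * pooled_se, pooled + z * pooled_se)


# Hypothetical standardized mean differences (d) and their standard errors:
# one small original study followed by two larger replications.
effects = [0.60, 0.10, 0.25]
std_errors = [0.30, 0.15, 0.15]

pooled, pooled_se, (lower, upper) = pool_fixed_effect(effects, std_errors)
print(f"pooled d = {pooled:.3f}, SE = {pooled_se:.3f}, "
      f"95% CI [{lower:.3f}, {upper:.3f}]")
```

In this sketch the larger replications receive more weight than the smaller original study, so the pooled estimate reflects all the available evidence rather than a tally of significant and non-significant results.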
Fourth, successful close replications may aid in solidifying a particular
finding in the literature; however, a close replication study does not
address any potential theoretical limitations or confounds in the origi-
nal study design that cloud the inferences that may be drawn from it.
If the original study was plagued by confounds or bad methods, then
the replication study will similarly be plagued by the same limitations
(Tsang & Kwan, 1999).
Beyond close replications, conceptual replica-
tions, or close replication and extension designs, can be used to remove
confounds and extend the generalizability of a proposed psychological
process (Bonett, 2012; Schmidt, 2009). When focusing on a theoretical
prediction rather than effects within a given paradigm, a combination
of close and conceptual replications is the best way to build confidence
in a result.
Fifth, a replication failure does not necessarily mean that the original
finding is incorrect or fraudulent. Science is complex, and we are working
in the arena of probabilities, meaning that some unsuccessful replications
are expected. It is this very complexity that leads us to suggest that
researchers keep careful track of the differences between original and
replication studies, so as to identify and rigorously test factors that
drive a particular effect. Indeed, just as moderators that "turn on" or
"turn off" an effect are invaluable for understanding the underlying
psychological processes, unsuccessful replications can also be keys to
unlocking the underlying psychological processes of an effect.
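As a worked illustration of why some unsuccessful replications are expected, suppose (hypothetically) that an effect is real and that each of k independent replications has the same statistical power. Under these simplifying assumptions, the probability that at least one replication fails to reach significance is 1 - power^k. The function name and the power values below are illustrative assumptions, not values prescribed by the Replication Recipe.

```python
def prob_at_least_one_failure(power, n_replications):
    """P(at least one non-significant result) when the effect is real and
    the replications are independent with identical power."""
    if not 0.0 <= power <= 1.0:
        raise ValueError("power must be between 0 and 1")
    if n_replications < 1:
        raise ValueError("n_replications must be at least 1")
    # All replications succeed with probability power ** n_replications.
    return 1.0 - power ** n_replications


# Illustrative power levels and numbers of replications.
for power in (0.80, 0.95):
    for k in (1, 3, 5):
        p = prob_at_least_one_failure(power, k)
        print(f"power = {power:.2f}, k = {k}: P(at least one failure) = {p:.3f}")
```

Even with well-powered studies of a true effect, a series of replications will often contain at least one non-significant result, which is one reason a single failure should not be read as refuting the original finding.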
There is some question as to whether it is appropriate to make obvious improvements
to the original study, such as using a new and improved version of a scale, when
conducting a close replication. We suspect that it would be better if the replication, in
consultation with the original authors, used improved methods and outlined the reasoning for
doing so. Running at least two replications will provide the most information: one that
uses the original methodology (e.g., the old measure) and one that uses the improved
methodology (e.g., the new measure). A second option is to include the change in the
study as a randomized experimental factor so that participants are randomly assigned to
complete the study with the original or the improved methodology. These solutions would
help clarify whether the original material had caused the effect (or its absence).
Fig. 1. Choosing the Replication Recipe as a replication template on the Open Science Framework.
222 M.J. Brandt et al. / Journal of Experimental Social Psychology 50 (2014) 217–224
It is clear that replications are a crucial component of cumulative
science because they help establish the veracity of an effect and aid
in precisely estimating its effect size. Simply stated, well-constructed
replications refine our conceptions of human behavior and thought.
Our Replication Recipe serves to guide researchers who are planning
and conducting convincing close replications, with the answers to our
36 questions serving as a basis for the replication study. We have
recommended that researchers faithfully recreate the original study;
keep track of differences between the replication and original study;
check the study's assumptions in new contexts; adopt high-powered
replication studies; pre-register replication materials and methods;
and evaluate and report the results as openly as ethically possible and
in accordance with the ethical guidelines of the field. We have
suggested that researchers measure potential moderators in a way that
does not interfere with the original study, to help determine the
reason for potential differences between the original and replication
study, which in turn helps build theory beyond mere replication.
By conducting high-powered replication studies of important findings,
we can build a cumulative science. With our Replication Recipe,
we hope to encourage more researchers to conduct convincing
replications that contribute to theoretical development, confirmation,
and disconfirmation.
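To make the call for high-powered replications concrete, the following minimal Python sketch approximates the per-group sample size for a two-sided, two-group comparison of means. It uses the normal approximation n = 2((z_{1-alpha/2} + z_{power}) / d)^2. The function name `n_per_group`, the effect size d = 0.5, and the target power levels are assumptions chosen for illustration. Dedicated power-analysis software such as G*Power works from the noncentral t distribution and will typically return slightly larger values.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d, power=0.95, alpha=0.05):
    """Approximate per-group n for a two-sided, two-sample comparison of
    means, using the normal approximation to the t test."""
    if effect_size_d <= 0:
        raise ValueError("effect_size_d must be positive")
    if not 0.0 < power < 1.0 or not 0.0 < alpha < 1.0:
        raise ValueError("power and alpha must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)              # quantile for target power
    return math.ceil(2.0 * ((z_alpha + z_power) / effect_size_d) ** 2)


# Hypothetical original effect size (Cohen's d) taken from a published study.
original_d = 0.5
for target_power in (0.80, 0.90, 0.95):
    n = n_per_group(original_d, power=target_power)
    print(f"target power = {target_power:.2f}: about {n} participants per group")
```

Because published effect sizes may overestimate the true effect, a replicator might also run the calculation with a smaller assumed d to see how quickly the required sample grows.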
G*Power (Erdfelder, Faul, & Buchner, 1996) was designed as a general stand-alone power analysis program for statistical tests commonly used in social and behavioral research. G*Power 3 is a major extension of, and improvement over, the previous versions. It runs on widely used computer platforms (i.e., Windows XP, Windows Vista, and Mac OS X 10.4) and covers many different statistical tests of the t, F, and chi2 test families. In addition, it includes power analyses for z tests and some exact tests. G*Power 3 provides improved effect size calculators and graphic options, supports both distribution-based and design-based input modes, and offers all types of power analyses in which users might be interested. Like its predecessors, G*Power 3 is free.
In this article, the Society for Personality and Social Psychology (SPSP) Task Force on Publication and Research Practices offers a brief statistical primer and recommendations for improving the dependability of research. Recommendations for research practice include (a) describing and addressing the choice of N (sample size) and consequent issues of statistical power, (b) reporting effect sizes and 95% confidence intervals (CIs), (c) avoiding "questionable research practices" that can inflate the probability of Type I error, (d) making available research materials necessary to replicate reported results, (e) adhering to SPSP's data sharing policy, (f) encouraging publication of high-quality replication studies, and (g) maintaining flexibility and openness to alternative standards and methods. Recommendations for educational practice include (a) encouraging a culture of "getting it right," (b) teaching and encouraging transparency of data reporting, (c) improving methodological instruction, and (d) modeling sound science and supporting junior researchers who seek to "get it right."