
The Replication Recipe: What Makes for a Convincing Replication?


Abstract

Psychological scientists have recently started to reconsider the importance of close replications in building a cumulative knowledge base; however, there is no consensus about what constitutes a convincing close replication study. To facilitate convincing close replication attempts we have developed a Replication Recipe, outlining standard criteria for a convincing close replication. Our Replication Recipe can be used by researchers, teachers, and students to conduct meaningful replication studies and integrate replications into their scholarly habits.
Mark J. Brandt, Hans IJzerman, Ap Dijksterhuis, Jason Geller, Roger Giner-Sorolla, James A. Grange, Marco Perugini, Jeffrey R. Spies, Anna van 't Veer
Tilburg University, Netherlands
Radboud University Nijmegen, Netherlands
University of Washington, USA
Iowa State University, USA
University of Kent, UK
Keele University, UK
University of Milano-Bicocca, Italy
Center for Open Science, USA
TIBER (Tilburg Institute of Behavioral Economics), Netherlands
Close replications are an important part of cumulative science.
Yet, little agreement exists about what makes a replication convincing.
We develop a Replication Recipe to facilitate close replication attempts.
This includes the faithful recreation of a study with high statistical power.
We discuss evaluating replication results and limitations of replications.
Article info
Article history:
Received 10 July 2013
Revised 12 October 2013
Available online 23 October 2013
Keywords:
Statistical power
Research method
Solid Science
© 2013 The Authors. Published by Elsevier Inc. All rights reserved.
Replicability in research is an important component of cumulative science (Asendorpf et al., 2013; Jasny, Chin, Chong, & Vignieri, 2011; Nosek, Spies, & Motyl, 2012; Rosenthal, 1990; Schmidt, 2009), yet relatively few close replication attempts are reported in psychology (Makel, Plucker, & Hegarty, 2012). Only recently have researchers systematically reported replications online and experimented with special issues to incorporate replications into academic publications (e.g., Nosek & Lakens, 2013; Zwaan & Zeelenberg, 2013). Moreover, some prestigious psychology journals (e.g., Journal of Experimental Social Psychology, Journal of Personality and Social Psychology, Psychological Science) have recently become willing to publish both failed and successful replication attempts (e.g., Brandt, 2013; Chabris et al., 2012; LeBel & Campbell, in press; Matthews, 2012; Pashler, Rohrer, & Harris, in press) and even devote ongoing sections to replications (see the new section in Perspectives on Psychological Science, Registered replication reports, 2013).
Journal of Experimental Social Psychology 50 (2014) 217–224
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Corresponding author: M.J. Brandt.
First two authors share first authorship. All other authors share second authorship.
0022-1031/$ – see front matter.
From initial conclusions drawn from replication attempts of important findings in the empirical literature, it is clear that replication studies can be quite controversial. For example, the failure of recent attempts to replicate "social priming" effects (e.g., Doyen, Klein, Pichon, & Cleeremans, 2012; Pashler et al., in press) has prompted psychologists and science journalists to raise questions about the entire phenomenon (e.g., Bartlett, 2013). Failed replications have sometimes been interpreted as 1) casting doubt on the veracity of an entire subfield (e.g., candidate gene studies for general intelligence, Chabris et al., 2012); 2) suggesting that an important component of a popular theory is potentially incorrect (e.g., the status-legitimacy hypothesis of System Justification Theory, Brandt, 2013); or 3) suggesting that a new finding is less robust than when first introduced (e.g., incidental values affecting judgments of time; Matthews, 2012). Of course, there are other valid reasons for replication failures: chance, misinterpretation of methods, and so forth.
Nevertheless, not all replication attempts reported so far have been unsuccessful. Burger (2009) successfully replicated Milgram's famous obedience experiments (e.g., Milgram, 1963), suggesting that when well-conducted replications are successful they can provide us with greater confidence about the veracity of the predicted effect. Moreover, replication attempts help estimate the effect size of a particular effect and can serve as a starting point for replication–extension studies that further illuminate the psychological processes that underlie an effect and that can help to identify its boundary conditions (e.g., Lakens, 2012; Proctor & Chen, 2012). Replications are therefore essential for theoretical development through confirmation and disconfirmation of results. Yet there seems to be little agreement as to what constitutes an appropriate or convincing replication, what we should infer from replication "failures" or "successes," and what close replications mean for psychological theories (see e.g., the commentary by Dijksterhuis, 2013 and the reply by Shanks & Newell, 2013). In this paper, we provide our Replication Recipe for conducting and evaluating close replication attempts.
Close replication attempts
In general, how can one define close replication attempts? The most concrete goals are to test the assumed underlying theoretical process, assess the average effect size of an effect, and test the robustness of an effect outside of the lab of the original researchers by recreating the methods of a study as faithfully as possible. This information helps psychology build a cumulative knowledge base, which aids not only the construction of new psychological theories but also the refinement of old ones. In the definition of our Replication Recipe, close replications refer to those replications that are based on methods and procedures as close as possible to the original study. We use the term "close replications" because it highlights that no replications in psychology can be absolutely "direct" or "exact" recreations of the original study (for the basis of this claim see Rosenthal, 1991; Tsang & Kwan, 1999). By definition then, close replication studies aim to recreate a study as closely as possible, so that ideally the only differences between the two are the inevitable ones (e.g., different participants; for more on the benefits of close replications see e.g., Schmidt, 2009; Tsang & Kwan, 1999).
The Replication Recipe
What constitutes a convincing close replication attempt, and how does one evaluate such an attempt? This is what the Replication Recipe seeks to address. The Replication Recipe is informed by the goals of a close replication attempt: accurately replicating methods, estimating effect sizes, and evaluating the robustness of the effect outside the lab of origin. Our discussion is based on a synthesis of our own trials and errors in conducting replications and guidelines recently developed for special issues and sections of psychology journals (Nosek & Lakens, 2013; Open Science Collaboration, 2012; Registered replication reports, 2013; Zwaan & Zeelenberg, 2013). In this synthesis, we make explicit the expectations and necessary qualities of a convincing replication that can be used by researchers, teachers, and students when designing and carrying out replication studies.
A convincing close replication par excellence is executed rigorously by independent researchers or labs and includes the following five additional ingredients:
1. Carefully defining the effects and methods that the researcher intends to replicate;
2. Following as exactly as possible the methods of the original study
(including participant recruitment, instructions, stimuli, measures,
procedures, and analyses);
3. Having high statistical power;
4. Making complete details about the replication available, so that
interested experts can fully evaluate the replication attempt (or
attempt another replication themselves);
5. Evaluating replication results, and comparing them critically to the
results of the original study.
Each of these criteria is described and justified below. We present and explain 36 questions that need to be addressed in a solid replication (see Table 1). This list of questions can be used as a checklist to guide the planning and communication of a study and will help readers and reviewers to evaluate the replication, by understanding the decisions that a replicator has made when designing, conducting, and reporting their replication. These questions are intended to help replicators follow the Replication Recipe and determine when and why they have deviated from the five Replication Recipe ingredients.
Ingredient #1: Carefully defining the effects and methods that the researcher intends to replicate
Prior to conducting a replication study, researchers need to carefully consider the precise effect they intend to replicate (Questions 1–9), including the size of the original effect (Question 3), the effect size's confidence intervals (Question 4) and the methods used to uncover it (Questions 5–9). Although this can be a straightforward task, in many studies the effect of interest may be a specific aspect of a more complicated set of results. For example, in a 2 × 2 design where the original effect was a complete cross-over interaction, such that an effect was positive in one condition and negative in the other, the effect of interest may be the interaction, the positive and negative simple effects, or perhaps just one of the simple effects. On other occasions, the information about the methods used to obtain the effect will be unclear (e.g., the original country the study was completed in, Question 7); in these cases, it may be necessary to ask the original authors to provide the missing information or to make an informed guess. It is important to know the precise effect of interest from the beginning of the design phase of the replication because it determines nearly all of the decisions that follow. A related consideration, especially when resources are limited, is the importance and necessity of replicating a particular effect (Question 2). Such decisions to replicate or not should be based on either the effect's theoretical importance to a particular field or its direct or indirect value to society. Another consideration is existing confidence in the reliability of the effect; an effect with a number of existing close replications in the literature may be less urgent to replicate than one without any such support (see discussion of the Replication value project, 2012–2013). In other words, not every study is worth replicating. By considering the theoretical and practical importance of a finding, the best allocation of resources can be made.
Ingredient #2: Following exactly the methods of the original study
Once a study has been chosen for replication, and the precise effect of interest has been identified, the design of the replication study can commence. In an ideal world, the methods of the original study (including participant recruitment, instructions, stimuli, measures, procedures, and analyses) will be followed exactly; however, our preference for the term "close replication" reflects the fact that this ingredient is impossible to achieve perfectly, given the inevitable temporal and geographical differences in the participants available to an independent lab (for a similar point see Rosenthal, 1991; Tsang & Kwan, 1999). Nonetheless, the ideal of an "exact" replication should be the starting point of all close replication attempts and deviations from an exact replication of the original study should be minimized (Questions 10–14), documented, and justified (Questions 17–25). Below we make recommendations for how to best achieve this goal and what can be done when roadblocks emerge.
Footnote to Table 1: Also available as a pre-registration form online.
To facilitate Ingredient #2 of the replication, researchers should start by contacting the original authors of the study to try to obtain the original materials (Question 10). If the original authors are not cooperative or if they are unavailable (e.g., have left academia and cannot be contacted, or have passed away), the necessary methods should be recreated to the best of the replicating researchers' ability, based on the methods section of the original article and under the assumption that the original authors conducted a highly rigorous study. For example, if replication authors are unable to obtain the reaction time windows or stimuli used in a lexical decision task, they should follow the methods of the original article as closely as possible and fill in the gaps by adopting best practices from research on lexical decision tasks. In these cases, the replication researchers should also seek the opinion of expert colleagues in the relevant area to provide feedback as to whether the replication study accurately recreates the original article's study as described.
In other cases, the original materials may not be relevant for the replication study. For example, studies about Occupy Wall Street protests, the World Series in baseball, or other historically- and culturally-bound events are not easily closely replicated in different times and places. In these cases the original materials should be modified to try to capture the same psychological situation as the original experiment (e.g., replicate the 2012 elections with the 2016 elections, or present British participants with a cricket rather than baseball championship). In such cases, the most valid replication attempt may actually entail changing the stimulus materials to ensure that they are functionally equivalent. To ensure that the modified materials effectively capture the same constructs as the original study they can (when possible) be developed in collaboration with the original authors and the research community can be polled for their input (via e.g., professional discussion forums and e-mail lists). In some cases, depending on the severity of the change, it will be necessary to conduct a pilot study, testing the equivalence of manipulations and measures to constructs tested in the original research prior to the actual replication attempt. The justifications or steps taken to ensure that the assumptions about the meaning of the stimuli hold in the replication attempt should be clearly specified (Question 11).
Footnote: Except, perhaps, when the data of a single experiment are randomly divided into two equal parts.
Table 1
A 36-question guide to the Replication Recipe.
The Nature of the Effect
1. Verbal description of the effect I am trying to replicate:
2. It is important to replicate this effect because:
3. The effect size of the effect I am trying to replicate is:
4. The confidence interval of the original effect is:
5. The sample size of the original effect is:
6. Where was the original study conducted? (e.g., lab, in the field, online)
7. What country/region was the original study conducted in?
8. What kind of sample did the original study use? (e.g., student, Mturk, representative)
9. Was the original study conducted with paper-and-pencil surveys, on a computer, or something else?
Designing the Replication Study
10. Are the original materials for the study available from the author?
a. If not, are the original materials for the study available elsewhere (e.g., previously published scales)?
b. If the original materials are not available from the author or elsewhere, how were the materials created for the replication attempt?
11. I know that assumptions (e.g., about the meaning of the stimuli) in the original study will also hold in my replication because:
12. Location of the experimenter during data collection:
13. Experimenter knowledge of participant experimental condition:
14. Experimenter knowledge of overall hypotheses:
15. My target sample size is:
16. The rationale for my sample size is:
Documenting Differences between the Original and Replication Study
For each part of the study indicate whether the replication study is Exact, Close, or Conceptually Different compared to the original study. Then, justify the rating.
17. The similarities/differences in the instructions are: [Exact | Close | Different]
18. The similarities/differences in the measures are: [Exact | Close | Different]
19. The similarities/differences in the stimuli are: [Exact | Close | Different]
20. The similarities/differences in the procedure are: [Exact | Close | Different]
21. The similarities/differences in the location (e.g., lab vs. online; alone vs. in groups) are: [Exact | Close | Different]
22. The similarities/differences in remuneration are: [Exact | Close | Different]
23. The similarities/differences between participant populations are: [Exact | Close | Different]
24. What differences between the original study and your study might be expected to influence the size and/or direction of the effect?:
25. I have taken the following steps to test whether the differences listed in #24 will influence the outcome of my replication attempt:
Analysis and Replication Evaluation
26. My exclusion criteria are (e.g., handling outliers, removing participants from analysis):
27. My analysis plan is (justify differences from the original):
28. A successful replication is defined as:
Registering the Replication Attempt
29. The finalized materials, procedures, analysis plan etc. of the replication are registered here:
Reporting the Replication
30. The effect size of the replication is:
31. The confidence interval of the replication effect size is:
32. The replication effect size [is/is not] (circle one) significantly different from the original effect size?
33. I judge the replication to be a(n) [success/informative failure to replicate/practical failure to replicate/inconclusive] (circle one) because:
34. Interested experts can obtain my data and syntax here:
35. All of the analyses were reported in the report or are available here:
36. The limitations of my replication study are:
Although there is no single conclusive replication (or original study for that matter), and no such burden should be put on an individual replication study, the replication researcher should do his or her best to minimize the differences between the replication and the original study and identify what these differences are. Questions 17–23 ask replicators to categorize which parts of the study are exactly the same as, close to, or conceptually different from the original study and to then justify the differences. All of these are imperfect categories that exist along a continuum, but this categorization task yields at least three benefits. First, reviewers, readers, and editors can judge for themselves whether or not they think that the deviation from the original study was justified. In some cases, a deviation will be clearly justified (e.g., using a different, but demographically similar, sample of participants), whereas in other cases it may be less clear-cut (e.g., replicating on the internet a non-internet, computer-based lab study done in cubicles). Second, by identifying differences between replication and original studies (sample, culture, lab context, etc.) researchers and readers can identify where the replication is on the continuum from "close" to "conceptual." Third, after multiple replication attempts have been recorded, these deviations can be used to determine relevant boundary conditions on a particular effect (for more elaboration on this point see Greenwald, Pratkanis, Leippe, & Baumgardner, 1986; IJzerman, Brandt, & van Wolferen, 2013).
In the process of identifying and justifying deviations from the original study, replicators should anticipate differences between the original and replication study that may influence the size and direction of the effect and test these possibilities (Question 24). For example, studies have revealed that people of varying social classes have different psychological processes related to the perception of threat, self-control, and perspective taking (among other things; e.g., Henry, 2009; Johnson, Richeson, & Finkel, 2011; Kraus, Piff, Mendoza-Denton, Rheinschmidt, & Keltner, 2012). Similarly, people process a variety of information differently when they are in a positive or negative mood (for reviews see Forgas, 1995; Rusting, 1998). Conducting a replication at a university (or online) drawing students from different socioeconomic strata (SES) than the original population or in circumstances where participants tend to be in a different mood than the participants in the original study (e.g., immediately prior to mid-term exams compared to the week after exams) may affect the outcome of the replication. In this case, an individual difference measure of SES or mood could be included at the end of the replication study so as to not interfere with the close replication of the original study. Then, a statistical moderator test within the replication study's sample could help understand the degree to which differences in effects between samples can be explained by individual differences in SES or mood. This way it is possible to test if the differences identified in Question 24 impact the replication result (Question 25).
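As a rough illustration of the moderator test described above, the following sketch compares the condition effect between two levels of a dichotomized moderator (e.g., lower vs. higher SES) within a replication sample. The function name, the cell summaries, and all numbers are hypothetical, and the test is a large-sample normal approximation to the interaction contrast; with a continuous moderator, a regression model with an interaction term would usually be the better choice.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass
class Cell:
    """Summary statistics for one cell of a condition x moderator design."""
    mean: float
    sd: float
    n: int


def interaction_z(treat_low, ctrl_low, treat_high, ctrl_high):
    """Difference between the condition effects at the two moderator levels.

    Returns (contrast, z, two-sided p) using a large-sample normal
    approximation with unpooled cell variances.
    """
    effect_low = treat_low.mean - ctrl_low.mean
    effect_high = treat_high.mean - ctrl_high.mean
    contrast = effect_high - effect_low
    se = sqrt(sum(c.sd ** 2 / c.n
                  for c in (treat_low, ctrl_low, treat_high, ctrl_high)))
    z = contrast / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return contrast, z, p


# Hypothetical cell summaries, for illustration only.
contrast, z, p = interaction_z(
    treat_low=Cell(5.2, 1.0, 50), ctrl_low=Cell(5.0, 1.0, 50),
    treat_high=Cell(5.8, 1.0, 50), ctrl_high=Cell(5.0, 1.0, 50),
)
print(f"contrast={contrast:.2f}, z={z:.2f}, p={p:.3f}")
```

A reliable contrast would suggest that the measured difference between samples is a plausible explanation for a discrepancy in results; a null contrast in a small sample says little either way.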
Ingredient #3: Having high statistical power
It is crucial that a planned replication has sufficient statistical power, allowing a strong chance to confirm as significant the effect size from the original publication (see Simonsohn, 2013). Underpowered replication attempts may incorrectly suggest original effects are false positives, impeding genuine scientific progress. Some authors have recommended that a sufficient amount of statistical power is at least .80 (Cohen, 1992) up to .95 (Open Science Collaboration, 2012). Because effect sizes in the published literature are likely to be overestimates of the true effect size (Greenwald, 1975), researchers should err conservatively, toward higher levels of power.
Power calculations are one potential rationale for determining sample size in the replication attempt (Questions 15 & 16). Calculating the power for a close replication study can be very straightforward for some study designs (e.g., a t-test). For other study designs, power analyses can be more complicated, and we encourage researchers to consult the appropriate literature on statistical power and sample size planning when designing replication attempts (see, e.g., Aberson, 2010; Cohen, 1992; Faul, Erdfelder, Lang, & Buchner, 2007; Maxwell, Kelley, & Rausch, 2008; Scherbaum & Ferreter, 2009; Shieh, 2009; Zhang & Wang, 2009 for useful information on power analysis). It has also been suggested that an alternative for determining sample sizes is to take 2.5 times the original sample size (Simonsohn, 2013).
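To make the sample size reasoning concrete, here is a minimal sketch for a two-group design analyzed with a t-test. It relies on the standard normal approximation to the power of a two-sided test, so the results run slightly below those of exact noncentral-t software. The function names and the example values (an original effect of d = 0.5 obtained with 20 participants per group) are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, power=0.80, alpha=0.05):
    """Approximate n per group for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2.
    """
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)


def n_multiple_of_original(original_n, multiplier=2.5):
    """Alternative rule: a fixed multiple of the original sample size."""
    return ceil(multiplier * original_n)


original_d, original_n_per_group = 0.5, 20   # hypothetical original study
for target_power in (0.80, 0.95):
    print(target_power, n_per_group(original_d, power=target_power))
print("2.5 x original:", n_multiple_of_original(original_n_per_group))
```

Because published effect sizes tend to be overestimates, it can be informative to rerun the calculation with a smaller value of d than the one originally reported and see how quickly the required sample grows.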
Ingredient #4: Making complete details about the replication available
Close replication attempts may be seen as a thorny issue; openness in the replication process can help ameliorate this issue. As a rule, in order to evaluate close replication attempts as well as possible, complete details about the methods, analyses, and outcomes of a replication should be available to reviewers, editors, and ultimately to the readers of the resulting article. One way to achieve this is to pre-register replication attempts (Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit, 2012; for a pre-registration example see LeBel & Campbell, in press), including the methods of the replication study (Questions 10–16, 25), differences between the original and replication study (Questions 17–24), and the planned analysis and evaluation of the replication attempt (Questions 26–28). Following the completion of the replication attempt, the data, analysis syntax, and all analyses should be made available so that the replication attempt can be fully evaluated and alternative explanations for any effects can be explored (Questions 34 & 35). Pre-registering and conducting replications with as much openness as ethically possible inoculates against post hoc adjustment of replication success criteria, provides more transparency when readers evaluate the replication, gives people less reason to suspect ulterior motives of the replicator, and makes it more difficult to exercise liberty in choosing an analytic method to exploit the chances of declaring the findings in favor of (or against) the hypothesis (Simmons, Nelson, & Simonsohn, 2011; Wagenmakers et al., 2012). The information we recommend sharing, including the replication pre-registration and data, can be shared through the Open Science Framework.
Footnote: To be sure, replications in this type of situation are less close than what is often meant by close replications and some people will consider these replications conceptual.
Footnote: When attempting to replicate a study that has already been the subject of several replication attempts it is desirable to base the replication power calculations and sample sizes on the average meta-analytic effect size.
Footnote: The high power necessary for a convincing close replication can provide a challenge for researchers that do not have access to very large samples. One option, though it does not appear to be used often, is to combine resources with other labs to collect the necessary number of participants (similar to initiatives developed by Perspectives on Psychological Science, Registered replication reports, 2013).
Footnote: Although, there are other defensible sample size justifications (see e.g., Maxwell et al., 2008).
Footnote: Exceptions can be made on data protection grounds (e.g., when data are difficult to anonymize, or when unable to share privileged information from companies).
Ingredient #5: Evaluating replication results and comparing them critically
to the results of the original study
Replication studies are not studies in isolation and so the statistical results need to be critically compared to the results of the original study. The meaning of this comparison needs to be carefully considered in the discussion section of a replication article. It is not enough to deliver a judgment of "successful/failed replication" depending solely on whether or not the replication study yields a significant result. Replication effect size estimates (Question 30) and confidence intervals (when possible, Question 31) need to be calculated and the effect size estimate should be statistically compared to the original effect size (Question 32). Evaluating the replication should involve reporting two tests: 1) the size, direction and confidence interval of the effect, which tell us whether the replication effect is significantly different from the null; 2) an additional test of whether it is significantly different from the original effect. This helps determine whether the replication was a success (different from the null, and similar to or larger than the original and in the same direction), an informative failure to replicate (either not different from null, or in the opposite direction from the original, and significantly different from original), a practical failure to replicate (both significantly different from the null and from the original), or inconclusive (neither significantly different from null nor the original) (Question 33; for the criteria for these decisions see Simonsohn, 2013; for additional discussion about evaluating replication results see Asendorpf et al., 2013; Valentine et al., 2011). It may also be generally informative for any replication report to produce a meta-analytic aggregation of the replication study's effect with the original and with any other close replications existing in the literature. It is important that a discussion of replication results and their conclusions take into account the limitations of the replication attempt and the original study and possibilities of Type I and Type II errors and random variation in the true size of the effect from study to study (e.g., Borenstein, Hedges, & Rothstein, 2007; Question 36). In evaluating the replication results, one should carefully consider the total weight of the evidence bearing on an effect.
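The two tests and the four outcome labels described above can be sketched as follows for a standardized mean difference (Cohen's d) from a two-group design. This is an illustration only: the function names, the large-sample variance formula for d, the normal-approximation z-tests, and the example numbers are assumptions of this sketch rather than procedures prescribed by the Recipe, and the labels follow the verbal definitions in the text (see Simonsohn, 2013, for the formal criteria).

```python
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided test at alpha = .05


def d_variance(d, n1, n2):
    """Large-sample approximation to the sampling variance of Cohen's d."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def evaluate_replication(d_orig, n_orig, d_rep, n_rep):
    """Return (label, 95% CI of the replication effect).

    n_orig and n_rep are (n1, n2) tuples of group sizes.
    """
    var_orig = d_variance(d_orig, *n_orig)
    var_rep = d_variance(d_rep, *n_rep)
    se_rep = sqrt(var_rep)
    ci = (d_rep - Z_CRIT * se_rep, d_rep + Z_CRIT * se_rep)

    # Test 1: does the replication effect differ from the null?
    differs_from_null = abs(d_rep) / se_rep > Z_CRIT
    # Test 2: does it differ from the original effect?
    differs_from_orig = abs(d_rep - d_orig) / sqrt(var_orig + var_rep) > Z_CRIT
    same_direction = d_rep * d_orig > 0
    at_least_as_large = abs(d_rep) >= abs(d_orig)

    if differs_from_null and same_direction and (
            at_least_as_large or not differs_from_orig):
        label = "success"
    elif differs_from_orig and not (differs_from_null and same_direction):
        label = "informative failure to replicate"
    elif differs_from_null and differs_from_orig:
        # Same direction and non-zero, but reliably smaller than the original.
        label = "practical failure to replicate"
    else:
        label = "inconclusive"
    return label, ci


# Hypothetical (d_orig, n_orig, d_rep, n_rep) scenarios, for illustration.
scenarios = [
    (0.5, (30, 30), 0.45, (100, 100)),
    (0.8, (40, 40), 0.00, (200, 200)),
    (0.8, (40, 40), 0.20, (500, 500)),
    (0.6, (20, 20), 0.20, (30, 30)),
]
for args in scenarios:
    label, (low, high) = evaluate_replication(*args)
    print(f"d_rep={args[2]:.2f}, 95% CI [{low:.2f}, {high:.2f}]: {label}")
```

The last scenario shows why power matters: with small samples on both sides, neither test can discriminate, and the replication adds little to the total weight of the evidence.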
One testable consideration for explaining differences in the results of a replication study and an original study is the many features of the study context that could influence the outcomes of a replication attempt. Some of these contextual variations are due to specific theoretical considerations. These may be as obvious as SES or religiosity in a sample, but may also be as basic and nonobvious as variations in room temperature (cf. IJzerman & Semin, 2009). In other cases, there may be methodological considerations, which may mean the manipulation or the measurement of the dependent variable is less accurate, such as when changing the type of computer monitor (e.g., CRT vs. LCD; Plant & Turner, 2009) or input device used (e.g., keyboard vs. response button box; Li, Liang, Kleiner, & Lu, 2010). For example, it is quite possible that the same stimulus presentation times using computer monitors of different brands or even the same brand but with different settings will be subliminal in one case, but supraliminal in another. Therefore, directly adopting the programming code used in the original study will not necessarily be enough to replicate the experience of the stimuli by the participants in the original study. To be clear, these possible variations should not be used defensively as untested post-hoc justifications for why an effect failed to replicate. Rather, our suggestion is that researchers should carefully consider and test whether a specific contextual feature actually does systematically and reliably affect some specific results and whether this feature was the critical feature affecting the discrepancy in results between the original and the replication study.
By conducting several replications of the same phenomenon in multiple labs it may be possible to identify the differences between studies that affect the effect size, and design follow-up studies to confirm their influence. Multiple replication attempts have the added bonus of more accurately estimating the effect size. The accumulation of studies helps firmly establish an effect, accurately estimate its size, and acquire knowledge about the factors that influence its presence and strength. This accumulation might take the form of multiple demonstrations of the effect in the original empirical paper, as well as in subsequent replication studies.
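One simple way to accumulate estimates across an original study and its close replications is an inverse-variance weighted (fixed-effect) average. The sketch below is only an illustration with hypothetical effect sizes and sampling variances; the function name is an assumption, and a random-effects model may be more appropriate when the true effect is expected to vary from study to study.

```python
from math import sqrt
from statistics import NormalDist


def fixed_effect_pool(estimates):
    """Pool (effect, variance) pairs with inverse-variance weights.

    Returns the pooled effect, its standard error, and a 95% CI.
    """
    weights = [1 / var for _, var in estimates]
    total = sum(weights)
    pooled = sum(w * eff for w, (eff, _) in zip(weights, estimates)) / total
    se = sqrt(1 / total)
    z = NormalDist().inv_cdf(0.975)
    return pooled, se, (pooled - z * se, pooled + z * se)


# Hypothetical (Cohen's d, sampling variance) pairs:
# one original study followed by three close replications.
studies = [(0.60, 0.10), (0.20, 0.02), (0.30, 0.04), (0.10, 0.02)]
pooled, se, ci = fixed_effect_pool(studies)
print(f"pooled d = {pooled:.3f}, SE = {se:.3f}, "
      f"95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```

More precise studies receive more weight, so a small original study with a large estimate moves the pooled effect less than several larger replications do.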
The Replication Recipe can be implemented formally by completing the 36 Questions in Table 1 and using this information when pre-registering and reporting the replication attempt. To facilitate the formal use of the Replication Recipe we have integrated Table 1 into the Open Science Framework as a replication template (see Fig. 1). Researchers can choose the Replication Recipe Pre-Registration template and then complete the questions in Table 1 that should be completed when pre-registering the study. This information is then saved with the read-only time-stamped pre-registration file on the Open Science Framework and a link to this pre-registration can be included in a submitted paper. When researchers have completed the replication attempt, they can choose a Replication Recipe Post-Completion registration and then complete the remaining questions in Table 1 (see Fig. 2). Again, researchers can include a link to this information in their submitted paper. This will help standardize the registration and reporting of replication attempts across lab groups and help consolidate the information available about a replication study.
Limitations of the Replication Recipe
There are several limitations to the Replication Recipe. First, it is not
always feasible to collaborate with the original author on a replication
study. Much of the Recipe is easier to accomplish with the help of a
cooperative original author, and we encourage these types of
collaborations. However, we are aware that there are times when the replicator
and the original author may have principled disagreements or it is not
possible to work with the original author. When collaboration with
the original author is not feasible, the replicator should design and
conduct the study under the assumption that the original study was
conducted in the best way possible. Therefore, while we encourage
both replicators and original authors to seek a cooperative and even
collaborative relationship, when this does not occur replication studies
should still move forward.
Second, some readers will note that the Replication Recipe has more
stringent criteria than original studies and may object that if it was
good enough for the original, it is good enough for the replication.
We believe that this reasoning highlights some of the broader method-
ological problems in science and is not a limitation of the Replication
Recipe, but rather of the modal research practices in some areas of
research (LeBel & Peters, 2011; Murayama, Pekrun, & Fiedler, in press;
Simmons et al., 2011; SPSP Task Force on Research & Publication
Practices, in press). Original studies would also benefit from following
many of the ingredients of the Replication Recipe. For example, just as
replication studies may be affected by highly specific contexts, original
results may also simply be due to the specific contexts in which they
were originally tested. Consequently, keeping track of our precise
methods will help researchers more efficiently identify the specific
conditions necessary (or not) for effects to occur. A simple implication is
Note that in a meta-analytic approach the overall effect size would almost certainly be
affected more by a high-powered replication than the original study (assuming it had less
statistical power). Under these conditions, the somewhat surprising conclusion is that one
should trust the results of the higher-powered replication more than a lower-powered
original study, assuming the replication is of high quality and there are no meaningful
moderators of the differences between the original and replication study. A status quo in
which most original studies reach equally high power levels would eliminate this
This example was adapted from a talk by Dominique Muller given at the 2013
European Social Cognition Network meeting.
M.J. Brandt et al. / Journal of Experimental Social Psychology 50 (2014) 217–224
that, both for replication and original studies, (more) modesty is called
for in drawing conclusions from results.
Third, the very notion of single replication attempts may
unintentionally prime people with a competitive, score-keeping
mentality (e.g., 2 failures vs. 1 success) rather than taking a broader
meta-analytic point-of-view on any given phenomenon. The
Replication Recipe is not intended to aid score keeping in the game of
science, but rather to enable replications that serve as building
blocks of a cumulative science. Our intention is that the Replication
Recipe helps the abstract scientific goal of "getting it right" (cf. Nosek
et al., 2012) and is why we advocate conducting multiple close
replications of important findings rather than relying on a single original
Fourth, successful close replications may aid in solidifying a particular
finding in the literature; however, a close replication study does not
address any potential theoretical limitations or confounds in the original
study design that cloud the inferences that may be drawn from it.
If the original study was plagued by confounds or bad methods, then
the replication study will similarly be plagued by the same limitations
(Tsang & Kwan, 1999).
Beyond close replications, conceptual replications,
or close replication and extension designs, can be used to remove
confounds and extend the generalizability of a proposed psychological
process (Bonett, 2012; Schmidt, 2009). When focusing on a theoretical
prediction rather than effects within a given paradigm, a combination
of close and conceptual replications is the best way to build confidence
in a result.
Fifth, a replication failure does not necessarily mean that the original
finding is incorrect or fraudulent. Science is complex, and we are working
in the arena of probabilities, meaning that some unsuccessful replications
are expected. It is this very complexity that leads us to suggest that
researchers keep careful track of the differences between original and
replication studies, so as to identify and rigorously test factors that
drive a particular effect. Indeed, just as moderators that "turn on" or
"turn off" an effect are invaluable for understanding the underlying
psychological processes, unsuccessful replications can also be keys to
unlocking the underlying psychological processes of an effect.
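The point that some unsuccessful replications are expected can be made concrete with a short calculation. The sketch below is an illustration under simplifying assumptions that are ours, not the article's: the effect is real, every replication is independent, and each has the same statistical power. The function names are hypothetical.

```python
from math import comb


def prob_at_least_one_failure(power, k):
    """Chance that at least one of k independent replications misses a true effect."""
    return 1.0 - power ** k


def failure_count_distribution(power, k):
    """Binomial probabilities of exactly j non-significant results, for j = 0..k."""
    miss = 1.0 - power  # probability that a single replication is non-significant
    return [comb(k, j) * miss ** j * power ** (k - j) for j in range(k + 1)]


# Even at 80% power, a non-significant result among a handful of replications is common.
for k in (1, 3, 5):
    print(k, round(prob_at_least_one_failure(0.80, k), 3))
```

Under these assumptions, three replications at 80% power yield at least one non-significant result almost half of the time. A tally such as "2 failures vs. 1 success" is therefore weak evidence on its own.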
It is clear that replications are a crucial component of cumulative
science because they help establish the veracity of an effect and aid
in precisely estimating its effect size. Simply stated, well-constructed
replications refine our conceptions of human behavior and thought.
Our Replication Recipe serves to guide researchers who are planning
and conducting convincing close replications, with the answers to our
36 questions serving as a basis for the replication study. We have
recommended that researchers faithfully recreate the original study;
keep track of differences between the replication and original study;
check the study's assumptions in new contexts; conduct high-powered
replication studies; pre-register replication materials and methods;
and evaluate and report the results as openly as ethically possible and
in accordance with the ethical guidelines of the field. We have
There is some question as to whether it is appropriate to make obvious improvements
to the original study, such as using a new and improved version of a scale, when
conducting a close replication. We suspect that it would be better if the replication, in
consultation with the original authors, used improved methods and outlined the reasoning for
doing so. Running at least two replications will provide the most information: one that
uses the original methodology (e.g., the old measure) and one that uses the improved
methodology (e.g., the new measure). A second option is to include the change in the
study as a randomized experimental factor so that participants are randomly assigned to
complete the study with the original or the improved methodology. These solutions would
help clarify whether the original material had caused the effect (or its absence).
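The second option, treating the methodological change as a randomized factor, can be sketched as a balanced random assignment. This is only an illustration: the function name `assign_versions`, the condition labels, the sample size, and the seed are all hypothetical, and a real study would use whatever randomization its experiment software provides.

```python
import random
from collections import Counter


def assign_versions(participant_ids, versions=("original", "improved"), seed=None):
    """Randomly assign each participant to one version, keeping group sizes balanced."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    # Alternate through the shuffled list so group sizes differ by at most one.
    return {pid: versions[i % len(versions)] for i, pid in enumerate(shuffled)}


# Hypothetical sample of 40 participants with IDs 1..40.
assignment = assign_versions(range(1, 41), seed=2014)
print(Counter(assignment.values()))
```

Recording the seed alongside the pre-registration would let others reproduce the allocation exactly.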
Fig. 1. Choosing the Replication Recipe as a replication template on the Open Science Framework.
suggested that researchers measure potential moderators in a way that
does not interfere with the original study, to help determine the
reason for potential differences between the original and replication
study, which in turn helps build theory beyond mere replication.
By conducting high-powered replication studies of important findings
we can build a cumulative science. With our Replication Recipe,
we hope to encourage more researchers to conduct convincing
replications that contribute to theoretical development, confirmation,
and disconfirmation.
References
Aberson, C. L. (2010). Applied power analysis for the behavioral sciences. New York:
Asendorpf, J. B., Conner, M., De Fruyt, F., De Houwer, J., Denissen, J. J. A., Fiedler, K., et al.
(2013). Recommendations for increasing replicability in psychology. European
Journal of Personality, 27, 108–119.
Bartlett, T. (2013). Power of suggestion: The amazing influence of unconscious cues is
among the most fascinating discoveries of our time – that is, if it's true. The
Chronicle of Higher Education (Retrieved from
Bonett, D. G. (2012). Replication-extension studies. Current Directions in Psychological
Borenstein, M., Hedges, L., & Rothstein, H. (2007). Meta analysis: Fixed effect versus
random effects. Retrieved from.
Brandt, M. J. (2013). Do the disadvantaged legitimize the social system? A large-scale test
of the status–legitimacy hypothesis. Journal of Personality and Social Psychology, 104,
Burger, J. M. (2009). Replicating Milgram: Would people still obey today? American Psy-
Chabris, C. F., Hebert, B. M., Benjamin, D. J., Beauchamp, J., Cesarini, D., van der Loos, M.,
et al. (2012). Most reported genetic associations with general intelligence are
probably false positives. Psychological Science, 23, 1314–1323.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
Dijksterhuis, A. (2013). Replication crisis or crisis in replication? A reinterpretation of
Shanks et al. [Comment on Empirical Article e56515]. Retrieved from. http://www.
Doyen, S., Klein, O., Pichon, C. L., & Cleeremans, A. (2012). Behavioral priming: It's all in
the mind, but whose mind? PLoS ONE, 7, e29081.
Faul, F., Erdfelder, E., Lang, A. G., & Buchner, A. (2007). G*Power 3: A flexible statistical
power analysis program for the social, behavioral, and biomedical sciences.
Behavior Research Methods, 39, 175–191.
Forgas, J. P. (1995). Mood and judgment: The affect infusion model (AIM). Psychological
Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis.
Psychological Bulletin, 82, 1–20.
Greenwald, A. G., Pratkanis, A. R., Leippe, M. R., & Baumgardner, M. H. (1986). Under what
conditions does theory obstruct research progress? Psychological Review, 93(2),
Henry, P. J. (2009). Low-status compensation: A theory for understanding the role of status
in cultures of honor. Journal of Personality and Social Psychology, 97, 451–466.
IJzerman, H., Brandt, M. J., & van Wolferen, J. (2013). Rejoice! In replication. European
Journal of Personality, 127, 128–129.
IJzerman, H., & Semin, G. R. (2009). The thermometer of social relations mapping social
proximity on temperature. Psychological Science, 20, 1214–1220.
Jasny, B. R., Chin, G., Chong, L., & Vignieri, S. (2011). Again, and again, and again. Science,
Johnson, S. E., Richeson, J. A., & Finkel, E. J. (2011). Middle class and marginal?
Socioeconomic status, stigma, and self-regulation at an elite university. Journal of Personality
and Social Psychology, 100, 838–852.
Kraus, M. W., Piff, P. K., Mendoza-Denton, R., Rheinschmidt, M. L., & Keltner, D. (2012).
Social class, solipsism, and contextualism: How the rich are different from the poor.
Psychological Review, 119, 546–572.
Lakens, D. (2012). Polarity correspondence in metaphor congruency effects: Structural
overlap predicts categorization times for bipolar concepts presented in vertical
space. Journal of Experimental Psychology: Learning, Memory, and Cognition, 38,
LeBel, E. P., & Campbell, L. (2013). Heightened sensitivity to temperature cues in
individuals with high anxious attachment: Real or elusive phenomenon? Psychological
Science, 24, 2128–2130.
LeBel, E. P., & Peters, K. R. (2011). Fearing the future of empirical psychology: Bem's
(2011) evidence of psi as a case study of deficiencies in modal research practice.
Review of General Psychology, 15, 371–379.
Fig. 2. Example of reporting a replication with the Replication Recipe on the Open Science Framework.
Li, X., Liang, Z., Kleiner, M., & Lu, Z. (2010). RTbox: A device for highly accurate response
time measurements. Behavior Research Methods, 42, 212–225.
Makel, M. C., Plucker, J. A., & Hegarty, B. (2012). Replications in psychology research: How
often do they really occur? Perspectives on Psychological Science, 7, 537–542.
Matthews, W. J. (2012). How much do incidental values affect the judgment of time?
Psychological Science, 23, 1432–1434.
Maxwell, S. E., Kelley, K., & Rausch, J. R. (2008). Sample size planning for statistical power
and accuracy in parameter estimation. Annual Review of Psychology, 59, 537–563.
Milgram, S. (1963). Behavioral study of obedience. Journal of Abnormal and Social
Murayama, K., Pekrun, R., & Fiedler, K. (in press). Research practices that can prevent an
inflation of false-positive rates. Personality and Social Psychology Review (in press).
Nosek, B. A., & Lakens, D. (2013). Call for proposals: Special issue of social psychology
on Replications of important results in social psychology. Social Psychology, 44,
Nosek, B. A., Spies, J. R., & Motyl, M. (2012). Scientific utopia: II. Restructuring incentives
and practices to promote truth over publishability. Perspectives on Psychological
Open Science Collaboration. (2012). An open, large-scale, collaborative effort to estimate
the reproducibility of psychological science. Perspectives on Psychological Science, 7,
Pashler, H., Rohrer, D., & Harris, C. (2013). Can the goal of honesty be primed? Journal of
Experimental Social Psychology, 49, 959–964.
Plant, R. R., & Turner, G. (2009). Millisecond precision psychological research in a world of
commodity computers: New hardware, new problems? Behavior Research Methods,
Proctor, R. W., & Chen, J. (2012). Dissociating influences of key and hand separation on
the Stroop color-identification effect. Acta Psychologica, 141, 39–47.
Registered replication reports (2013). Retrieved June 5, 2013, from. http://www.
Replication value project [Electronic mailing list discussion] (2012–2013). Retrieved from
Open Science Framework Google Group:
Rosenthal, R. (1990). How are we doing in soft psychology? American Psychologist, 45,
Rosenthal, R. (1991). Replication in behavioral research. In J. W. Neuliep (Ed.), Replication
research in the social sciences (pp. 1–30). Newbury Park, CA: Sage.
Rusting, C. L. (1998). Personality, mood, and cognitive processing of emotional
information: Three conceptual frameworks. Psychological Bulletin, 124, 165–196.
Scherbaum, C. A., & Ferreter, J. M. (2009). Estimating statistical power and required
sample sizes for organizational research using multilevel modeling. Organizational
Research Methods, 12, 347–367.
Schmidt, S. (2009). Shall we really do it again? The powerful concept of replication is
neglected in the social sciences. Review of General Psychology, 13, 90–100.
Shanks, D. R., & Newell, B. R. (2013). Response to Dijksterhuis. [Comment on Empirical
Article e56515]. Retrieved May 20, 2013, from.
Shieh, G. (2009). Detecting interaction effects in moderated multiple regression with
continuous variables power and sample size considerations. Organizational Research
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology:
Undisclosed flexibility in data collection and analysis allows presenting anything as
significant. Psychological Science, 22, 1359–1366.
Simonsohn, U. (2013). Evaluating replication results. Available at SSRN:
SPSP Task Force on Research and Publication Practices (in press). Improving the
dependability of research in personality and social psychology: Recommendations for
research and educational practice. Personality and Social Psychology Review (in press).
Tsang, E. W., & Kwan, K. M. (1999). Replication and theory development in organizational
science: A critical realist perspective. Academy of Management Review, 24, 759–780.
Valentine, J. C., Biglan, A., Boruch, R. F., Castro, F. G., Collins, L. M., Flay, B. R., et al. (2011).
Replication in prevention science. Prevention Science, 12, 103–117.
Wagenmakers, E. J., Wetzels, R., Borsboom, D., van der Maas, H. L., & Kievit, R. A. (2012).
An agenda for purely confirmatory research. Perspectives on Psychological Science, 7,
Zhang, Z., & Wang, L. (2009). Statistical power analysis for growth curve models using
SAS. Behavior Research Methods, 41, 1083–1094.
Zwaan, R. A., & Zeelenberg, R. (2013). Replication attempts of important results in the
study of cognition. Frontiers in Cognition (Retrieved from http://www.frontiersin.
... Although replication is most valuable when the existing understanding for a theory is weakest [5], it can also update boundaries of a claim to further develop theories [7,8]. Researchers have focused attention on how replication studies should be conducted and how their results should be interpreted [9][10][11][12]. Yet, there has been little discussion about the factors influencing study selection for replication and even less of a consensus on what those factors should be [1]. ...
... For a large-scale replication project such as that proposed here, the selection of studies to replicate should be based on the aims of the project, namely, to provide an initial estimate of the overall replicability of studies within the entire field. There is a risk of selection bias in larger projects whereby studies are selected based on the belief they are easy to replicate or that the replication study results will differ from the original study results [9,31]. To avoid such bias, once a set of studies is identified that is representative of the field, with resource availability and feasibility constraints factored in, the studies that are selected for replication should then be picked at random. ...
... The theoretical relevance of a topic or original study is important during the consideration of selecting studies to replicate [1,24,30]. When selecting studies to replicate, one should aim to investigate relevant research questions that are of current interest to the field [1,9,[23][24][25]30]. Therefore, recent research will be selected for replication, which we arbitrarily define as studies that were published in the last five years, from the date of each stage of the replication effort. ...
Full-text available
Introduction To improve the rigor of science, experimental evidence for scientific claims ideally needs to be replicated repeatedly with comparable analyses and new data to increase the collective confidence in the veracity of those claims. Large replication projects in psychology and cancer biology have evaluated the replicability of their fields but no collaborative effort has been undertaken in sports and exercise science. We propose to undertake such an effort here. As this is the first large replication project in this field, there is no agreed-upon protocol for selecting studies to replicate. Criticism of previous selection protocols include claims they were non-randomised and non-representative. Any selection protocol in sports and exercise science must be representative to provide an accurate estimate of replicability of the field. Our aim is to produce a protocol for selecting studies to replicate for inclusion in a large replication project in sports and exercise science. Methods The proposed selection protocol uses multiple inclusion and exclusion criteria for replication study selection, including: the year of publication and citation rankings, research disciplines, study types, the research question and key dependent variable, study methods and feasibility. Studies selected for replication will be stratified into pools based on instrumentation and expertise required, and will then be allocated to volunteer laboratories for replication. Replication outcomes will be assessed using a multiple inferential strategy and descriptive information will be reported regarding the final number of included and excluded studies, and original author responses to requests for raw data.
... A cross-cultural replication as near to the same methodology as possible would support this significant work as replication has been shown to be the scientific gold standard that enables the confirmation of research findings. Close replications are an important part of cumulative science (Brandt et al., 2014). ...
Full-text available
Numbers of adolescents experiencing gambling related harm are increasing. Teachers spend a significant amount of time with students and their attitudes can make an impact on engagement in high-risk behavior. However, teachers’ awareness of, and attitudes towards adolescent gambling are under-researched; this study aimed to address this gap. 157 UK schoolteachers completed an online survey assessing their perceptions of adolescent gambling. Cochran’s-Q tests of association and regression analyses revealed that teachers perceived adolescent gambling as significantly less serious than other high-risk behaviors. Teachers also reported having significantly less frequent conversations about gambling and were less confident addressing gambling issues than other high-risk behaviors. Arguments are made for increased teacher training around problematic youth gambling. Such a strategy would be a prerequisite for the development and implementation of targeted prevention from harms.
... Despite the evident scholarly and applied value of the chosen target, to the best of our knowledge, there have been no replications of the effect thus far. We followed increasing calls in the past decade for a science reform promoting the practice of revisiting published findings in psychological science and assessing their reproducibility, replicability, and generalizability (e.g., Brandt et al., 2014;Open Science Collaboration, 2015;van 't Veer & Giner-Sorolla, 2016;Zwaan et al., 2018). We focused our efforts on Study 1 as the initial most straightforward demonstration of the baseline phenomenon. ...
Full-text available
Attribute framing of environmental costs refers to the phenomenon whereby describing environmental costs in positive or negative yet objectively equivalent versions of a critical characteristic (e.g., offset versus tax) influences one’s judgment and decision-making. Specifically, environmental costs positively framed as an offset are systematically favored over equivalent costs negatively framed as a tax. In this Registered Report Stage 1, we will conduct a high-powered (N = 450) direct replication of Study 1 by Hardisty and colleagues (2010). The original study examined whether individual differences in political affiliation moderated attribute framing of environmental costs. Republicans and Independents showed greater pro-environmental decision-making outcomes when environmental costs were described as an offset as opposed to a tax, whereas Democrats were not affected by such framing. The present replication [failed to find/found] support for this effect [effect size+CIs information]. Extending the original work, the present study also [failed to find/found] support for additional interactions between political affiliation and attribute framing on self-efficacy related to climate change and perceived impact on climate change [effect size+CIs information]. Materials, data, and analysis code are available on
We scrutinize the argument that unsuccessful replications—and heterogeneous effect sizes more generally—may reflect an underappreciated influence of context characteristics. Notably, while some of these context characteristics may be conceptually irrelevant (as they merely affect psychometric properties of the measured/manipulated variables), others are conceptually relevant as they qualify a theory. Here, we present a conceptual and analytical framework that allows researchers to empirically estimate the extent to which effect size heterogeneity is due to conceptually relevant versus irrelevant context characteristics. According to this framework, contextual characteristics are conceptually relevant when the observed heterogeneity of effect sizes cannot be attributed to psychometric properties. As an illustrative example, we demonstrate that the observed heterogeneity of the “moral typecasting” effect, which had been included in the ManyLabs 2 replication project, is more likely attributable to conceptually relevant rather than irrelevant context characteristics, which suggests that the psychological theory behind this effect may need to be specified. In general, we argue that context dependency should be taken more seriously and treated more carefully by replication research.
In this chapter, I will discuss statistical considerations for studying replication. More specifically, I will approach replication from a framework based on meta-analysis. To do so, I will focus on direct replications, where studies are designed to be as similar as possible, as opposed to conceptual replications that (systematically or haphazardly) vary in at least one aspect of an experiment. The chapter starts with a brief description of recent research on replication in psychology and uses examples from that research to highlight relevant considerations in defining and parametrizing “replication.” It then outlines different ways to frame analyses of replication and provides examples. Finally, it takes one possible definition of replication—that effects found across studies involving the same phenomenon are consistent—and describes relevant analyses and their properties.KeywordsReplicabilityMeta-analysisReplication researchDirect replication
Increasing evidence indicates that many published findings in psychology may be overestimated or even false. An often-heard response to this “replication crisis” is to replicate more: replication studies should weed out false positives over time and increase the robustness of psychological science. However, replications take time and money – resources that are often scarce. In this chapter, I propose an efficient alternative strategy: a four-step robustness check that first focuses on verifying reported numbers through reanalysis before replicating studies in a new sample.KeywordsRobustness of psychological research findingsFour-step robustness checkReplication crisis
In this chapter, we document notable failed replications in psychology. According to a pioneering project conducted by the Open Science Collaboration, less than half of a sample of 100 studies successfully replicated. Since that time, other large-scale replication attempts have echoed the worrisome state of psychological science. Dubbed “the replication crisis,” this dilemma in the sciences is twofold; not only are replication studies rarely conducted but the results of original studies are often difficult to replicate. The crisis has been challenging for psychological science in many ways, but one particular quagmire it has revealed is a body of knowledge potentially fraught with Type I errors (i.e., rejection of a true null hypothesis)—a sentiment some researchers suggested before the crisis even began. The crisis has also functioned to highlight systemic biases and problematic incentive structures in our scientific enterprise, which we will discuss in greater detail later in the chapter. We conclude this chapter with the hopes that learning more about the historical context of the replication crisis helps readers participate in discourse on the subject and motivates them to be active participants in improving psychological science.KeywordsPsychological scienceQuestionable research practicesReplication failures
Full-text available
A considerable proportion of psychological research has not been replicable, and estimates range from 9% to 77% for nonreplicable results. The extent to which vast proportions of studies in the field are replicable is still unknown, as researchers lack incentives for publishing individual replication studies. When preregistering replication studies via the Open Science Foundation website (OSF,, researchers can publicly register their results without having to publish them and thus circumvent file-drawer effects. We analyzed data from 139 replication studies for which the results were publicly registered on the OSF and found that out of 62 reports that included the authors’ assessments, 23 were categorized as “informative failures to replicate” by the original authors. 24 studies allowed for comparisons between the original and replication effect sizes, and whereas 75% of the original effects were statistically significant, only 30% of the replication effects were. The replication effects were also significantly smaller than the original effects (approx. 38% the size). Replication closeness did not moderate the difference between the original and the replication effects. Our results provide a glimpse into estimating replicability for studies from a wide range of psychological fields chosen for replication by independent groups of researchers. We invite researchers to browse the Replication Database (ReD) ShinyApp, which we created to check for whether seminal studies from their respective fields have been replicated. Our data and code are available online:
This research is a pre-registered replication of Rosette, Leonardelli, and Phillips' (2008) seminal work in leadership categorization theory. Their work established race as a component to the business leader prototype and found evidence that when a leader was given credit for successful organizational performance, White leaders were evaluated more favorably than non-White leaders. As leadership exemplars are evolving, however, a need to reexamine these relationships has emerged. Results from our replications of their first and third studies showed minimal support for the argument that being White is a component of the business leader prototype. Additionally, across six separate studies, we found no conditions in which White leaders received more favorable evaluations than their non-White counterparts. Contrary to our expectations, we found that non-White leaders received marginally more favorable ratings than White leaders in four of our studies.
Full-text available
The data includes measures collected for the two experiments reported in “False-Positive Psychology” [1] where listening to a randomly assigned song made people feel younger (Study 1) or actually be younger (Study 2). These data are useful because they illustrate inflations of false positive rates due to flexibility in data collection, analysis, and reporting of results. Data are useful for educational purposes.
Full-text available
Recent controversies in psychology have spurred conversations about the nature and quality of psychological research. One topic receiving substantial attention is the role of replication in psychological science. Using the complete publication history of the 100 psychology journals with the highest 5-year impact factors, the current article provides an overview of replications in psychological research since 1900. This investigation revealed that roughly 1.6% of all psychology publications used the term replication in text. A more thorough analysis of 500 randomly selected articles revealed that only 68% of articles using the term replication were actual replications, resulting in an overall replication rate of 1.07%. Contrary to previous findings in other fields, this study found that the majority of replications in psychology journals reported similar findings to their original studies (i.e., they were successful replications). However, replications were significantly less likely to be successful when there was no overlap in authorship between the original and replicating articles. Moreover, despite numerous systemic biases, the rate at which replications are being published has increased in recent decades. © The Author(s) 2012.
Replication-extension studies combine results from prior studies with results from a new study specifically designed to replicate and extend the results of the prior studies. Replication-extension studies have many advantages over the traditional single-study designs used in psychology: Formal assessments of replication can be obtained, effect sizes can be estimated with greater precision and generalizability, misleading findings from prior studies can be exposed, and moderator effects can be assessed.
This practical guide on conducting power analyses using IBM SPSS was written for students and researchers with limited quantitative backgrounds. Readers will appreciate the coverage of topics that are not well described in competing books, such as estimating effect sizes, power analyses for complex designs, detailed coverage of popular multiple regression and multi-factor ANOVA approaches, and power for multiple comparisons and simple effects. Practical issues such as how to increase power without increasing sample size, how to report findings, how to derive effect size expectations, and how to support null hypotheses are also addressed. Unlike other texts, this book focuses on the statistical and methodological aspects of the analyses. Performing analyses using software applications rather than via complex hand calculations is demonstrated throughout. Ready-to-use IBM SPSS syntax is included to perform calculations and power analyses. Detailed annotations for each syntax protocol review the minor modifications necessary for researchers to adapt the syntax to their own analyses. As such, the text both reviews power analysis techniques and provides tools for conducting analyses. Numerous examples enhance accessibility by demonstrating specific issues that must be addressed at all stages of the power analysis and providing detailed interpretations of IBM SPSS output. Several examples address techniques for estimation of power and hand calculations as well. Chapter summaries and key statistics sections also aid in understanding the material. Chapter 1 reviews significance testing and introduces power. Chapters 2 through 9 cover power analysis strategies for a variety of common designs. Precision analysis for confidence intervals around mean differences, correlations, and effect sizes is the focus of chapter 10.
The book concludes with a review of how to report power analyses, a review of freeware and commercial software for power analyses, and how to increase power without increasing sample size. Chapters focusing on simpler analyses such as t-tests present detailed formulae and calculation examples. Chapters focusing on more complex topics such as mixed model ANOVA/MANOVA present primarily computer-based analyses. The book is intended as a supplementary text for graduate-level courses in research methods, experimental design, quasi-experimental methods, psychometrics, statistics, and/or advanced/multivariate statistics taught in the behavioral, social, biological, and medical sciences; researchers in these fields will also appreciate its practical emphasis. A prerequisite of introductory statistics is recommended.
A signature strength of science is that the evidence is reproducible. However, direct replications rarely appear in psychology journals because standard incentives emphasize novelty over verification (for background information, see Nosek, Spies, & Motyl, 2012). This special issue on “Replications of Important Results in Social Psychology” alters those incentives. We invite proposals for high-powered, direct replications of important results in social psychology. The review process will focus on the soundness of the design and the analysis, not on whether the outcome is positive or negative. (PsycINFO Database Record (c) 2013 APA, all rights reserved)
G*Power (Erdfelder, Faul, & Buchner, 1996) was designed as a general stand-alone power analysis program for statistical tests commonly used in social and behavioral research. G*Power 3 is a major extension of, and improvement over, the previous versions. It runs on widely used computer platforms (i.e., Windows XP, Windows Vista, and Mac OS X 10.4) and covers many different statistical tests of the t, F, and χ² test families. In addition, it includes power analyses for z tests and some exact tests. G*Power 3 provides improved effect size calculators and graphic options, supports both distribution-based and design-based input modes, and offers all types of power analyses in which users might be interested. Like its predecessors, G*Power 3 is free.
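To give a sense of what an a priori power analysis computes, the following is a minimal Python sketch for a two-sided comparison of two independent group means. It uses the textbook normal approximation, not G*Power's own routines, which work with the noncentral t distribution and therefore report slightly larger sample sizes. The function names `n_per_group` and `approx_power` are hypothetical and chosen only for this illustration.

```python
import math
from statistics import NormalDist

# Standard normal distribution, used for the z quantiles below.
_STD_NORMAL = NormalDist()


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample
    comparison of means with standardized effect size d (Cohen's d).

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2.
    """
    if d == 0:
        raise ValueError("effect size d must be non-zero")
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


def approx_power(d, n, alpha=0.05):
    """Approximate power with n participants per group.

    Ignores the negligible probability of rejecting in the wrong tail.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(abs(d) * math.sqrt(n / 2) - z_alpha)


if __name__ == "__main__":
    # Conventional small, medium, and large effect sizes.
    for d in (0.2, 0.5, 0.8):
        n = n_per_group(d)
        print(f"d = {d}: n per group = {n}, power = {approx_power(d, n):.3f}")
```

Because the approximation slightly understates the required sample size, a replication attempt should base its final numbers on an exact tool such as G*Power.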
In this article, the Society for Personality and Social Psychology (SPSP) Task Force on Publication and Research Practices offers a brief statistical primer and recommendations for improving the dependability of research. Recommendations for research practice include (a) describing and addressing the choice of N (sample size) and consequent issues of statistical power, (b) reporting effect sizes and 95% confidence intervals (CIs), (c) avoiding "questionable research practices" that can inflate the probability of Type I error, (d) making available research materials necessary to replicate reported results, (e) adhering to SPSP's data sharing policy, (f) encouraging publication of high-quality replication studies, and (g) maintaining flexibility and openness to alternative standards and methods. Recommendations for educational practice include (a) encouraging a culture of "getting it right," (b) teaching and encouraging transparency of data reporting, (c) improving methodological instruction, and (d) modeling sound science and supporting junior researchers who seek to "get it right."