The Replication Recipe: What makes for a convincing replication?

Mark J. Brandt a,1,*, Hans IJzerman a,1, Ap Dijksterhuis b,2, Frank J. Farach c,2, Jason Geller d,2, Roger Giner-Sorolla e,2, James A. Grange f,2, Marco Perugini g,2, Jeffrey R. Spies h,2, Anna van 't Veer a,i,2

a Tilburg University, Netherlands
b Radboud University Nijmegen, Netherlands
c University of Washington, USA
d Iowa State University, USA
e University of Kent, UK
f Keele University, UK
g University of Milano-Bicocca, Italy
h Center for Open Science, USA
i TIBER (Tilburg Institute of Behavioral Economics), Netherlands

1 The first two authors share first authorship.
2 All other authors share second authorship.
* Corresponding author: M. J. Brandt, m.j.brandt@tilburguniversity.edu

Journal of Experimental Social Psychology 50 (2014) 217–224. http://dx.doi.org/10.1016/j.jesp.2013.10.005
HIGHLIGHTS
Close replications are an important part of cumulative science.
Yet, little agreement exists about what makes a replication convincing.
We develop a Replication Recipe to facilitate close replication attempts.
This includes the faithful recreation of a study with high statistical power.
We discuss evaluating replication results and limitations of replications.
Article history: Received 10 July 2013; Revised 12 October 2013; Available online 23 October 2013

Keywords: Replication; Statistical power; Research method; Pre-registration; Solid Science

Abstract: Psychological scientists have recently started to reconsider the importance of close replications in building a cumulative knowledge base; however, there is no consensus about what constitutes a convincing close replication study. To facilitate convincing close replication attempts we have developed a Replication Recipe, outlining standard criteria for a convincing close replication. Our Replication Recipe can be used by researchers, teachers, and students to conduct meaningful replication studies and integrate replications into their scholarly habits.
© 2013 The Authors. Published by Elsevier Inc. All rights reserved.
Introduction
Replicability in research is an important component of cumulative
science (Asendorpf et al., 2013; Jasny, Chin, Chong, & Vignieri, 2011;
Nosek, Spies, & Motyl, 2012; Rosenthal, 1990; Schmidt, 2009), yet
relatively few close replication attempts are reported in psychology
(Makel, Plucker, & Hegarty, 2012). Only recently have researchers
systematically reported replications online (e.g., psychfiledrawer.org, openscienceframework.org) and experimented with special issues to incorporate replications into academic publications (e.g., Nosek & Lakens, 2013; Zwaan & Zeelenberg, 2013). Moreover, some prestigious psychology journals (e.g., Journal of Experimental Social Psychology, Journal of Personality and Social Psychology, Psychological Science) have recently become willing to publish both failed and successful replication attempts (e.g., Brandt, 2013; Chabris et al., 2012; LeBel & Campbell, in press; Matthews, 2012; Pashler, Rohrer, & Harris, in press) and even devote ongoing sections to replications (see the new section in Perspectives on Psychological Science, "Registered replication reports," 2013).
From initial conclusions drawn from replication attempts of important findings in the empirical literature, it is clear that replication studies can be quite controversial. For example, the failure of recent attempts to replicate "social priming" effects (e.g., Doyen, Klein,
Pichon, & Cleeremans, 2012; Pashler et al., in press) has prompted
psychologists and science journalists to raise questions about the entire phenomenon (e.g., Bartlett, 2013). Failed replications have sometimes been interpreted as 1) casting doubt on the veracity of an entire subfield (e.g., candidate gene studies for general intelligence, Chabris et al., 2012); 2) suggesting that an important component of a popular theory is potentially incorrect (e.g., the status–legitimacy hypothesis of System Justification Theory, Brandt, 2013); or 3) suggesting that a new finding is less robust than when first introduced (e.g., incidental values affecting judgments of time; Matthews,
2012). Of course, there are other valid reasons for replication failures:
Chance, misinterpretation of methods, and so forth.
Nevertheless, not all replication attempts reported so far have been
unsuccessful. Burger (2009) successfully replicated Milgram's famous
obedience experiments (e.g., Milgram, 1963), suggesting that when
well-conducted replications are successful they can provide us with
greater confidence about the veracity of the predicted effect. Moreover, replication attempts help estimate the effect size of a particular effect and can serve as a starting point for replication–extension studies that further illuminate the psychological processes that underlie an effect and that can help to identify its boundary conditions (e.g., Lakens, 2012; Proctor & Chen, 2012). Replications are therefore essential for theoretical development through confirmation and disconfirmation of results. Yet there seems to be little agreement as to what constitutes an appropriate or convincing replication, what we should infer from replication "failures" or "successes," and what close replications mean for psychological theories (see, e.g., the commentary by Dijksterhuis, 2013 and the reply by Shanks & Newell, 2013). In this paper, we provide our Replication Recipe for conducting and evaluating close replication attempts.
Close replication attempts
In general, how can one define close replication attempts? The most concrete goals are to test the assumed underlying theoretical process, assess the average effect size of an effect, and test the robustness of an effect outside of the lab of the original researchers by recreating the methods of a study as faithfully as possible. This information helps psychology build a cumulative knowledge base. This not only aids the construction of new, but also the refinement of old, psychological theories. In the definition of our Replication Recipe, close replications refer to those replications that are based on methods and procedures as close as possible to the original study. We use the term "close replications" because it highlights that no replications in psychology can be absolutely "direct" or "exact" recreations of the original study (for the basis of this claim see Rosenthal, 1991; Tsang & Kwan, 1999). By definition, then, close replication studies aim to recreate a study as closely as possible, so that ideally the only differences between the two are the inevitable ones (e.g., different participants; for more on the benefits of close replications see, e.g., Schmidt, 2009; Tsang & Kwan, 1999).
The Replication Recipe
What constitutes a convincing close replication attempt, and how
does one evaluate such an attempt? This is what the Replication Recipe seeks to address. The Replication Recipe is informed by the goals of a close replication attempt: accurately replicating methods, estimating effect sizes, and evaluating the robustness of the effect outside the
lab of origin. Our discussion is based on a synthesis of our own trials
and errors in conducting replications and guidelines recently developed
for special issues and sections of psychology journals (Nosek & Lakens,
2013; Open Science Collaboration, 2012; Registered replication
reports, 2013; Zwaan & Zeelenberg, 2013). In this synthesis, we make
explicit the expectations and necessary qualities of a convincing replication that can be used by researchers, teachers, and students when designing and carrying out replication studies.
A convincing close replication par excellence is executed rigorously
by independent researchers or labs and includes the following five additional ingredients:
1. Carefully defining the effects and methods that the researcher intends to replicate;
2. Following as exactly as possible the methods of the original study
(including participant recruitment, instructions, stimuli, measures,
procedures, and analyses);
3. Having high statistical power;
4. Making complete details about the replication available, so that
interested experts can fully evaluate the replication attempt (or
attempt another replication themselves);
5. Evaluating replication results, and comparing them critically to the
results of the original study.
Each of these criteria is described and justified below. We present and explain 36 questions that need to be addressed in a solid replication (see Table 1; the table is also available as a pre-registration form on openscience.org). This list of questions can be used as a checklist to guide
the planning and communication of a study and will help readers and
reviewers to evaluate the replication, by understanding the decisions
that a replicator has made when designing, conducting, and reporting
their replication. These questions are intended to help replicators follow
the Replication Recipe and determine when and why they have deviated from the five Replication Recipe ingredients.
Ingredient #1: Carefully defining the effects and methods that the researcher intends to replicate
Prior to conducting a replication study, researchers need to carefully
consider the precise effect they intend to replicate (Questions 1–9), including the size of the original effect (Question 3), the effect size's confidence intervals (Question 4), and the methods used to uncover it (Questions 5–9). Although this can be a straightforward task, in many studies the effect of interest may be a specific aspect of a more complicated set of results. For example, in a 2 × 2 design where the original
effect was a complete cross-over interaction, such that an effect was
positive in one condition and negative in the other, the effect of interest
may be the interaction, the positive and negative simple effects, or per-
haps just one of the simple effects. On other occasions, the information
about the methods used to obtain the effect will be unclear (e.g., the
original country the study was completed in, Question 7); in these
cases, it may be necessary to ask the original authors to provide the
missing information or to make an informed guess. It is important to
know the precise effect of interest from the beginning of the design-
phase of the replication because it determines nearly all of the decisions
that follow. A related consideration, especially when resources are
limited, is the importance and necessity of replicating a particular effect
(Question 2). Such decisions to replicate or not should be based on
either the effect's theoretical importance to a particular field or its direct or indirect value to society. Another consideration is existing confidence in the reliability of the effect; an effect with a number of existing close replications in the literature may be less urgent to replicate than one without any such support (see discussion of the "Replication value project," 2012–2013). In other words, not every study is worth replicating. By considering the theoretical and practical importance of a finding, the best allocation of resources can be made.
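For illustration, when the original report provides only summary statistics, the effect size and its confidence interval (Questions 3–5) can be recovered from them. The following minimal sketch, in Python with hypothetical numbers for a two-group design, shows one common way to do this; it is not part of the original Replication Recipe materials.

```python
# Minimal sketch (hypothetical numbers, not from the original article):
# recovering Cohen's d and an approximate 95% CI for the original effect
# from the means, SDs, and ns reported for a two-group design.
import math
from scipy import stats

def cohens_d_with_ci(m1, m2, sd1, sd2, n1, n2, alpha=0.05):
    """Cohen's d for two independent groups with a large-sample CI."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd
    # Approximate sampling variance of d (large-sample formula)
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    z_crit = stats.norm.ppf(1 - alpha / 2)
    half_width = z_crit * math.sqrt(var_d)
    return d, (d - half_width, d + half_width)

# Placeholder values standing in for what the original Method/Results report
d, ci = cohens_d_with_ci(m1=5.2, m2=4.6, sd1=1.1, sd2=1.2, n1=40, n2=40)
print(f"Original effect: d = {d:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
```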
Ingredient #2: Following exactly the methods of the original study
Once a study has been chosen for replication, and the precise effect
of interest has been identified, the design of the replication study can
commence. In an ideal world, the methods of the original study
(including participant recruitment, instructions, stimuli, measures, procedures, and analyses) will be followed exactly; however, our preference for the term "close replication" reflects the fact that this ingredient is impossible to achieve perfectly (except, perhaps, when the data of a single experiment are randomly divided into two equal parts), given the inevitable temporal and geographical differences in the participants available to an independent lab (for a similar point see Rosenthal, 1991; Tsang & Kwan, 1999). Nonetheless, the ideal of an "exact" replication should be the starting point of all close replication attempts, and deviations from an exact replication of the original study should be minimized (Questions 10–14), documented, and justified (Questions 17–25). Below we make recommendations for how to best achieve this goal and what can be done when roadblocks emerge.
To facilitate Ingredient #2 of the replication, researchers should start by contacting the original authors of the study to try to obtain the
original materials (Question 10). If the original authors are not
cooperative or if they are unavailable (e.g., have left academia and can-
not be contacted, or if they have passed away), the necessary methods
should be recreated to the best of the replicator researchers' ability,
based on the methods section of the original article and under the as-
sumption that the original authors conducted a highly rigorous study.
For example, if replication authors are unable to obtain the reaction time windows or stimuli used in a lexical decision task, they should follow the methods of the original article as closely as possible and fill in the gaps by adopting best practices from research on lexical decision tasks. In these cases, the replication researchers should then also seek the opinion of expert colleagues in the relevant area to provide feedback as to whether the replication study accurately recreates the original
article's study as described.
In other cases, the original materials may not be relevant for the rep-
lication study. For example, studies about Occupy Wall Street protests,
the World Series in baseball, or other historically- and culturally-
bound events are not easily closely replicated in different times and
places. In these cases the original materials should be modied to try
and capture the same psychological situation as the original experiment
(e.g., replicate the 2012 elections with the 2016 elections, or present
Table 1
A 36-question guide to the Replication Recipe.
The Nature of the Effect
1. Verbal description of the effect I am trying to replicate:
2. It is important to replicate this effect because:
3. The effect size of the effect I am trying to replicate is:
4. The confidence interval of the original effect is:
5. The sample size of the original effect is:
6. Where was the original study conducted? (e.g., lab, in the field, online)
7. What country/region was the original study conducted in?
8. What kind of sample did the original study use? (e.g., student, MTurk, representative)
9. Was the original study conducted with paper-and-pencil surveys, on a computer, or something else?
Designing the Replication Study
10. Are the original materials for the study available from the author?
a. If not, are the original materials for the study available elsewhere (e.g., previously published scales)?
b. If the original materials are not available from the author or elsewhere, how were the materials created
for the replication attempt?
11. I know that assumptions (e.g., about the meaning of the stimuli) in the original study will also hold in my replication because:
12. Location of the experimenter during data collection:
13. Experimenter knowledge of participant experimental condition:
14. Experimenter knowledge of overall hypotheses:
15. My target sample size is:
16. The rationale for my sample size is:
Documenting Differences between the Original and Replication Study
For each part of the study indicate whether the replication study is Exact, Close, or Conceptually Different
compared to the original study. Then, justify the rating.
17. The similarities/differences in the instructions are: [Exact | Close | Different]
18. The similarities/differences in the measures are: [Exact | Close | Different]
19. The similarities/differences in the stimuli are: [Exact | Close | Different]
20. The similarities/differences in the procedure are: [Exact | Close | Different]
21. The similarities/differences in the location (e.g., lab vs. online; alone vs. in groups) are: [Exact | Close | Different]
22. The similarities/differences in remuneration are: [Exact | Close | Different]
23. The similarities/differences between participant populations are: [Exact | Close | Different]
24. What differences between the original study and your study might be expected to influence the size and/or direction of the effect?
25. I have taken the following steps to test whether the differences listed in #24 will influence the outcome of my replication attempt:
Analysis and Replication Evaluation
26. My exclusion criteria are (e.g., handling outliers, removing participants from analysis):
27. My analysis plan is (justify differences from the original):
28. A successful replication is dened as:
Registering the Replication Attempt
29. The finalized materials, procedures, analysis plan, etc., of the replication are registered here:
Reporting the Replication
30. The effect size of the replication is:
31. The confidence interval of the replication effect size is:
32. The replication effect size [is/is not] (circle one) significantly different from the original effect size.
33. I judge the replication to be a(n) [success/informative failure to replicate/practical failure to
replicate/inconclusive] (circle one) because:
34. Interested experts can obtain my data and syntax here:
35. All of the analyses were reported in the report or are available here:
36. The limitations of my replication study are:
British participants with a cricket rather than baseball championship).
In such cases, the most valid replication attempt may actually entail changing the stimulus materials to ensure that they are functionally equivalent (to be sure, replications in this type of situation are less close than what is often meant by close replications, and some people will consider these replications conceptual replications). To ensure that the modified materials effectively capture the same constructs as the original study, they can (when possible) be developed in collaboration with the original authors, and the research community can be polled for their input (via, e.g., professional discussion forums and e-mail lists). In some cases, depending on the severity of the change, it will be necessary to conduct a pilot study testing the equivalence of manipulations and measures to constructs tested in the original research prior to the actual replication attempt. The justifications or steps taken to ensure that the assumptions about the meaning of the stimuli hold in the replication attempt should be clearly specified (Question 11).
Although there is no single conclusive replication (or original study for that matter), and no such burden should be put on an individual replication study, the replication researcher should do his or her best to minimize the differences between the replication and the original study and identify what these differences are. Questions 17–23 ask replicators to categorize which parts of the study are exactly the same as, close to, or conceptually different from the original study and to then justify the differences. All of these are imperfect categories that exist along a continuum, but this categorization task yields at least three benefits. First, reviewers, readers, and editors can judge for themselves whether or not they think that the deviation from the original study was justified. In some cases, a deviation will be clearly justified (e.g., using a different, but demographically similar, sample of participants), whereas in other cases it may be less clear-cut (e.g., replicating a non-internet computer-based lab study done in cubicles on the internet). Second, by identifying differences between replication and original studies (sample, culture, lab context, etc.) researchers and readers can identify where the replication is on the continuum from "close" to "conceptual." Third, after multiple replication attempts have been recorded, these deviations can be used to determine relevant boundary
conditions on a particular effect (for more elaboration on this point
see Greenwald, Pratkanis, Leippe, & Baumgardner, 1986; IJzerman,
Brandt, & van Wolferen, 2013).
In the process of identifying and justifying deviations from the original study, replicators should anticipate differences between the original and replication study that may influence the size and direction of the effect and test these possibilities (Question 24). For example, studies have
revealed that people of varying social classes have different psycholog-
ical processes related to the perception of threat, self-control, and per-
spective taking (among other things; e.g., Henry, 2009; Johnson,
Richeson, & Finkel, 2011; Kraus, Piff, Mendoza-Denton, Rheinschmidt,
& Keltner, 2012). Similarly, people process a variety of information dif-
ferently when they are in a positive or negative mood (for reviews
Forgas, 1995; Rusting, 1998). Conducting a replication at a university
(or online) drawing students from different socioeconomic strata
(SES) than the original population, or in circumstances where participants tend to be in a different mood than the participants in the original study (e.g., immediately prior to mid-term exams compared to the week after exams), may affect the outcome of the replication. In this case, an individual difference measure of SES or mood could be included at the end of the replication study so as to not interfere with the close replication of the original study. Then, a statistical moderator test within the replication study's sample could help understand the degree to which differences in effects between samples can be explained by individual differences in SES or mood. This way it is possible to test whether the differences identified in Question 24 impact the replication result (Question 25).
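A minimal sketch of such a moderator test is given below; the data file, variable names, and model are hypothetical and stand in for whatever SES or mood measure the replication actually includes.

```python
# Minimal sketch (hypothetical data and variable names): testing whether a
# measured individual difference (here, SES) moderates the experimental
# effect within the replication sample (Questions 24-25).
import pandas as pd
import statsmodels.formula.api as smf

# Assumed data layout: one row per participant with
#   condition (0 = control, 1 = treatment), ses (continuous), dv (outcome)
df = pd.read_csv("replication_data.csv")      # hypothetical file name
df["ses_c"] = df["ses"] - df["ses"].mean()    # center the moderator

# A significant condition:ses_c term suggests the effect size depends on SES,
# which could help explain discrepancies between original and replication.
model = smf.ols("dv ~ condition * ses_c", data=df).fit()
print(model.summary())
```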
Ingredient #3: Having high statistical power
It is crucial that a planned replication has sufficient statistical power, allowing a strong chance to confirm as significant the effect size from the original publication (see Simonsohn, 2013). (When attempting to replicate a study that has already been the subject of several replication attempts, it is desirable to base the replication power calculations and sample sizes on the average meta-analytic effect size.) Underpowered replication attempts may incorrectly suggest original effects are false positives, impeding genuine scientific progress. Some authors have recommended that a sufficient amount of statistical power is at least .80 (Cohen, 1992) up to .95 (Open Science Collaboration, 2012). Because effect sizes in the published literature are likely to be overestimates of the true effect size (Greenwald, 1975), researchers should err conservatively, toward higher levels of power. (The high power necessary for a convincing close replication can pose a challenge for researchers who do not have access to very large samples. One option, though it does not appear to be used often, is to combine resources with other labs to collect the necessary number of participants, similar to initiatives developed by Perspectives on Psychological Science's registered replication reports, 2013.)
Power calculations are one potential rationale for determining sample size in the replication attempt (Questions 15 & 16), although there are other defensible sample size justifications (see, e.g., Maxwell et al., 2008). Calculating the power for a close replication study can be very straightforward for some study designs (e.g., a t-test). For other study designs, power analyses can be more complicated, and we encourage researchers to consult the appropriate literature on statistical power and sample size planning when designing replication attempts (see, e.g., Aberson, 2010; Cohen, 1992; Faul, Erdfelder, Lang, & Buchner, 2007; Maxwell, Kelley, & Rausch, 2008; Scherbaum & Ferreter, 2009; Shieh, 2009; Zhang & Wang, 2009 for useful information on power analysis). It has also been suggested that an alternative for determining sample sizes is to take 2.5 times the original sample size (Simonsohn, 2013).
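For a simple two-group design, such an a priori power calculation can be scripted in a few lines. The sketch below uses hypothetical values for the original effect size and sample size and also reports the 2.5× heuristic for comparison; it illustrates the calculation rather than prescribing specific numbers.

```python
# Minimal sketch (hypothetical numbers): a priori power analysis for a
# two-group replication, alongside the 2.5x-original-n heuristic.
from statsmodels.stats.power import TTestIndPower

original_d = 0.45          # published effect size (likely an overestimate)
original_n_per_cell = 40   # per-group sample size in the original study

analysis = TTestIndPower()
n_per_cell = analysis.solve_power(effect_size=original_d, power=0.95,
                                  alpha=0.05, alternative="two-sided")
print(f"n per cell for 95% power at d = {original_d}: {n_per_cell:.0f}")

# Simpler heuristic: 2.5 times the original sample size
print(f"2.5x heuristic: {2.5 * original_n_per_cell:.0f} per cell")
```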
Ingredient #4: Making complete details about the replication available
Close replication attempts may be seen as a thorny issue; openness in
the replication process can help ameliorate this issue. As a rule, in order to
evaluate close replication attempts as well as possible, complete details
about the methods, analyses, and outcomes of a replication should be
available to reviewers, editors, and ultimately to the readers of the
resulting article. One way to achieve this is to pre-register replication
attempts (Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit,
2012; for a pre-registration example see LeBel & Campbell, in press),
including the methods of the replication study (Questions 10–16, 25), differences between the original and replication study (Questions 17–24), and the planned analysis and evaluation of the replication attempt (Questions 26–28). Following the completion of the replication attempt, the data, analysis syntax, and all analyses should be made available so that the replication attempt can be fully evaluated and alternative explanations for any effects can be explored (Questions 34 & 35); exceptions can be made on data protection grounds (e.g., when data are difficult to anonymize, or when unable to share privileged information from companies). Designing and conducting replications with as much openness as ethically possible inoculates against post hoc adjustment of replication success criteria, provides more transparency when readers evaluate the replication, gives people less reason to suspect ulterior motives of the replicator, and makes it more difficult to exercise liberty in choosing an analytic method to exploit the chances of declaring the findings in favor of (or against) the hypothesis (Simmons, Nelson, & Simonsohn, 2011; Wagenmakers et al., 2012). Sharing the information we recommend, including the replication pre-registration and data, can be accomplished with the Open Science Framework (openscienceframework.org).
Ingredient #5: Evaluating replication results and comparing them critically
to the results of the original study
Replication studies are not studies in isolation and so the statistical
results need to be critically compared to the results of the original
study. The meaning of this comparison needs to be carefully considered in the discussion section of a replication article. It is not enough to deliver a judgment of "successful/failed replication" depending solely on whether or not the replication study yields a significant result. Replication effect size estimates (Question 30) and confidence intervals (when possible, Question 31) need to be calculated, and the effect size estimate should be statistically compared to the original effect size (Question 32).
Evaluating the replication should involve reporting two tests: 1) the
size, direction, and confidence interval of the effect, which tell us whether the replication effect is significantly different from the null; 2) an additional test of whether it is significantly different from the original effect. This helps determine whether the replication was a success (different from the null, and similar to or larger than the original and in the same direction), an informative failure to replicate (either not different from null, or in the opposite direction from the original, and significantly different from the original), a practical failure to replicate (both significantly different from the null and from the original), or inconclusive (neither significantly different from the null nor the original) (Question 33; for the criteria for these decisions see Simonsohn, 2013; for additional discussion about evaluating replication results see Asendorpf et al., 2013; Valentine et al., 2011). It may also be generally informative for any replication report to produce a meta-analytic aggregation of the replication study's effect with the original and with any other close replications existing in the literature. (Note that in a meta-analytic approach the overall effect size would almost certainly be affected more by a high-powered replication than the original study, assuming the original had less statistical power. Under these conditions, the somewhat surprising conclusion is that one should trust the results of the higher-powered replication more than a lower-powered original study, assuming the replication is of high quality and there are no meaningful moderators of the differences between the original and replication study. A status quo in which most original studies reach equally high power levels would eliminate this imbalance.) It is important that a discussion of replication results and their conclusions take into account the limitations of the replication attempt and the original study, and the possibilities of Type I and Type II errors and random variation in the true size of the effect from study to study (e.g., Borenstein, Hedges, & Rothstein, 2007; Question 36). In evaluating the replication results, one should carefully consider the total weight of the evidence bearing on an effect.
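A minimal sketch of these two tests and the resulting classification follows; the effect sizes and variances are hypothetical placeholders, the studies are treated as independent, and a complete implementation would also check the direction of the effects as described above.

```python
# Minimal sketch (hypothetical numbers): test the replication effect against
# zero and against the original effect, then classify the outcome in the
# spirit of Question 33. Direction checks are omitted for brevity; a
# replication significantly LARGER than the original in the same direction
# would also count as a success.
import math
from scipy import stats

def two_sided_p(estimate, variance):
    z = estimate / math.sqrt(variance)
    return 2 * (1 - stats.norm.cdf(abs(z)))

d_orig, var_orig = 0.45, 0.052   # reported original effect (placeholder)
d_rep,  var_rep  = 0.12, 0.011   # replication effect (placeholder)

p_vs_null = two_sided_p(d_rep, var_rep)                      # test 1: vs. zero
p_vs_orig = two_sided_p(d_rep - d_orig, var_rep + var_orig)  # test 2: vs. original

if p_vs_null < .05 and p_vs_orig >= .05:
    verdict = "success: differs from null, consistent with the original"
elif p_vs_null >= .05 and p_vs_orig < .05:
    verdict = "informative failure to replicate"
elif p_vs_null < .05 and p_vs_orig < .05:
    verdict = "practical failure to replicate"
else:
    verdict = "inconclusive"
print(f"p vs. null = {p_vs_null:.3f}, p vs. original = {p_vs_orig:.3f} -> {verdict}")
```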
One testable consideration for explaining differences between the results of a replication study and an original study is the many features of the study context that could influence the outcomes of a replication attempt. Some of these contextual variations are due to specific theoretical considerations. These may be as obvious as SES or religiosity in a sample, but may also be as basic and nonobvious as variations in room temperature (cf. IJzerman & Semin, 2009). In other cases, there may be methodological considerations, which may mean the manipulation or the measurement of the dependent variable is less accurate, such as when changing the type of computer monitor (e.g., CRT vs. LCD; Plant & Turner, 2009) or input device used (e.g., keyboard vs. response button box; Li, Liang, Kleiner, & Lu, 2010). For example, it is quite possible that the same stimulus presentation times using computer monitors of different brands, or even the same brand but with different settings, will be subliminal in one case but supraliminal in another. Therefore, directly adopting the programming code used in the original study will not necessarily be enough to replicate the experience of the stimuli by the participants in the original study (this example was adapted from a talk by Dominique Muller given at the 2013 European Social Cognition Network meeting). To be clear, these possible variations should not be used defensively as untested post-hoc justifications for why an effect failed to replicate. Rather, our suggestion is that researchers should carefully consider and test whether a specific contextual feature actually does systematically and reliably affect some specific results and whether this feature was the critical feature driving the discrepancy in results between the original and the replication study.
By conducting several replications of the same phenomenon in mul-
tiple labs it may be possible to identify the differences between studies
that affect the effect size, and design follow-up studies to confirm their influence. Multiple replication attempts have the added bonus of more accurately estimating the effect size. The accumulation of studies helps firmly establish an effect, accurately estimate its size, and acquire knowledge about the factors that influence its presence and strength. This accumulation might take the form of multiple demonstrations of the effect in the original empirical paper, as well as in subsequent replication studies.
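As a simple illustration of how such an accumulation can be summarized, the sketch below computes an inverse-variance (fixed-effect) aggregate of a hypothetical original study and two close replications; a random-effects model may be more appropriate when studies differ meaningfully (cf. Borenstein, Hedges, & Rothstein, 2007).

```python
# Minimal sketch (hypothetical numbers): fixed-effect, inverse-variance
# aggregation of the original effect with close replications.
import math
from scipy import stats

studies = [           # (effect size d, sampling variance)
    (0.45, 0.052),    # original study
    (0.12, 0.011),    # replication 1
    (0.20, 0.015),    # replication 2
]

weights = [1.0 / var for _, var in studies]
d_combined = sum(w * d for (d, _), w in zip(studies, weights)) / sum(weights)
se_combined = math.sqrt(1.0 / sum(weights))
z_crit = stats.norm.ppf(0.975)
lower, upper = d_combined - z_crit * se_combined, d_combined + z_crit * se_combined
print(f"Combined d = {d_combined:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```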
Implementation
The Replication Recipe can be implemented formally by completing
the 36 Questions in Table 1 and using this information when pre-
registering and reporting the replication attempt. To facilitate the
formal use of the Replication Recipe we have integrated Table 1 into
the Open Science Framework as a replication template (see Fig. 1).
Researchers can choose the Replication Recipe Pre-Registration tem-
plate and then complete the questions in Table 1 that should be com-
pleted when pre-registering the study. This information is then saved
with the read-only time-stamped pre-registration file on the Open Science Framework, and a link to this pre-registration can be included in a submitted paper. When researchers have completed the replication attempt, they can choose a Replication Recipe Post-Completion registration and then complete the remaining questions in Table 1 (see Fig. 2). Again, researchers can include a link to this information in their submitted paper. This will help standardize the registration and reporting of replication attempts across lab groups and help consolidate the information available about a replication study.
Fig. 1. Choosing the Replication Recipe as a replication template on the Open Science Framework.
Fig. 2. Example of reporting a replication with the Replication Recipe on the Open Science Framework.
Limitations of the Replication Recipe
There are several limitations to the Replication Recipe. First, it is not
always feasible to collaborate with the original author on a replication
study. Much of the Recipe is easier to accomplish with the help of a
cooperative original author, and we encourage these types of collaborations. However, we are aware that there are times when the replicator
and the original author may have principled disagreements or it is not
possible to work with the original author. When collaboration with
the original author is not feasible, the replicator should design and
conduct the study under the assumption that the original study was
conducted in the best way possible. Therefore, while we encourage
both replicators and original authors to seek a cooperative and even
collaborative relationship, when this does not occur replication studies
should still move forward.
Second, some readers will note that the Replication Recipe has more stringent criteria than original studies and may object that "if it was good enough for the original, it is good enough for the replication." We believe that this reasoning highlights some of the broader methodological problems in science and is not a limitation of the Replication Recipe, but rather of the modal research practices in some areas of research (LeBel & Peters, 2011; Murayama, Pekrun, & Fiedler, in press; Simmons et al., 2011; SPSP Task Force on Research & Publication Practices, in press). Original studies would also benefit from following
many of the ingredients of the Replication Recipe. For example, just as
replication studies may be affected by highly specific contexts, original results may also simply be due to the specific contexts in which they were originally tested. Consequently, keeping track of our precise methods will help researchers more efficiently identify the specific conditions necessary (or not) for effects to occur. A simple implication is
that, both for replication and original studies, (more) modesty is called
for in drawing conclusions from results.
Third, the very notion of single replication attempts may unintentionally prime people with a competitive, score-keeping mentality (e.g., 2 failures vs. 1 success) rather than taking a broader meta-analytic point of view on any given phenomenon. The Replication Recipe is not intended to aid score keeping in the game of science, but rather to enable replications that serve as building blocks of a cumulative science. Our intention is that the Replication Recipe helps the abstract scientific goal of "getting it right" (cf. Nosek et al., 2012), which is why we advocate conducting multiple close replications of important findings rather than relying on a single original
demonstration.
Fourth, successful close replications may aid in solidifying a particular finding in the literature; however, a close replication study does not address any potential theoretical limitations or confounds in the original study design that cloud the inferences that may be drawn from it. If the original study was plagued by confounds or bad methods, then the replication study will similarly be plagued by the same limitations (Tsang & Kwan, 1999). Beyond close replications, conceptual replications, or close replication and extension designs, can be used to remove confounds and extend the generalizability of a proposed psychological process (Bonett, 2012; Schmidt, 2009). When focusing on a theoretical prediction rather than effects within a given paradigm, a combination of close and conceptual replications is the best way to build confidence in a result.
(There is some question as to whether it is appropriate to make obvious improvements to the original study, such as using a new and improved version of a scale, when conducting a close replication. We suspect that it would be better if the replication, in consultation with the original authors, used improved methods and outlined the reasoning for doing so. Running at least two replications will provide the most information: one that uses the original methodology, e.g., the old measure, and one that uses the improved methodology, e.g., the new measure. A second option is to include the change in the study as a randomized experimental factor, so that participants are randomly assigned to complete the study with the original or the improved methodology. These solutions would help clarify whether the original material had caused the effect, or its absence.)
Fifth, a replication failure does not necessarily mean that the original finding is incorrect or fraudulent. Science is complex, and we are working in the arena of probabilities, meaning that some unsuccessful replications are expected. It is this very complexity that leads us to suggest that researchers keep careful track of the differences between original and replication studies, so as to identify and rigorously test factors that drive a particular effect. Indeed, just as moderators that turn "on" or turn "off" an effect are invaluable for understanding the underlying psychological processes, unsuccessful replications can also be keys to
unlocking the underlying psychological processes of an effect.
Conclusion
It is clear that replications are a crucial component of cumulative
science because they help establish the veracity of an effect and aid
in precisely estimating its effect size. Simply stated, well-constructed
replications refine our conceptions of human behavior and thought. Our Replication Recipe serves to guide researchers who are planning and conducting convincing close replications, with the answers to our 36 questions serving as a basis for the replication study. We have recommended that researchers faithfully recreate the original study; keep track of differences between the replication and original study; check the study's assumptions in new contexts; adopt high-powered replication studies; pre-register replication materials and methods; and evaluate and report the results as openly as ethically possible and in accordance with the ethical guidelines of the field. We have
suggested that researchers measure potential moderators in a way that
does not interfere with the original study, to help determine the
reason for potential differences between the original and replication
study, which in turn helps build theory beyond mere replication. By conducting high-powered replication studies of important findings, we can build a cumulative science. With our Replication Recipe, we hope to encourage more researchers to conduct convincing replications that contribute to theoretical development, confirmation, and disconfirmation.
References
Aberson, C. L. (2010). Applied power analysis for the behavioral sciences. New York: Routledge.
Asendorpf, J. B., Conner, M., De Fruyt, F., De Houwer, J., Denissen, J. J. A., Fiedler, K., et al. (2013). Recommendations for increasing replicability in psychology. European Journal of Personality, 27, 108–119.
Bartlett, T. (2013). Power of suggestion: The amazing influence of unconscious cues is among the most fascinating discoveries of our time – that is, if it's true. The Chronicle of Higher Education. Retrieved from http://chronicle.com/article/Power-of-Suggestion/136907/
Bonett, D. G. (2012). Replication–extension studies. Current Directions in Psychological Science, 21, 409–412.
Borenstein, M., Hedges, L., & Rothstein, H. (2007). Meta-analysis: Fixed effect versus random effects. Retrieved from http://www.Meta-Analysis.com
Brandt, M. J. (2013). Do the disadvantaged legitimize the social system? A large-scale test of the status–legitimacy hypothesis. Journal of Personality and Social Psychology, 104, 765–785.
Burger, J. M. (2009). Replicating Milgram: Would people still obey today? American Psychologist, 64, 1–11.
Chabris, C. F., Hebert, B. M., Benjamin, D. J., Beauchamp, J., Cesarini, D., van der Loos, M., et al. (2012). Most reported genetic associations with general intelligence are probably false positives. Psychological Science, 23, 1314–1323.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
Dijksterhuis, A. (2013). Replication crisis or crisis in replication? A reinterpretation of Shanks et al. [Comment on Empirical Article e56515]. Retrieved from http://www.plosone.org/annotation/listThread.action?root=64751
Doyen, S., Klein, O., Pichon, C. L., & Cleeremans, A. (2012). Behavioral priming: It's all in the mind, but whose mind? PLoS ONE, 7, e29081.
Faul, F., Erdfelder, E., Lang, A. G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175–191.
Forgas, J. P. (1995). Mood and judgment: The affect infusion model (AIM). Psychological Bulletin, 117, 39–66.
Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82, 1–20.
Greenwald, A. G., Pratkanis, A. R., Leippe, M. R., & Baumgardner, M. H. (1986). Under what conditions does theory obstruct research progress? Psychological Review, 93(2), 216–229.
Henry, P. J. (2009). Low-status compensation: A theory for understanding the role of status in cultures of honor. Journal of Personality and Social Psychology, 97, 451–466.
IJzerman, H., Brandt, M. J., & van Wolferen, J. (2013). Rejoice! In replication. European Journal of Personality, 27, 128–129.
IJzerman, H., & Semin, G. R. (2009). The thermometer of social relations: Mapping social proximity on temperature. Psychological Science, 20, 1214–1220.
Jasny, B. R., Chin, G., Chong, L., & Vignieri, S. (2011). Again, and again, and again. Science, 334, 1225.
Johnson, S. E., Richeson, J. A., & Finkel, E. J. (2011). Middle class and marginal? Socioeconomic status, stigma, and self-regulation at an elite university. Journal of Personality and Social Psychology, 100, 838–852.
Kraus, M. W., Piff, P. K., Mendoza-Denton, R., Rheinschmidt, M. L., & Keltner, D. (2012). Social class, solipsism, and contextualism: How the rich are different from the poor. Psychological Review, 119, 546–572.
Lakens, D. (2012). Polarity correspondence in metaphor congruency effects: Structural overlap predicts categorization times for bipolar concepts presented in vertical space. Journal of Experimental Psychology: Learning, Memory, and Cognition, 38, 726–736.
LeBel, E. P., & Campbell, L. (2013). Heightened sensitivity to temperature cues in individuals with high anxious attachment: Real or elusive phenomenon? Psychological Science, 24, 2128–2130.
LeBel, E. P., & Peters, K. R. (2011). Fearing the future of empirical psychology: Bem's (2011) evidence of psi as a case study of deficiencies in modal research practice. Review of General Psychology, 15, 371–379.
Li, X., Liang, Z., Kleiner, M., & Lu, Z. (2010). RTbox: A device for highly accurate response time measurements. Behavior Research Methods, 42, 212–225.
Makel, M. C., Plucker, J. A., & Hegarty, B. (2012). Replications in psychology research: How often do they really occur? Perspectives on Psychological Science, 7, 537–542.
Matthews, W. J. (2012). How much do incidental values affect the judgment of time? Psychological Science, 23, 1432–1434.
Maxwell, S. E., Kelley, K., & Rausch, J. R. (2008). Sample size planning for statistical power and accuracy in parameter estimation. Annual Review of Psychology, 59, 537–563.
Milgram, S. (1963). Behavioral study of obedience. Journal of Abnormal and Social Psychology, 67, 371–378.
Murayama, K., Pekrun, R., & Fiedler, K. (in press). Research practices that can prevent an inflation of false-positive rates. Personality and Social Psychology Review.
Nosek, B. A., & Lakens, D. (2013). Call for proposals: Special issue of Social Psychology on "Replications of important results in social psychology." Social Psychology, 44, 59–60.
Nosek, B. A., Spies, J. R., & Motyl, M. (2012). Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science, 7, 615–631.
Open Science Collaboration (2012). An open, large-scale, collaborative effort to estimate the reproducibility of psychological science. Perspectives on Psychological Science, 7, 657–660.
Pashler, H., Rohrer, D., & Harris, C. (2013). Can the goal of honesty be primed? Journal of Experimental Social Psychology, 49, 959–964.
Plant, R. R., & Turner, G. (2009). Millisecond precision psychological research in a world of commodity computers: New hardware, new problems? Behavior Research Methods, 41, 598–614.
Proctor, R. W., & Chen, J. (2012). Dissociating influences of key and hand separation on the Stroop color-identification effect. Acta Psychologica, 141, 39–47.
Registered replication reports (2013). Retrieved June 5, 2013, from http://www.psychologicalscience.org/index.php/replication
Replication value project [Electronic mailing list discussion] (2012–2013). Retrieved from Open Science Framework Google Group: https://groups.google.com/d/topic/openscienceframework/Hnn33i2fSyU/discussion
Rosenthal, R. (1990). How are we doing in soft psychology? American Psychologist, 45, 775–777.
Rosenthal, R. (1991). Replication in behavioral research. In J. W. Neuliep (Ed.), Replication research in the social sciences (pp. 1–30). Newbury Park, CA: Sage.
Rusting, C. L. (1998). Personality, mood, and cognitive processing of emotional information: Three conceptual frameworks. Psychological Bulletin, 124, 165–196.
Scherbaum, C. A., & Ferreter, J. M. (2009). Estimating statistical power and required sample sizes for organizational research using multilevel modeling. Organizational Research Methods, 12, 347–367.
Schmidt, S. (2009). Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Review of General Psychology, 13, 90–100.
Shanks, D. R., & Newell, B. R. (2013). Response to Dijksterhuis [Comment on Empirical Article e56515]. Retrieved May 20, 2013, from http://www.plosone.org/annotation/listThread.action?root=64795
Shieh, G. (2009). Detecting interaction effects in moderated multiple regression with continuous variables: Power and sample size considerations. Organizational Research Methods, 12, 510–528.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359–1366.
Simonsohn, U. (2013). Evaluating replication results. Available at SSRN: http://ssrn.com/abstract=2259879
SPSP Task Force on Research and Publication Practices (in press). Improving the dependability of research in personality and social psychology: Recommendations for research and educational practice. Personality and Social Psychology Review.
Tsang, E. W., & Kwan, K. M. (1999). Replication and theory development in organizational science: A critical realist perspective. Academy of Management Review, 24, 759–780.
Valentine, J. C., Biglan, A., Boruch, R. F., Castro, F. G., Collins, L. M., Flay, B. R., et al. (2011). Replication in prevention science. Prevention Science, 12, 103–117.
Wagenmakers, E. J., Wetzels, R., Borsboom, D., van der Maas, H. L., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7, 632–638.
Zhang, Z., & Wang, L. (2009). Statistical power analysis for growth curve models using SAS. Behavior Research Methods, 41, 1083–1094.
Zwaan, R. A., & Zeelenberg, R. (2013). Replication attempts of important results in the study of cognition. Frontiers in Cognition. Retrieved from http://www.frontiersin.org/Cognition/researchtopics/Replication_Attempts_of_Import/1461
... A replication study should have enough power to reduce the risk of false negatives sufficiently. One suggestion is to increase the original study's sample size 2.5 times in the replication study, or scholars should do power calculations to estimate the required sample size (Brandt et al. 2014). The required sample size will increase if the constructive replication includes moderating variables. ...
... A successful replication that confirms the original study's results does not prove that the original results were true (Brandt et al. 2014). There is always the possibility that the original study's design was flawed and that the replication study copied those flaws (Crandall and Sherman 2016;Hudson 2021) or that the replication study produced a consistent result by chance. ...
... Note 1. Although scholars in many fields recognize that minor differences between the original and replication studies make entirely exact replications impossible, they are widely accepted as tests of whether the results in the original study are false positives (e.g., Brandt et al. 2014;Crandall and Sherman 2016; Nosek and Errington 2020). ...
Article
Recent criticism of advertising research has highlighted its lack of practical relevance and the absence of replication studies. Both are significant shortcomings for applied science intended to improve advertising practice. This paper proposes a methodological four-stage approach to the study of advertising execution to address these deficiencies. First, scholars should identify a relevant advertising phenomenon in the real world by studying advertising practice using methods such as interviewing experts, reviewing trade magazines, or attending practitioners' conferences. Second, scholars must demonstrate the effects of the phenomenon in a laboratory experiment using internally valid stimuli, realistic stimuli exposures, adequate samples, and statistics focusing on effect sizes rather than p-values. Third, further studies have to confirm and explicate the effects using mediating or moderating variables. Finally, the effects of the advertising phenomenon should be rep-licated in field research. Widespread adoption of the four-stage approach would ensure the practical relevance of advertising research, and increase its validity and reliability.
... In most metascience literature, replication is defined intuitively and in an imprecise manner to refer to redoing experiments by pursuing the same experimental procedure to observe whether new results match the previous ones (2)(3)(4). This definition typically emphasizes repeating research protocols and analytical methods, and an aim to reproduce the results. ...
... Where even exact replication experiments fail to provide clear empirical evidence in support of or against a scientific claim, what can be accomplished by non-exact replication experiments will prove impossible to pin down. A common argument goes that how close or similar the methods and procedures used in a replication experiment are to those used in the original experiment is representative of the quality of replication (2,4,5). Similarity or closeness in this regard, however, is difficult to define since it has to be with respect to a reference (3) and even more difficult to measure. ...
Article
Replication experiments purport to independently validate claims from previous research or provide some diagnostic evidence about their reliability. In practice, this value of replication experiments is often taken for granted. Our research shows that in replication experiments, practice often does not live up to theory. Most replication experiments in practice are confounded and their results multiply determined, hence uninterpretable. These results can be driven by the true data generating mechanism, issues present in the original experiment, discrepancies between the original and the replication experiment, new issues introduced in the replication experiment, or combinations of any of these factors. The answers we are looking for with regard to the true state of nature require a rigorous and meticulous investigative process of eliminating errors and singling out elementary or pure cases. In this paper, we introduce the idea of a minimum viable experiment that needs to be identified in practice for replication results to be clearly interpretable. Most experiments are not replication-ready and before striving to replicate a given result, we need theoretical precision or systematic exploration to discover empirical regularities.
... Many of the questions raised by the replication crisis are conceptual or philosophical in nature (Fidler & Wilcox, 2018;Romero, 2019). For example, what does it mean to "replicate" a finding, successfully, convincingly, conceptually, or otherwise (Brandt et al., 2014;Hüffmeier et al., 2016;Machery, 2020)? What role do values play in promoting reproducible research, and particularly, values such as openness, trust, civility, and shame in science (Fiske, 2016;Levin & Leonelli, 2017;Wilholt, 2012)? ...
Full-text available
Article
The replication crisis is perceived by many as one of the most significant threats to the reliability of research. Though reporting of the crisis has emphasized social science, all signs indicate that it extends to many other fields. This paper investigates the possibility that the crisis and related challenges to conducting research also extend to philosophy. According to one possibility, philosophy inherits a crisis similar to the one in science because philosophers rely on unreplicated or unreplicable findings from science when conducting philosophical research. According to another possibility, the crisis likely extends to philosophy because philosophers engage in similar research practices and face similar structural issues when conducting research that have been implicated by the crisis in science. Proposals for improving philosophical research are offered in light of these possibilities.
... For instance, recent large-scale collaborative projects have proposed to examine whether findings from MTurk studies conducted between 2015 and 2018 replicate on current MTurk samples (Mechanical Turk Replication Project, 2021). One criterion for conducting faithful replications is to consider and account for conceptual differences between the original research and the 130,203] replication attempt (Brandt et al., 2014;Ramscar, 2016;Schwarz & Strack, 2014). Even though a new project may exactly replicate the procedures of prior studies, the effect may not replicate when procedures are no longer sufficient for manipulating the same conceptual constructs as before (see Luttrell et al., 2017, for an example of when construct validation requires that new procedures are necessary to replicate old conditions). ...
Maintaining data quality on Amazon Mechanical Turk (MTurk) has always been a concern for researchers. These concerns have grown recently due to the bot crisis of 2018 and observations that past safeguards of data quality (e.g., approval ratings of 95%) no longer work. To address data quality concerns, CloudResearch, a third-party website that interfaces with MTurk, has assessed ~165,000 MTurkers and categorized them into those that provide high- (~100,000, Approved) and low- (~65,000, Blocked) quality data. Here, we examined the predictive validity of CloudResearch’s vetting. In a pre-registered study, participants (N = 900) from the Approved and Blocked groups, along with a Standard MTurk sample (95% HIT acceptance ratio, 100+ completed HITs), completed an array of data-quality measures. Across several indices, Approved participants (i) identified the content of images more accurately, (ii) answered more reading comprehension questions correctly, (iii) responded to reverse-coded items more consistently, (iv) passed a greater number of attention checks, (v) self-reported less cheating and actually left the survey window less often on easily Googleable questions, (vi) replicated classic psychology experimental effects more reliably, and (vii) answered AI-stumping questions more accurately than Blocked participants, who performed at chance on multiple outcomes. Data quality of the Standard sample generally fell between that of the Approved and Blocked groups. We discuss why MTurk’s approval-rating system is no longer an effective data-quality control and the advantages of using the Approved group for scientific studies on MTurk.
Purpose – The paper investigates the impact of government incentives and internal factors (i.e., firms' ethics and firms' attitudes) on the implementation of sustainability-oriented technology (SOT) among small and medium-sized enterprises (SMEs) in Tonga. These factors are important to examine because many enterprises in developing nations lack the resources to adopt innovations, particularly SOT integrated with enterprise management, so an intermediary is often needed to support technology implementation and to make the implementation process more effective. Governments, meanwhile, have the resources and authority to motivate SOT implementation on a broad scale, and the paper therefore assesses governmental factors as drivers of cost-effective and well-organized implementation. Methodology – The paper employs the partial least squares structural equation modeling (PLS-SEM) technique to analyze data collected from 266 Tongan SMEs. Findings – The results indicate that government policy and subsidies positively and significantly shape firms' ethics and attitudes regarding SOT implementation in Tonga. Research limitations/implications – The research analyzes SOT implementation in a single country, Tonga, so the findings cannot be generalized to other emerging countries; in addition, the study samples only SMEs and therefore cannot explain the behavior of large companies. Originality – The research is the first attempt to assess such impacts among the SMEs of a South Pacific nation.
We present a mobile apparatus for audio-visual experiments (MASAVE) that is easy to build on a low budget and can run listening tests, pupillometry, and eye-tracking, e.g., for measuring listening effort and fatigue. The design goal was to keep the MASAVE affordable and to enable shipping the preassembled system to subjects for self-setup in home environments. Two experiments were conducted to validate the proposed system. In the first experiment we tested the reliability of speech perception data gathered using the MASAVE in a less controlled, rather noisy environment. Speech recognition thresholds (SRTs) were measured in a lobby versus a sound-attenuated booth. Results show that the data from the two sites did not differ significantly and that SRT measurements were possible even for speech levels as low as 40–45 dB SPL. The second experiment validated the usability of the preassembled system and the use of pupillometry measurements under conditions of darkness, achieved by applying a textile cover over the MASAVE and the subject to block out light. The results suggest that the tested participants had no usability issues with setting up the system, that the temperature under the cover increased by several degrees only when the measurement duration was rather long, and that pupillometry measurements can be made with the proposed setup. Overall, the validations indicate that the MASAVE can serve as an alternative when lab testing is not possible, and as a way to gather more data or to reach subject groups that are otherwise difficult to reach.
The open and transparent documentation of scientific processes has been established as a core antecedent of free knowledge. This also holds for generating robust insights in the scope of research projects. To convince academic peers and the public, the research process must be understandable and retraceable (reproducible), and repeatable (replicable) by others, precluding the inclusion of fluke findings into the canon of insights. In this contribution, we outline what reproducibility and replicability (R&R) could mean in the scope of different disciplines and traditions of research and which significance R&R has for generating insights in these fields. We draw on projects conducted in the scope of the Wikimedia "Open Science Fellows Program" (Fellowship Freies Wissen), an interdisciplinary, long-running funding scheme for projects contributing to open research practices. We identify twelve implemented projects from different disciplines which primarily focused on R&R, and multiple additional projects also touching on R&R. From these projects, we identify patterns and synthesize them into a roadmap of how research projects can achieve R&R across different disciplines. We further outline the ground covered by these projects and propose ways forward.
The prospect of automated scoring for interpreting fluency has prompted investigations into the predictability of human raters’ perceived fluency based on acoustically measured utterance fluency. Recently, Han, Chen, Fu and Fan (2020) correlated ten utterance fluency measures with raters’ perceived fluency ratings. To verify previous correlational patterns, the present study partially replicated Han et al. (2020). Our analysis shows that most of the correlations observed in Han et al. (2020) were successfully replicated. To produce overall interim estimates of the true relationships, we conducted a mini meta-analysis of correlation coefficients reported in six relevant studies, informed by the "continuously cumulating meta-analysis" approach (Braver et al., 2014). We found that phonation time ratio, mean length of run, and speech rate had relatively strong correlations with perceived fluency. We discuss these findings in light of automated fluency assessment and the need for replication and meta-analysis in translation and interpreting studies.
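For readers unfamiliar with the pooling step involved in such a mini meta-analysis, the sketch below shows one common way to combine correlation coefficients across studies, using Fisher's r-to-z transformation and inverse-variance weights; the correlations and sample sizes are invented placeholders, not values from the studies cited above.

```python
# Minimal sketch of pooling correlations across studies via Fisher's r-to-z
# transformation and inverse-variance (n - 3) weighting. The correlations and
# sample sizes are hypothetical, for illustration only.
import math

studies = [  # (observed correlation, sample size)
    (0.62, 30),
    (0.55, 44),
    (0.70, 25),
]

weighted_sum = 0.0
total_weight = 0.0
for r, n in studies:
    z = math.atanh(r)   # Fisher r-to-z transform
    w = n - 3           # inverse of the sampling variance of z
    weighted_sum += w * z
    total_weight += w

pooled_z = weighted_sum / total_weight
pooled_r = math.tanh(pooled_z)            # back-transform to the r metric
se = math.sqrt(1.0 / total_weight)
ci = (math.tanh(pooled_z - 1.96 * se), math.tanh(pooled_z + 1.96 * se))

print(f"Pooled r = {pooled_r:.3f}, 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```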
Experiments are a central methodology in the social sciences. Scholars from every discipline regularly turn to experiments. Practitioners rely on experimental evidence in evaluating social programs, policies, and institutions. This book is about how to "think" about experiments. It argues that designing a good experiment is a slow-moving process (given the host of considerations involved), which runs counter to the fast-moving temptations currently available in the social sciences. The book includes discussion of the place of experiments in the social science process, the assumptions underlying different types of experiments, the validity of experiments, the application of different designs, how to arrive at experimental questions, the role of replications in experimental research, and the steps involved in designing and conducting "good" experiments. The goal is to ensure social science research remains driven by important substantive questions and fully exploits the potential of experiments in a thoughtful manner.
The data includes measures collected for the two experiments reported in “False-Positive Psychology” [1] where listening to a randomly assigned song made people feel younger (Study 1) or actually be younger (Study 2). These data are useful because they illustrate inflations of false positive rates due to flexibility in data collection, analysis, and reporting of results. Data are useful for educational purposes.
Recent controversies in psychology have spurred conversations about the nature and quality of psychological research. One topic receiving substantial attention is the role of replication in psychological science. Using the complete publication history of the 100 psychology journals with the highest 5-year impact factors, the current article provides an overview of replications in psychological research since 1900. This investigation revealed that roughly 1.6% of all psychology publications used the term replication in text. A more thorough analysis of 500 randomly selected articles revealed that only 68% of articles using the term replication were actual replications, resulting in an overall replication rate of 1.07%. Contrary to previous findings in other fields, this study found that the majority of replications in psychology journals reported similar findings to their original studies (i.e., they were successful replications). However, replications were significantly less likely to be successful when there was no overlap in authorship between the original and replicating articles. Moreover, despite numerous systemic biases, the rate at which replications are being published has increased in recent decades. © The Author(s) 2012.
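The overall rate reported here is simply the product of the two proportions; a quick arithmetic check using the rounded figures from the abstract:

```python
# Back-of-the-envelope check of the overall replication rate reported above,
# using the rounded percentages from the abstract.
used_term_replication = 0.016   # ~1.6% of publications used the term "replication"
actual_replications = 0.68      # 68% of those articles were actual replications

overall_rate = used_term_replication * actual_replications
# Prints ~1.09%, close to the reported 1.07%; the small gap reflects rounding
# of the 1.6% figure.
print(f"Overall replication rate ≈ {overall_rate:.2%}")
```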
Replication-extension studies combine results from prior studies with results from a new study specifically designed to replicate and extend the results of the prior studies. Replication-extension studies have many advantages over the traditional single-study designs used in psychology: Formal assessments of replication can be obtained, effect sizes can be estimated with greater precision and generalizability, misleading findings from prior studies can be exposed, and moderator effects can be assessed.
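As a minimal sketch of the pooling step in such a replication-extension design, the snippet below combines an original and a replication effect size with a simple fixed-effect, inverse-variance average; the effect sizes and variances are invented, and this is not the procedure of any particular study.

```python
# Fixed-effect, inverse-variance pooling of an original and a replication
# effect size (standardized mean differences). All numbers are invented.
import math

estimates = [(0.48, 0.035),   # (Cohen's d, variance of d) for the original study
             (0.22, 0.012)]   # (Cohen's d, variance of d) for the replication

weights = [1.0 / v for _, v in estimates]
pooled_d = sum(w * d for (d, _), w in zip(estimates, weights)) / sum(weights)
pooled_se = math.sqrt(1.0 / sum(weights))
ci = (pooled_d - 1.96 * pooled_se, pooled_d + 1.96 * pooled_se)

print(f"Pooled d = {pooled_d:.2f}, 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```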
This practical guide on conducting power analyses using IBM SPSS was written for students and researchers with limited quantitative backgrounds. Readers will appreciate the coverage of topics that are not well described in competing books, such as estimating effect sizes, power analyses for complex designs, detailed coverage of popular multiple regression and multi-factor ANOVA approaches, and power for multiple comparisons and simple effects. Practical issues such as how to increase power without increasing sample size, how to report findings, how to derive effect size expectations, and how to support null hypotheses are also addressed. Unlike other texts, this book focuses on the statistical and methodological aspects of the analyses. Performing analyses using software applications rather than via complex hand calculations is demonstrated throughout. Ready-to-use IBM SPSS syntax for conducting the analyses and power calculations is available at http://www.psypress.com/applied-power-analysis. Detailed annotations for each syntax protocol review the minor modifications necessary for researchers to adapt the syntax to their own analyses. As such, the text both reviews power analysis techniques and provides tools for conducting the analyses. Numerous examples enhance accessibility by demonstrating specific issues that must be addressed at all stages of the power analysis and providing detailed interpretations of IBM SPSS output. Several examples also address techniques for estimating power by hand calculation. Chapter summaries and key statistics sections further aid understanding of the material. Chapter 1 reviews significance testing and introduces power. Chapters 2 through 9 cover power analysis strategies for a variety of common designs. Precision analysis for confidence intervals around mean differences, correlations, and effect sizes is the focus of Chapter 10. The book concludes with a review of how to report power analyses, a review of freeware and commercial software for power analyses, and how to increase power without increasing sample size. Chapters focusing on simpler analyses such as t-tests present detailed formulae and calculation examples. Chapters focusing on more complex topics such as mixed-model ANOVA/MANOVA present primarily computer-based analyses. Intended as a supplementary text for graduate-level research methods, experimental design, quasi-experimental methods, psychometrics, statistics, and/or advanced/multivariate statistics courses taught in the behavioral, social, biological, and medical sciences, the book's practical emphasis will also appeal to researchers in these fields. A prerequisite of introductory statistics is recommended.
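The same kind of a priori calculation the book demonstrates in IBM SPSS can be reproduced in other environments; the sketch below uses Python's statsmodels rather than the book's SPSS syntax, with an effect size, alpha, and power chosen purely for illustration.

```python
# Minimal a priori power analysis for an independent-samples t-test, done with
# statsmodels instead of the IBM SPSS syntax the book provides. The effect
# size, alpha, and power values are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,        # expected standardized mean difference (Cohen's d)
    alpha=0.05,             # two-sided Type I error rate
    power=0.80,             # desired statistical power
    ratio=1.0,              # equal group sizes
    alternative="two-sided",
)
print(f"Required n per group: {n_per_group:.1f}")  # ~64 participants per group
```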
A signature strength of science is that the evidence is reproducible. However, direct replications rarely appear in psychology journals because standard incentives emphasize novelty over verification (for background information, see Nosek, Spies, & Motyl, 2012). This special issue on “Replications of Important Results in Social Psychology” alters those incentives. We invite proposals for high-powered, direct replications of important results in social psychology. The review process will focus on the soundness of the design and the analysis, not on whether the outcome is positive or negative. (PsycINFO Database Record (c) 2013 APA, all rights reserved)
G*Power (Erdfelder, Faul, & Buchner, 1996) was designed as a general stand-alone power analysis program for statistical tests commonly used in social and behavioral research. G*Power 3 is a major extension of, and improvement over, the previous versions. It runs on widely used computer platforms (i.e., Windows XP, Windows Vista, and Mac OS X 10.4) and covers many different statistical tests of the t, F, and χ² test families. In addition, it includes power analyses for z tests and some exact tests. G*Power 3 provides improved effect size calculators and graphic options, supports both distribution-based and design-based input modes, and offers all types of power analyses in which users might be interested. Like its predecessors, G*Power 3 is free.
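G*Power itself is a stand-alone GUI program, but the F-test family it covers can be cross-checked programmatically; the sketch below runs an a priori power analysis for a one-way ANOVA in Python's statsmodels, with the Cohen's f value, alpha, power, and number of groups chosen purely for illustration.

```python
# Rough cross-check of an a priori F-test (one-way ANOVA) power analysis of the
# kind G*Power performs, sketched with statsmodels. Effect size (Cohen's f),
# alpha, power, and number of groups are illustrative assumptions.
from statsmodels.stats.power import FTestAnovaPower

anova_power = FTestAnovaPower()
total_n = anova_power.solve_power(
    effect_size=0.25,  # Cohen's f, a conventional "medium" effect
    alpha=0.05,
    power=0.80,
    k_groups=3,        # one-way design with three conditions
)
print(f"Required total sample size: {total_n:.0f}")  # roughly 160 participants
```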
In this article, the Society for Personality and Social Psychology (SPSP) Task Force on Publication and Research Practices offers a brief statistical primer and recommendations for improving the dependability of research. Recommendations for research practice include (a) describing and addressing the choice of N (sample size) and consequent issues of statistical power, (b) reporting effect sizes and 95% confidence intervals (CIs), (c) avoiding "questionable research practices" that can inflate the probability of Type I error, (d) making available research materials necessary to replicate reported results, (e) adhering to SPSP's data sharing policy, (f) encouraging publication of high-quality replication studies, and (g) maintaining flexibility and openness to alternative standards and methods. Recommendations for educational practice include (a) encouraging a culture of "getting it right," (b) teaching and encouraging transparency of data reporting, (c) improving methodological instruction, and (d) modeling sound science and supporting junior researchers who seek to "get it right."
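As a small illustration of recommendation (b), the sketch below computes Cohen's d and an approximate large-sample 95% CI from two group summaries; the numbers are invented, and the variance formula is the standard large-sample approximation rather than a procedure prescribed by the Task Force.

```python
# Illustration of reporting an effect size with a 95% CI: Cohen's d for two
# independent groups, with the common large-sample approximation to its
# standard error. All summary statistics are invented for illustration.
import math

n1, mean1, sd1 = 50, 5.4, 1.2   # group 1 summaries (hypothetical)
n2, mean2, sd2 = 50, 4.9, 1.3   # group 2 summaries (hypothetical)

# Pooled standard deviation and Cohen's d
sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
d = (mean1 - mean2) / sd_pooled

# Large-sample approximation to the variance of d
var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
se_d = math.sqrt(var_d)
ci = (d - 1.96 * se_d, d + 1.96 * se_d)

print(f"d = {d:.2f}, 95% CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```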