Shall We Really Do It Again? The Powerful Concept of Replication Is Neglected in the Social Sciences
This article is published as follows:
Schmidt, S. (2009). Shall we really do it again? The powerful concept of replication is neglected in the
social sciences. Review of General Psychology, 13(2), 90–100.
©American Psychological Association, 2009. This paper is not the copy of record and may not exactly replicate the authoritative
document published in the APA journal. Please do not copy or cite without author's permission. The final article is available at:
https://doi.org/10.1037/a0015108
Shall we really do it again? The powerful concept of
replication is neglected in the social sciences
Stefan Schmidt
Author Note
Stefan Schmidt, Institute of Environmental Medicine and Hospital Epidemiology, Freiburg
University Hospital, Germany.
Stefan Schmidt is funded by the Samueli Institute of Information Biology. I thank Harald
Walach, Oliver Lange, and Stephen Braude for comments and help in developing these ideas. I am
grateful to Deborah Lawrie for her assistance in improving the language of the manuscript.
Correspondence concerning this article should be addressed to Dr. Stefan Schmidt,
Evaluation Group for Complementary Medicine, Institute of Environmental Medicine and
Hospital Epidemiology, University Hospital Freiburg, Breisacherstr. 115b, D-79106 Freiburg,
Germany, telephone: +49-761-2708305, fax: +49-761-2708343, e-mail: stefan.schmidt@uniklinik-
freiburg.de.
Abstract
Replication is one of the most important tools for the verification of facts within the empirical
sciences. A detailed examination of the notion of replication reveals that there are many different
meanings to this concept and the relevant procedures, but hardly any systematic literature. This
paper analyzes the concept of replication from a theoretical point of view. It demonstrates that the
theoretical demands are scarcely met in everyday work within the social sciences. Some demands
are just not feasible, while others are constricted by restrictions relating to publication. A new
classification scheme based on a functional approach which distinguishes between different types
of replication is proposed. Next it will be argued that replication addresses the important
connection between existing and new knowledge. In order to do so it has to be applied explicitly
and systematically. The paper ends with a description of procedures by which this could be done and a set of recommendations on how to handle the concept of replication in the future in order to exploit its potential to the full.
Keywords: replication, philosophy of science, sociology of science, functional approach
Shall we really do it again? The powerful concept of
replication is neglected in the social sciences
1 Introduction
Replication is one of the central issues in any empirical science. Confirming results or hypotheses by a repetition procedure lies at the basis of any scientific conception. A replication
experiment to demonstrate that the same findings can be obtained in any other place by any other
researcher is conceived as an operationalization of objectivity. It is the proof that the experiment
reflects knowledge that can be separated from the specific circumstances (such as time, place or
persons) under which it was gained.
As such, one would expect there to be a large body of literature on replication providing
clear-cut definitions on such matters as ‘what exactly is a replication experiment?’ or ‘what exactly is
a successful replication?’ Furthermore, one would expect to find guidelines on how to conduct a
replication or maybe some standard operating procedures on this issue. Every basic textbook, not only in psychology but also in the natural sciences, should have a chapter on replication; at least the methodological ones should. One could also anticipate that there would be a decent body of literature on replication in the philosophy of science.
The opposite is true. While replication is of major importance and is a highly respected concept, it is rarely addressed in the literature: there are only a small number of papers dealing with the topic, and even fewer chapters in books or monographs. Within these papers replication is often one issue among many, there is no clear-cut nomenclature, and the word replication is used as
a collective term to describe various meanings in different contexts. Furthermore, there is a
discrepancy between what is described as a replication from a theoretical point of view and what is
actually done in everyday science.
It is the aim of this paper to make a first attempt at bridging these gaps: between importance and reflection in the literature on the one hand, and between theoretical views and practical approaches on the other. In the following section replication is approached by looking at the basic
definitions and sketching a first rough classification. In section 3, the theory behind replication is
described from the perspective of the philosophy of science. This results in certain demands and
open questions. A functional approach towards replication is introduced to facilitate the
classification task in section 4. In section 5, the view shifts to the pragmatic aspects of everyday scientific work. The divergence between theoretical demands and practical
approaches is highlighted and some strategies to get around this mismatch are identified. The
paper concludes with some recommendations on how to handle the concept of replication in the
future in order to exploit its potential to the full.
There are several limitations to this paper and I would like to make them clear from the
outset. While the repeatability criterion was originally developed within the physical sciences, this
paper will only deal with replication within the empirical social sciences. Natural sciences handle
replication differently, mainly within publication rules, and I will address these differences in
section 5.1. Another limitation concerns statistical approaches. There are several methods of comparing study results on a statistical basis to check for replication. These approaches will be addressed in a separate paper. A third limitation concerns the history of the replication concept. The
problem here is that its history is rather obscure. There is no classical experiment as such that was
first replicated by a prominent physicist in the early seventeenth century. Instead replication
emerged from the spread of knowledge by witnesses who reported on certain remarkable
experiments being performed in other places they had traveled to. According to Radder (personal
communication, July 1, 2002), the lack of any written history of replication may be due to the fact
“that in the natural sciences, in contrast to the social and human sciences, explicit and published
methodological studies are (mostly) lacking” (see also Schramm, 1998). A final limitation concerns the perspective of the sociology of science. There is, for instance, plenty of literature on how the need for replication or the question of replicability is applied in scientific discourses about good and bad science, or in controversies about unconventional claims. While this topic is touched upon a few times, this paper does not claim to give a complete account of this aspect of replication.
2 Replication: Basic Definitions
Replication is one of the most obvious ingredients of science. But what does the term
actually denote? Shapin and Schaffer (1985) describe replication as “(…) the set of technologies
which transforms what counts as belief into what counts as knowledge.” According to A Dictionary
of Social Sciences (Gould & Kolb, 1964, p. 748) replication is a scientific method to verify research
findings and this method refers to “(…) a repetition of a research procedure to check the accuracy
or truth of the findings reported.” This states that replication is a methodological tool based on a
repetition procedure that is involved in establishing a fact, truth or piece of knowledge.
Schweizer (1989) cites a set of replication definitions from various sources within the social sciences. He states that definitions are very rare because the meaning of replication seems to be obvious. Most of these definitions emphasize the action of repeating an experimental procedure. The
Dutch philosopher Hans Radder (1992, p. 64) takes a wider view of the notion of replication. In his view the above-mentioned reproduction of the (i) material realization is only one of three types of replication, with the other two being the (ii) reproducibility of the theoretical interpretation and the (iii)
replication of the experimental result by a different material procedure. An example of (ii) may be Einstein’s hypothesis that the speed of light cannot exceed a specified limit. This hypothesis can
be tested with at least two completely different experimental set-ups (see e.g. Collins & Pinch,
1993). As an example for (iii) Radder points to Salmon’s (1984, p. 217ff) description on the
determination of Avogadro’s number. Avogadro hypothesized that a mole of a substance always
contains the same fixed number of molecules. This number was independently verified by five
completely different experimental approaches (Brownian particle movement, alpha particle decay, X-ray diffraction, blackbody radiation, and electrolysis).
One can see from these two examples that there is a difference between (ii) and (iii), but that it is rather small compared with the difference of both from (i). Therefore I suggest differentiating at a fundamental level
between two basic notions of replication:
1. Narrow bounded notion of replication:
Repetition of an experimental procedure. Henceforth, this notion will be termed direct
replication.
2. Wider notion of replication:
Repetition of a test of a hypothesis or a result of earlier research work with different
methods. Henceforth, this will be referred to as a conceptual replication.
As we will see later on, much of the confusion in the existing literature results from
mixing up these two basic categories.
Several authors have outlined various kinds of replication within these two broader
categories. Radder multiplies his three types of replication by introducing four different groups of people (e.g. the original experimenter, a lay person) who may conduct the replication, and arrives at 12 kinds of replication. He himself states that some of these 12 kinds will probably never be conducted.
While this might be an interesting approach from a theoretical perspective it does not seem to be
very helpful for the description of ongoing scientific activities. Lykken (1968) distinguishes between
literal, operational and constructive replication. Literal and operational replication refer to the narrow bounded notion, with literal replication meaning the most exact possible duplication of the procedures, which can best be conducted by the original investigator in the original lab. According to Lykken, operational replication means the repetition of the original procedures as described in the first author’s report.
Finally, constructive replication stands for any attempt to replicate a research finding with a different
procedure and is therefore linked to the wider notion of replication. Other authors have proposed
similar or slightly different conceptions such as exact, partial and conceptual replication (Hendrick,
1991), concrete and conceptual replication (Sargent, 1981), or exact and inexact replication (Keppel, 1982). Other proposals can be found in Beloff (1985) and in Rao (1981). It will be shown later on
(Section 4.2) that a functional approach will be useful in sharpening and differentiating these
concepts.
3 Replication in the Philosophy of Science: The Ideal Perspective
3.1 Presuppositions and Functions of Replication
Why is replication such an important tool within science? And what are the basic
suppositions leading to the application of this tool?
The idea of replication or repetition is strongly associated with the assumption that nature
behaves lawfully. Dilworth (1996, p. 53ff) calls this idea the principle of uniformity of nature. This principle (going back to Hume’s discussion of induction) is one of the most basic presuppositions of science. It states:
(...) that natural change is lawful, or takes place according to rules. It thus implies a
deterministic conception of change, though this determinism need not be strict. (…)
Without the adoption of this principle, and an assumed awareness of some of the rules
according to which natural change takes place, there would be no basis for reasoned action
concerning the future, whether near or distant.
Karl Popper (1959, p. 45) links this presupposition to the method of gaining knowledge about the world by repeating an experiment:
Only when certain events recur in accordance with rules or regularities, as in the case of
repeatable experiments, can our observation be tested in principle by anyone. (…) Only
by such repetitions can we convince ourselves that we are not dealing with a mere isolated
‘coincidence’, but with events which, on account of their regularity and reproducibility, are
in principle inter-subjectively testable.
This statement is in accordance with the definition laid down above: replication is a method of verifying a scientific finding by repeating a certain procedure. Replication therefore has the function of establishing stability in our knowledge of nature (Radder, 1996).
Artefacts and chance findings are sorted out by repeated testing. But Radder also points to a
second function: replication serves as a norm within science’s process of establishing facts. True findings should be replicable; thus, any assertion that cannot be demonstrated in a replication is not regarded as a scientific statement. Taking this into account, it is
often said that replication is at the heart of (any) experimental science; that replication serves as “a
kind of demarcation criterion between science and non-science” (Braude, 1979, p. 42) or that
replication is “in a manner of speaking (…) the Supreme Court of the scientific system” (Collins,
1985, p. 19). In conclusion, the first function (stability) differentiates between empirically supported scientific claims and unsupported scientific claims, while the second function (norm) differentiates between what is considered a scientific claim and what is regarded as an unscientific claim.
But like the principle of uniformity and other presuppositions of modern science (see e.g. Walach & Römer, 2000), the institution of replicability as a touchstone for objective knowledge is an axiom. As such it is the result of a social interaction. This means that replication is important simply because there is agreement among scientists that replication is important. Furthermore, this norm is not strict. Replication may serve as a norm, but it does not necessarily have to do so.
Radder (1996, p. 27) asserts that in many cases replication is just taken for granted and this is, in
part, also true for psychology as will be shown in section 5.
3.2 The Procedure of Replication and its Limits
How should a replication be performed?
Popper (1959, p. 99) states that “Any empirical scientific statement can be presented (by
describing experimental arrangements, etc.) in such a way, that anyone who has learned the
relevant technique can test it.” And Spencer-Brown (from ‘Discussion’ in Wolstenholme & Miller,
1956, p. 44) adds more precisely that
We want results which are not only consistent in one experiment, but which can be
observed to recur in further experiments. This does not necessarily mean that they must be
repeatable at will. A total eclipse of the Sun is not repeatable at will: nevertheless it is
demonstrably repeatable; we can give the recipe for its repetition. And this is the
minimum we look for in science. We must be able to give a recipe.
Recipes are the things empirical scientists are looking for. So all this sounds very easy from the ideal perspective. If there is any doubt about whether empirical data are scientifically based or
not, you just have to replicate them. If replication succeeds, the fact is established; if the replication fails, the initial empirical data are in doubt.
But every researcher who wants to prepare an empirical fact according to this recipe will
soon find themselves faced with two crucial questions:
1. What conditions have to be fulfilled to ensure that experiment B is a replication of
experiment A?
2. What conditions have to be fulfilled to ensure that the results of experiment B are a
successful replication of the results of experiment A?
While there are many textbooks on the philosophy of science or on empirical methods for
conducting research, none of them contains a clear-cut answer to these questions. Thus, the
question as to whether an experiment is a replication of another, or whether the results of one study replicate those of an earlier one, cannot be answered without ambiguity.
Furthermore, experiment B can never be an exact replication of experiment A because
there is no such thing as an exact replication. The reproduced experimental procedure can be more
or less similar to the original one, but it cannot be the same in all possible aspects. If the second
experiment were identical to the first one in all aspects, then the original experiment A and its
replication B would be the same experiment. This is a contradiction to the very idea of replication,
which is finding the results of A again in a different experiment B.
This is even more true of all sciences working with what I would call irreversible units.
Irreversible units are complex systems that are not time-invariant. Therefore they accumulate
history during their existence and this is the reason why the same condition cannot be re-
established. Imagine an experiment on the physiological reactions of humans to a threatening
stimulus. This experiment can be repeated in the same lab with the same equipment, the same
participants by the same experimenter. But since some time has passed between the original study
and its replication the participants may have slightly changed their pattern of reactions to
threatening stimuli. This might be because of the experience of the experimental situation of the
original study, but it might also be because they have had other threatening experiences since then.
So although the same participants have been invited again, they are no longer the same people.
Rosenthal (1991, p. 2) therefore suggests terming any replication within the behavioral sciences a relative replication.
3.3 Sameness and Differences: Introducing the Functional Approach
There is no such thing as an exact replication. This is no bad thing, because nobody wants
an exact replication. An exact replication, even if it were possible, has no confirmatory power
(Collins, 1985). But it is exactly this confirmatory power which is the reason for conducting a
replication. This can be demonstrated by the following example.
Imagine I were to show you a knife I have recently invented that cuts stone as easily as
butter. I demonstrate this several times (replication) to you by cutting pieces of stone. You might be
impressed by the demonstration but not really convinced that it works. However, you might be a
bit more convinced if I were to demonstrate it again on a different type of stone, and even more convinced if I were to give you the knife and you (as a different person) were also able to cut one of
my stones. But there might still be a trick or something wrong with the material I am employing.
So you would be a lot more convinced if you could repeat the experiment in your home (different
place). But I think the most convincing strategy of all would be to give you a proper description of
how to produce such a tool, so you can manufacture your own different knife, completely
independently from what I have done.
With every difference that is introduced, the confirmatory power of the replication increases.
Furthermore, the phenomenon generalizes to a larger area of application. That is why replications
have to be different from the original. So we want both sameness and differences between A, the original experiment, and its replication B. So what are replications aiming at?
Replications serve several different functions. The general function of replication is, as
mentioned above, to verify a fact or piece of knowledge. But this implies the following more
specific functions:
1. to control for sampling error (chance result),
2. to control for artefacts (lack of internal validity),
3. to control for fraud,
4. to generalize results to a larger or to a different population,
5. to verify the underlying hypothesis of the earlier experiment.
The design of a replication study is dependent on the function it is intended for. This will
be described in more detail in the following section.
4 A Functional Approach to Replication
4.1 Specification of the Research Reality
In order to assess, for each of these functions, what should be changed and what should be kept constant in the design of a replication experiment, it is necessary to describe and to categorize the various aspects of a typical research situation in more detail. Hendrick (1991, p. 42ff) proposes
a description consisting of eight classes of variables that define the total research reality. The most
important class is termed primary information focus. This construct describes the instructions,
materials and events that create a certain stimulus complex for the participant. It is designed by the
researcher with respect to the hypothesis that is investigated and will be varied accordingly. It is
assumed that these variations will result in changes of the dependent variable(s). In other words the
primary information focus describes how the independent variable of the experiment is presented
to the participants. On a more detailed account, the concept of the primary information focus consists of two aspects. One is the immaterial information focus that is conveyed to the participants. The other is its material realization, which is necessary to convey this information (for more details see Hendrick, 1991, pp. 44-45). Six other classes describe the context in which this primary information is
embedded. These are (1) participant characteristics (e.g. gender), (2) specific research history of participants (including prior experiences and motivation for participating in the experiment), (3) cultural and historical context in which the study is embedded, (4) general physical setting of the research (e.g. light, aesthetics), (5) control agent (i.e. the experimenter who is interacting with the participants), and (6) specific task variables (i.e. minute material circumstances such as typing font, color of paper, etc.). It is usually assumed that changes within a certain range (called the boundary
conditions) in these six classes of contextual variables do not interact with the variables which form
the primary information focus. This implies that the context has no impact on the dependent variable, although this assumption is quite often not borne out by the empirical findings. An eighth class, according to Hendrick, comprises the modes of data reduction and presentation.
In order to apply this model of eight classes to an analysis of the various functions of replication, I would like to extend and modify it in two respects. The first concerns the procedures for the selection and allocation of the participants, the second the procedures by which the dependent variable is constituted. The latter is similar to Hendrick’s modes of data reduction but includes
also the physical apparatus and the respective procedures during the experiment. If the six
contextual variables are subsumed under one heading, we arrive at a scheme with four major
classes necessary to describe the full research reality for an experimental study in the social sciences:
1. Primary information focus (consisting of immaterial informational aspects and its
material realization)
2. Contextual background of the experiment (consisting of six sub-classes)
3. Procedures for the selection and allocation of the participants
4. Procedures for the constitution of the dependent variable
4.2 Specific Functions of Replication
With this scheme at hand the demands for each of the functions mentioned above can be
specified.
Function 1: To control for sampling error (chance result):
Any reported effect may be due to a type 1 error. This concerns class 3 (participant selection). But there is no possibility of ruling out a type 1 error completely by changes in any of the classes; the only possibility is to reduce its likelihood. If, for instance, the chance of obtaining a false positive result is set to p = .05 (i.e. 1:20), then the probability of obtaining a second false positive finding is much lower: p = .05 × .05 = .0025 (i.e. 1:400). Thus, if one wants to replicate a study to test for a chance finding, the advice would be to repeat the experiment in classes 1, 2, and 4 as exactly as possible, but on a different sample (class 3). The procedures in class 3, however, have to be kept constant, as they describe the way in which a random sample is drawn from the population. If this procedure is repeated, drawing a second random sample from the same population will in almost all cases result in a sample that is different from the first one. Very often this is done when the original researcher simply uses more participants from the same population.
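The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the compounding of significance levels; it assumes that the original study and the replication are statistically independent and are both tested at the same significance level:

```python
# Sketch of the compounded type 1 error argument from the text.
# Assumption: the original study and a direct replication are
# independent, each tested at the same significance level alpha.
alpha = 0.05  # probability of a false positive in a single study (1 in 20)

# Probability that BOTH the original study and the replication
# produce a false positive finding.
p_both = alpha * alpha

print(round(p_both, 4))   # 0.0025, i.e. 1 in 400
print(round(1 / p_both))  # 400
```

Note that this reduction of likelihood is exactly why function 1 demands a new sample (class 3) while everything else stays as close to the original as possible.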
Function 2: To control for artefacts (lack of internal validity)
The artefact hypothesis assumes that class 1 (primary information focus) is not solely responsible for the changes in the dependent variable(s); in other words, that there is a lack of internal validity. The reason may be that one or several variables from class 2 or 4 (or both) interact with one or several variables from class 1 in an unexpected way. A replication testing this assumption should aim to duplicate the primary information focus as closely as possible while, at the same time, changing as many variables as possible from classes 2 and 4. This is especially true for the sub-classes of class 2 (i.e. contextual background): general physical setting, specific task variables, and control agent. Usually this is achieved when an experiment is reproduced in a different lab by a different investigator. Such a replication is usually done when the findings of the original study are in doubt without any specific hypothesis. Here differences in the results between the original and the replicated study might not necessarily identify the source of the artefact, because changes in the contextual variables will be confounded. But of course, if there are more specific hypotheses regarding the source of the assumed artefact, only the aspects and details dealing with these specific circumstances have to be changed.
Function 3: To control for fraud
This function poses more or less the same demands as function 2. Here the specific
hypotheses target the personnel involved.
While functions 1-3 aim at the confirmation or refutation of earlier reported results, the next two functions reach beyond the scope of the original study.
Function 4: To generalize results to a larger or to a different population
In this case a researcher replicates an experiment to investigate whether the result obtained on a sample from a specific population can be generalized to a larger or a different population. Here class 1 has to be kept constant and class 3 has to be changed. Classes 2 and 4 should be kept constant, but can also be slightly changed, which is often determined by pragmatic considerations. If the same investigator/same lab is conducting the replication, classes 2 and 4 may remain closest to the original study. If the experiment is rebuilt in a different lab, one cannot avoid changes in these variables. All this is fine as long as none of them interacts with class 1.
Function 5: To verify the underlying hypothesis of the earlier experiment.
Stepping beyond the objectives of confirming results and generalizing to other samples, a simple repetition of the experimental procedure is no longer sufficient. To verify the underlying
hypothesis one needs to construct a different experimental setup that conveys the same primary
information focus (class 1) by a radically different material realization. This will result in large
changes of class 2 and 4 as well as changes of the material and procedural aspects of class 1. Class 3
should be kept constant if it is possible to run the new study on the same population. But this
might not be achievable if e.g. the new experimental idea targets a different population.
Function 5 is called a conceptual replication and will be illustrated by an example.
Rosenthal’s research on experimenter effects (Rosenthal & Rubin, 1978) started with studies on animal learning. Students were told that some of the rats they had to train to run through a maze were maze-bright while others were maze-dull. But this was done merely to elicit the relevant expectations in the students and to see whether these expectations could affect the rats’ performance, while in fact all rats were from the same breed (Rosenthal & Fode, 1963). Some years later the same
hypothesis was tested with teachers and pupils. Here the teachers were told that based on a specific
test result some of their pupils would show remarkable gains in intellectual competence within the
next months. But this was also done only to elicit positive expectations in the teachers while the
pupils were selected randomly from the class. It is obvious that these two studies differ in all four
classes. But the immaterial information focus (i.e. the information conveyed to the participants) is
the same for both studies. Both students and teachers were manipulated in a way to raise positive
expectations about some of the rats/pupils they had to teach. The material realization of this
information differed according to the experimental idea (experimental instructions read to the
students or test results about the pupils handed out to the teacher).
Insert Figure 1 about here
Figure 1 gives a graphical overview of the different functions and their demands for setting up a replication, which may serve as an orientation. In daily science the motivations for conducting a replication study may be less distinct, and pragmatic reasons may pose additional constraints. It will not always be possible to keep every variable or class of variables constant, as would be preferable according to the above scheme (e.g. an old program does not run on the new computer, staff are no longer available, the new lab does not allow for the same sensory shielding, etc.). Some contextual sub-classes are, in any case, beyond our influence. It is just important to keep in mind that with every additional change introduced, the interpretation of a failure to replicate will be less clear. If several changes are made to the design, then a failure to replicate cannot be attributed unambiguously to one variable and the possibility of falsifying a hypothesis is lost. The transition
from the original experiment to the replication experiment is similar to a classic experiment, where
only one variable (i.e., the independent variable) should be varied while all others are held constant in order to deduce a causal influence of the independent variable on the dependent one.
Although it may be difficult to implement the above scheme on a one-to-one basis, it is important to clarify the following questions beforehand: Why do I want to replicate study A? What are my specific hypotheses regarding study A? (E.g., specific hypotheses on interactions between class 1 and class 2 or 4.)
One can see from the graph that all five functions have one common feature: the primary
information focus (i.e. class 1) is kept constant. The introduction and precise description of this
valuable concept by Hendrick (1991) allows for a general definition of what constitutes a
replication and gives us an answer to the question of how we know whether experiment B is a
replication of experiment A. We could now say that B is a replication of A if A’s primary information
focus is reestablished in B.
Functions 1-4 refer to the narrow, bounded notion of replication (i.e., reproducing the experimental procedures), which was termed direct replication. By contrast, function 5 refers to a conceptual replication.
A conceptual replication reaches further than a direct one. The successful replication of a
hypothesis validates this hypothesis but it also corroborates the theory behind it. In the end this
hypothesis has been tested by two different experimental ideas. Both ideas are derived from the
same underlying theory, and thus this theory is also confirmed. While a direct replication is able to produce facts, a conceptual replication may produce understanding. This process of understanding the underlying mechanism is, according to Edge (1985), what is crucial for science. Thus, the
function of conceptual replication is not only to confirm facts but also to assist in developing
models and theories of the world.
While all this sounds attractive, Hendrick (1991, p. 46) points out that conceptual replication is a high-risk procedure. If such a replication is successful the benefit is great, but if it fails the results are almost worthless. This is because it remains unclear whether the failure is due to a misconception in the new experimental set-up.
If the conceptual replication fails to show the same
results as in the original study, then this may be because the new material realization of the primary
information focus does not have the same effect on the participants as the original one. But
another reason may be that the original findings are not replicated because they were due to
artefacts, fraud, sampling error, etc. So whether the replication study or the original study is
responsible for the lack of replicability remains undecided.
5 Replication in daily social sciences: what is really happening?
Let us now turn to the question of how replication is handled in daily scientific work. Are
replications conducted? What kinds of replications are they? Which function do they serve?
It was shown above that if one wants to transform a unique finding into a scientific fact, in
most cases, a replication is needed. Thus, there should be many studies published describing
replication attempts. Is this true? A separate examination will be made of direct and conceptual
replications.
5.1 Direct Replication
There are hardly any direct replication studies published within the social sciences (Collins,
1985, p.19; Mahoney, 1985). This is very obvious from just a short inspection of any relevant
journal. The closest thing one will find is a follow-up study. In a follow-up study (sometimes termed
an extension study) parts of an earlier study are directly replicated but then there is either a second
condition within this experiment or a second experiment that assesses a new hypothesis that was
not tested before.
Why have almost no direct replications been published? Perhaps the most important reason is that they are not acknowledged. Within the social sciences only the discovery of a new fact is credited (see also Lindsay & Ehrenberg, 1993). Therefore replications are hard to publish. "Why publish something that is already known?" may be a likely comment by a referee or an editor reviewing a direct replication study. But as publication constraints in today's scientific world have a large impact on scientists' decisions, anything that reduces the chances of getting published will be avoided.
I would like to illustrate the impact of this implicit publication rule with a simple example.
My colleagues and I had conducted a set of studies aimed at eliciting placebo effects with
decaffeinated coffee and our results differed from what others had reported (Walach, Schmidt,
Dirhold & Nosch, 2002; Walach, Schmidt, Wiesch & Bihr, 2001). Some time later we were
writing up a proposal to the NIH for a set of similar experiments together with a Norwegian and an
American research group. I suggested replicating each of the planned experiments in one of the
other countries because of my prior experiences. My Norwegian colleague understood my concern
but replied that his PhD student had to publish at least four papers to graduate and that he didn’t
want to risk not getting a direct replication published. Of course I agreed to his suggestion of
performing a follow-up study on the Norwegian side instead. However, in the final proposal we
agreed to include some replication elements to verify earlier findings and to assure that the
proposed paradigm would work the way it was expected to. The proposal was turned down with
(amongst others) the following explanation: “The most serious concern is that some of the studies
are essentially replications of previous work and some are at least partially redundant with other
studies and unnecessary.” (Anonymous referee’s comment, August 14, 2002).
In a survey of 79 editors of social science journals, Neuliep & Crandall (1990) found a
strong editorial bias against publishing direct replications. Nearly 94% indicated “that replication
studies were not included as examples of research encouraged for submission in the editorial
policy…” (p. 87). Seventy-two percent preferred to publish a study claiming new findings rather
than a replication study.
In a similar survey of reviewers of social science journals (Neuliep &
Crandall, 1993), 54% of the reviewers indicated that they would prefer a study with new findings over a replication study. When asked what the problems with publishing replications were, comments such as "Not newsworthy," "Waste of space," and "… waste of resources" were given (see also
Greenwald, 1978).
While it is a very difficult task to publish a direct replication in the social sciences the
situation is different for the natural sciences. Madden et al. (1995) compared the attitude towards
replications between editors of journals from the social and the natural sciences. Whereas the
comments of the editors from the social sciences journals are similar to the ones reported by
Neuliep & Crandall (1990) the natural science journal editors show a more heterogeneous picture.
Here they found comments ranging from "Replication without some novelty is not accepted" up to "Replication is rarely an issue for us … since we publish them".
The latter comment may stem from the editor of a physics journal, where replication is seen as a standard procedure and part of the daily work (Hendrick, 1991, p. 42). One can see that the
rules governing editorial decisions vary between disciplines as pointed out by Mahoney (1985, p.
32). According to Zuckerman & Merton (1971, p. 76) the acceptance rate for a paper submitted to
a physics journal is 76% compared to an average rejection rate of 76% for the social sciences.
So whereas many scientists encourage each other to take the replication issue seriously, to
conduct and publish replications, it is perhaps not a good idea to follow this advice in the case of
direct replication studies within the social sciences. There are tacit procedures within the
community leading to exactly the opposite conclusions from those expected from the theoretical
perspective of the philosophy of science.
So does this mean that many facts are simply accepted without thorough testing through
replication? And furthermore does this imply that large amounts of the knowledge accumulated in
the social sciences might not survive an empirical test? I don’t think so. While some single facts
might be in doubt because they are produced as the result of erroneous procedures or chance
findings, the majority of our knowledge has been tested by conceptual replications and follow-up
studies.
5.2 Conceptual replication and follow-up studies
Conceptual replications test hypotheses or results from earlier research with a different
experimental set-up. A follow-up study combines a direct replication with new elements or with a
new experiment in the same publication. Both fulfil the call for new material that is requested by
the more or less explicit publication rules in the social sciences. And, indeed, hypotheses and
results of earlier publications are evaluated in succeeding studies (Hunt, 1975). This is a very
obvious fact that can be seen in almost every introductory section of any empirical paper. It is just
included in the daily and ordinary scientific process of expanding existing theories. Thus, according
to Hunt, there is a lively process of validating our knowledge through conceptual replications and
follow-up experiments but not all of these studies carry the label “replication”.
In the light of the unpopularity of the concept with reviewers and editors shown above, one
should not be surprised that replication is taking place in disguise. Daily science has found the way
to strike the necessary balance between confirmation and novelty (Weizsäcker, 1986), between
verifying facts and expanding knowledge to gain new understanding, but the main problem is that
this process is not explicit. With this lack of explicitness comes a lack of a systematic approach. Most researchers are not aware that they are running a replication study if they set up an
experiment that has been done before. And of course they are not aware of the different functions
of various kinds of replications. They introduce changes in several classes of variables at the same
time and end up in a situation where their failure to replicate the original results is almost
impossible to interpret. They often choose a post hoc solution to define whether an experiment is a
replication of an earlier experiment or not (see also section 5.5). But a cumulative science should
be built on its foundations in a systematic way. Adding a brick here and another brick there without much regard for the space between them may result in an unstable building with weak spots, leaks, and unnecessary parts that will require a major effort to remove later on.
Replication addresses precisely this connection between existing and new knowledge. It is
needed to integrate the research results of different laboratories and different researchers into a
coherent theoretical construct. And this is the reason why replication has to be applied explicitly
and systematically. How this can be done will be illustrated in the following section.
5.3 Systematic and Explicit Replication Procedures
Follow-up study: The simplest way to apply the concept of replication in order to extend and generalize our understanding is the follow-up study described above. In such a study, one experimental condition should be a direct replication of an earlier experiment. This condition should explicitly be called the replication condition; it has the function of demonstrating that the same results as shown in the original study can be obtained with the new setup. Next, one or more other
experimental conditions can be added, which have either the function of generalizing over persons,
treatments or outcome measures or of testing new hypotheses. It is important that within this new
condition only one aspect is changed compared to the replication condition.
Systematic replications: This concept was suggested by Hendrick (1991, p. 48f) based on his model of 8 classes of variables for the complete description of research reality, which was outlined in section 4.1. As in a follow-up study, he suggests first conducting a direct replication in order to obtain what he calls an "anchor cell". Next he suggests varying the primary information context to specify a range that results in no distortion of the critical information structure, i.e. gives the same results. Then he suggests specifying the range of variation that will result in a distortion and thus produce different results. In a next step, the contextual variables should be varied. This procedure will result in a data matrix which is designed according
to Campbell and Fiske's (1959) multitrait-multimethod concept of research. The follow-up study described above can be considered a special case within this matrix. Such a systematic replication matrix forms an ideal standard that will rarely be met in reality. Shortages of resources and time constraints mostly result in incomplete data matrices and scattered results. Also, within several research areas (e.g., field studies in schools, evaluation of certain aspects of hospitals) it is impossible to hold all variables constant or even to conduct a direct replication. But this, of
course, does not mean that no understanding and generalization of knowledge is taking place. In
these research contexts another concept from the Campbell tradition is relevant:
The principle of heterogeneity of irrelevancies: This principle addresses explicitly those replication
studies where more than one aspect has been changed. If these studies still replicate the original
findings then the aspects which have undergone changes can be judged as irrelevant to the primary
information focus of the concept under investigation. Thus it is suggested that many aspects which
are considered irrelevant should be varied in order to achieve a heterogeneity of irrelevancies which
then in turn allows for generalisation over these aspects (Cook, 1990; Shadish, 1995). If several
studies with a large heterogeneity of irrelevancies replicate each other one can deduce a more
complete understanding of the phenomenon under consideration than on the basis of single
studies. This holds true even if the systematic replication matrix described above has many empty cells. However, if the studies do not replicate each other, then the source of the failure is difficult to determine and one gains only a very limited understanding.
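As a toy illustration of the systematic replication matrix and the heterogeneity principle (all variant names and outcomes below are invented for the sketch, not taken from Hendrick or Cook), such a matrix can be thought of as a partially filled table whose cells record replication outcomes:

```python
# Toy systematic replication matrix. Rows vary the material realization
# of the primary information focus; columns vary contextual variables.
# Each cell records whether the original result was reproduced; empty
# cells (None) are studies not yet conducted.

replication_matrix = {
    # (information_focus_variant, context_variant): outcome
    ("original", "original_lab"): "replicated",        # the anchor cell
    ("original", "new_lab"): "replicated",             # context judged irrelevant
    ("paraphrased_instructions", "original_lab"): "replicated",
    ("weakened_instructions", "original_lab"): "not replicated",
    ("original", "field_setting"): None,               # empty cell
}

def known_boundary(matrix):
    """Cells where the effect broke down, i.e. the currently known
    range of distortion of the critical information structure."""
    return [cell for cell, outcome in matrix.items()
            if outcome == "not replicated"]

print(known_boundary(replication_matrix))
```

Variants that replicate despite a change mark that change as an "irrelevancy"; variants that fail mark the boundary of the effect, which is exactly the information an incomplete, unsystematic set of studies cannot provide.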
5.4 Unconventional Claims and Interesting Theories
Regarding the value assigned to direct replications and the explicit mention of them in publications, there is an exception to the circumstances stated above that is worth considering in a little more detail. These are unconventional claims, often derived from theories which are considered interesting. Davis (1971) gives a detailed index of what is regarded as an interesting theory in science. The punch line is that "interesting theories deny certain assumptions of their audience (…)" (p. 309). An example of this are claims often made by parapsychologists, e.g., empirical evidence for telepathy (Bem & Honorton, 1994). Such claims may pose a contradiction
to the current scientific world view and may therefore conflict with accepted understanding. Some
of them, such as precognition, may even challenge the basic unproven presuppositions of science
(Walach & Schmidt, 2005). While any scientific finding fitting the current paradigm will be
accepted on the basis that it affirms certain assumptions of the audience (without any call for
replication) there will be a different reaction to all claims threatening the generally agreed scientific
world view. In these cases the immediate call for a direct replication of the phenomenon under
debate will likely be the first reaction of the scientific community (other reactions may include ignoring the results if they are presented by researchers not working within the accepted mainstream scientific community, or questioning the credibility of the researchers if they are from
mainstream science). These direct replications are more likely to get published than direct
replications of conventional claims (see e.g. Colwell, Schröder & Sladen, 2000; Wiseman, Smith &
Milton, 1998).
However, in this case of unconventional claims there are two interesting observations. One is that hardly any mainstream scientist takes up the challenge of conducting a direct replication of an unconventional claim, although everybody involved in the debate asks for this to be done. The second observation is that if direct replications of unconventional claims are conducted, then mainstream journals are more likely to accept those papers that could not replicate the controversial
claim. This is in contrast to replication procedures within the mainstream, where a direct
replication (no new procedure) with non-significant findings (no new results) is the most unlikely
paper to get accepted at all. Such a publication bias may have the function of dismissing claims that
are in conflict with the scientific world view agreed upon.
5.5 Is it a replication? The rhetorical solution
In section 3 it was shown that so far no clear-cut criteria exist to determine whether
experiment B is a direct or conceptual replication of an earlier experiment A. In section 4 an
answer to this question was suggested based on the concept of primary information focus.
However, in daily science there is often a pragmatic solution to this question. The question
whether B is a replication of A is often decided after the study has already been conducted on the
basis of the outcome (Braude, 1979). This is done to maintain a certain amount of consistency on
the part of the researcher who has conducted the replication.
Imagine, for instance, that a research group has conducted a replication and was able to
reproduce the results of the original study. The discussion of the paper will very likely start with a
sentence like: “The data we presented demonstrate a successful replication of the findings by ...”.
The experiment is declared a replication of the original study. But in the opposite case where the
original findings were not replicated the researcher will immediately look for differences between
his or her study and the original. And of course there will be differences because, as was shown
above, there are always differences between two studies. It is thus very likely that these differences
will be held responsible for the failure of the replication, and this interpretation entails the statement that the experiment was not a replication but a different study.
So while there is so far no clear-cut criterion to decide whether a study is a replication of
another study researchers in daily science often use rhetorical post hoc arguments instead (Braude,
1979) and these statements do not necessarily reflect objective empirical knowledge.
However, if one avoids this rhetorical solution and decides on an a priori basis that
experiment B is a replication of A, then the next question is whether the results from B replicate
those of A, or, in other words: Is B a successful replication of A? This question can be answered in
a clear-cut way by applying statistical methods that are described elsewhere (e.g. Rosenthal, 1991).
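One standard device from that meta-analytic toolbox is to compare the two studies' effect sizes after a Fisher r-to-z transformation. The sketch below is a minimal illustration with hypothetical numbers (the correlations and sample sizes are invented, not from any study discussed here):

```python
import math

def fisher_z(r):
    """Fisher r-to-z transformation of a correlation coefficient."""
    return 0.5 * math.log((1 + r) / (1 - r))

def compare_effect_sizes(r_a, n_a, r_b, n_b):
    """Two-tailed z-test of whether study B's correlation differs
    from study A's, using Fisher-transformed r values."""
    diff = fisher_z(r_a) - fisher_z(r_b)
    se = math.sqrt(1 / (n_a - 3) + 1 / (n_b - 3))
    z = diff / se
    # two-tailed p-value from the standard normal distribution
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Hypothetical example: original study A (r = .45, N = 60),
# replication B (r = .38, N = 80)
z, p = compare_effect_sizes(0.45, 60, 0.38, 80)
print(f"z = {z:.2f}, p = {p:.2f}")  # → z = 0.48, p = 0.63
```

A non-significant difference between the two effect sizes is consistent with a successful replication, which is a more informative criterion than merely asking whether study B, taken alone, reached significance.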
5.6 Recipe - Tacit knowledge
A crucial issue in the discussion on what exactly qualifies as a direct replication is how to set
up the replication experiment. According to Popper, one just needs a "description of experimental arrangements" or, according to Spencer-Brown, a "recipe" (see above), and the experiment can be set up accordingly. Any proper publication with a precise Method section can fulfill this requirement.
It will contain clear-cut descriptions of the applied amplifiers, stimulus materials and software etc.
that are necessary for the study.
It is the merit of H. M. Collins to demonstrate that this idea may not be achievable within daily science, that publications will not necessarily provide the required information, and that the idea of setting up replication experiments from a recipe is not valid in general. These conclusions follow from Collins' investigation (Collins, 1985) of the replication attempts on the TEA-laser (TEA-laser
stands for Transversely Excited Atmospheric pressure CO2 laser).
The first laser of this kind was built in 1968 by a Canadian research group. At first the
results were classified but later announced in 1970. After this announcement several British and
North American laboratories tried to build copies of this device. Collins visited 11 of them, spoke
to the researchers and evaluated their success. He found that
“no scientist succeeded in building a laser using only information found in published or
other written sources.” (Collins, 1985, p. 55).
“no scientist succeeded in building a TEA-laser where their informant was a ‘middle man’
who has not built a device himself.” (p. 55)
“(…) even where the informant had built a successful device, (…), the learner would be
unlikely to succeed without some extended period of contact with the informant and, in
some cases, would not succeed at all.” (p. 55)
What is the reason for this failure? It has something to do with the fact that not all
procedures involved in conducting an experiment can be verbalized. Moreover, not all of them are
identified as necessary for the success of the experiment. This is true not only for all required motor skills but also for tacit knowledge. Here tacit knowledge refers to "the knowledge that you need to succeed in an endeavor, that is not formally taught, and that often is not even verbalized"
(Sternberg, 1995, p. 321). It results from implicit learning, has a pre-verbal or non-verbal character
(Dowd, 2000) and is related to procedural memory (Schacter, 2000). Thus it is difficult to
verbalize and responsible for the poor correlation often observed between an expert’s verbal report
and her or his behavior or mental performance (Chervinskaya & Wasserman, 2000). Furthermore,
compared to formal knowledge, tacit knowledge is highly context dependent.
This is why it may be necessary to learn how to conduct a difficult, sensitive experiment in the presence of an experienced, successful experimenter in order to get the same results. Of course, Collins' example is from the natural sciences, but analogous processes can be assumed within the social sciences. One cannot learn how to perform a 64-channel EEG from a textbook or how to conduct a sensitive experiment on experimenter effects from the description in a Method section.
The problem resulting from this fact is obvious. If one wants to conduct a direct replication
to control for artefacts one should set up an experiment that differs in some of its contextual
variables from the original study. On the other hand, it might be necessary to learn how to proceed
in this experiment in a one-on-one situation with the original investigator and this might result in a
(tacit) transmission of procedures resulting in the same artefact findings as in the original study.
6 Conclusions
It has been shown that the notion of replication has several meanings and is a very ambiguous term. From the perspective of the philosophy of science, replication is at 'the heart of any science', but from the perspective of the sociology of science its acknowledgement is very low. The demands of the theoretical approach have been revealed as unachievable within daily practice.
Publication constraints within the social sciences lead to only rare occurrences of direct
replications. Conceptual replications can be encountered occasionally but they are often not
explicitly so termed and lack a systematic approach.
As a result of this analysis a functional approach to replication was developed in order to
clearly differentiate between different types of replication and their implications for designing a
replication study. Furthermore I proposed a solution to the question whether a study B can be
regarded as a replication of study A. Based on these analyses I would like to give a few
recommendations on how to deal with the replication issue in the future in order to arrive at a
more precise and fruitful application of this multifaceted concept.
My first suggestion refers to the early stages of study planning. If one decides to conduct a
replication experiment one should consider why. What is the reason for replicating this study? Do I
have a hypothesis that the original study only demonstrates a chance finding or that it is due to an
artefact or maybe even fraud? Or am I interested in the hypothesis and underlying theory of the
study, and is my goal to further develop this approach? All of these usually implicit hypotheses demand different experimental approaches and setups (see Figure 1 for an overview), and it is recommended to make them explicit as early as possible.
Furthermore I suggest pre-specifying the specific replication procedure and its function
including a criterion for a successful replication (as well as all other study specifications) in a study
protocol. In the near future such protocols may be registered in a public database, as is, for example, already compulsory for publication in the major medical journals (De Angelis et al., 2004). This leads to a clear-cut evaluation of the aim of the replication experiment and avoids
hypothesizing after the results are known as well as rhetorical post-hoc decisions as described in
section 5.5.
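As a minimal sketch of what such a pre-specified protocol entry might contain (all field names and the success criterion are hypothetical illustrations; real registries define their own schemas):

```python
# Hypothetical pre-registered replication protocol entry, fixed
# before data collection so that the replication's function and
# success criterion cannot be decided rhetorically post hoc.

protocol = {
    "original_study": "Author et al. (Year)",
    "replication_type": "direct",              # direct vs. conceptual
    "function": "control for sampling error",  # cf. the five functions
    "held_constant": ["procedure", "instruments", "population"],
    "varied": [],                              # a direct replication varies nothing
    "success_criterion": "effect size within the 95% CI of the original effect",
}

def is_direct(p):
    """On this toy definition, a direct replication varies nothing."""
    return p["replication_type"] == "direct" and not p["varied"]

print(is_direct(protocol))
```

The point of the sketch is only that the replication's type, function, and success criterion are all committed to in advance, which is what makes the later evaluation clear-cut.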
While it is reasonable from a theoretical point of view to conduct direct replications one
cannot really recommend doing so to scientists who are dependent on publishing their results.
Maybe our thinking on the issue of publishing direct replications will change within the next
decades in the same way as it is changing with regard to publishing non-significant results.
However for today it may be more appropriate to include replication elements in a study
that also contains some new elements. How this could be done was described in detail in Section
5.3. Such a systematic connection between the replication condition and the new hypothesis
condition is the important link that demonstrates that the underlying assumptions are working and
that they can be transferred to a similar design testing for new ideas. Furthermore, publishing
experiments combining replication and new elements in the same setting in only one publication is
in accordance with the APA Publication Manual's guidance on piecemeal reporting (American Psychological Association, 2001, pp. 352-353), which explicitly recommends doing so.
It was noted above that such combined studies are often conducted, but only rarely are the replication procedures openly named as such. I strongly recommend addressing replication explicitly
wherever it takes place. The full potential of this powerful concept can only be used if it is named,
recognized and addressed systematically. So far replication marks a blind spot in the social sciences’
tool box. In order to change this situation we have to discuss replication extensively. We should
reflect on replication aspects in our own research with every experiment we set up. Furthermore we
need to add this topic to our textbooks and to teach it to our students. And finally we have to
discuss the editorial policies of our journals and the handling of the replication issue in the review
process. Hopefully this paper may serve as a starting point for a fruitful discussion of this approach.
Reference List
American Psychological Association. (2001). Publication Manual of the American Psychological
Association (5th ed.). Washington, DC: American Psychological Association.
Beloff, J. (1985). Research strategies for dealing with unstable phenomena. In: B. Shapin & L. Coly
The repeatability problem in parapsychology (pp. 1-14). New York: The Parapsychology
Foundation.
Bem, D. J. & Honorton, C. (1994). Does PSI exist? Replicable evidence for an anomalous process
of information transfer. Psychological Bulletin, 115, 4-18.
Braude, S. E. (1979). ESP and Psychokinesis. A Philosophical Examination. Philadelphia: Temple
University Press.
Campbell, D. T. & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.
Chervinskaya, K. R. & Wasserman, E. L. (2000). Some methodological aspects of tacit knowledge
elicitation. Journal of Experimental & Theoretical Artificial Intelligence, 12, 43-55.
Collins, H. & Pinch, T. (1993). The Golem. What Everyone Should Know About Science. Cambridge:
Cambridge University Press.
Collins, H. M. (1985). Changing order. London: Sage.
Colwell, J., Schröder, S. & Sladen, D. (2000). The ability to detect unseen staring: A literature
review and empirical tests. British Journal of Psychology, 91, 71-85.
Cook, T. D. (1990). Research Methodology: Strengthening Causal Interpretations of
Nonexperimental Data. In: L. Sechrest, E. Perrin & J. Bunker (pp. 9-30). US Department
of health and human services, Public Health Service, Agency for Health Care Policy and
Research.
Davis, M. S. (1971). That's interesting! Towards a Phenomenology of Sociology and a Sociology of
Phenomenology. Philosophy of the Social Sciences, 1, 309-344.
De Angelis, C., Drazen, J. M., Frizelle, F. A., Haug, C., Hoey, J., Horton, R., Kotzin, S., Laine, C.,
Marusic, A., Overbeke, A. J., Schroeder, T. V., Sox, H. C., Van Der Weyden, M. B. &
International Committee of Medical Journal Editors. (2004). Clinical trial registration: a
statement from the International Committee of Medical Journal Editors. New England
Journal of Medicine, 351, 1250-1251.
Dilworth, C. (1996). The Metaphysics of Science. An Account of Modern Science in terms of Principles,
Laws and Theories. Dordrecht: Kluwer.
Dowd, E. T. (2000). Memory process in psychotherapy: Implications for integration. (Unpublished).
Edge, H. (1985). The problem is not replication. In: B. Shapin & L. Coly The repeatability problem in
parapsychology (pp. 53-64). New York: The Parapsychology Foundation.
Gould, J. & Kolb, W. L. (Eds.). (1964). A dictionary of the social sciences. London: Tavistock
Publications.
Greenwald, A. G. (1978). Consequences of Prejudice Against the Null Hypothesis. Psychological
Bulletin, 82, 1-20.
Hendrick, C. (1991). Replication, strict replications, and conceptual replications: Are they
important? In: J. W. Neuliep Replication research in the social sciences (pp. 41-49). Newbury
Park: Sage.
Hunt, K. (1975). Do we really need more replications? Psychological Reports, 36, 587-593.
Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social
Psychology Review, 2, 196-217.
Lindsay, R. M. & Ehrenberg, A. S. C. (1993). The design of replicated studies. The American Statistician, 47, 217-228.
Lykken, D. T. (1968). Statistical significance in psychological research. Psychological Bulletin, 70, 151-
159.
Madden, C. S., Easley, R. W., & Dunn, M. G. (1995). How journal editors view replication
research. Journal of Advertising, 24, 78-87.
Mahoney, M. J. (1985). Open exchange and epistemic process. American Psychologist, 40, 29-39.
Neuliep, J. W., & Crandall, R. (1990). Editorial bias against replication research. Journal of Social
Behavior and Personality, 5, 85-90.
Neuliep, J. W., & Crandall, R. (1993). Reviewer bias against replication research. Journal of Social
Behavior and Personality, 8, 21-29.
Popper, K. R. (1959). The logic of scientific discovery. London: Hutchinson.
Radder, H. (1992). Experimental reproducibility and the experimenters' regress. In D. Hull, M.
Forbes, & K. Okruhlik (Eds.), Proceedings of the 1992 biennial meeting of the Philosophy of Science
Association (pp. 63-73). East Lansing, MI: Philosophy of Science Association.
Radder, H. (1996). In and about the world. Philosophical studies of science and technology. Albany, N.Y.:
State University of New York Press.
Rao, K. R. (1981). On the question of replication. Journal of Parapsychology, 45, 311-320.
Rosenthal, R. (1991). Replication in behavioral research. In J. W. Neuliep (Ed.), Replication research in the
social sciences (pp. 1-39). Newbury Park, CA: Sage.
Rosenthal, R. & Fode, K. L. (1963). The effect of experimenter bias on the performance of the
albino rat. Behavioral Science, 8, 183-189.
Rosenthal, R. & Rubin, D. B. (1978). Interpersonal expectancy effects: The first 345 studies.
Behavioral and Brain Sciences, 1, 377-415.
Salmon, W. C. (1984). Scientific explanation and the causal structure of the world. Princeton, NJ:
Princeton University Press.
Sargent, C. L. (1981). The repeatability of significance and the significance of repeatability. European
Journal of Parapsychology, 3, 423-433.
Schacter, D. L. (2000). Wir sind Erinnerung. Gedächtnis und Persönlichkeit. Reinbek bei Hamburg:
Rowohlt. (Original work published 1996: Searching for Memory: The Brain, the Mind, and the
Past).
Schramm, M. (1998). Experiment im Altertum und Mittelalter. In M. Heidelberger & F. Steinle
(Eds.), Experimental Essays - Versuche zum Experiment (pp. 34-67). Baden-Baden: Nomos
Verlagsgesellschaft.
Schweizer, K. (1989). Eine Analyse der Konzepte, Bedingungen und Zielsetzungen von
Replikationen. Archiv für Psychologie, 141, 85-97.
Shadish, W. R. (1995). The logic of generalization: Five principles common to experiments and
ethnographies. American Journal of Community Psychology, 23, 419-428.
Shapin, S. & Schaffer, S. (1985). Leviathan and the air-pump. Hobbes, Boyle, and the experimental life.
Princeton, NJ: Princeton University Press.
Sternberg, R. J. (1995). Theory and measurement of tacit knowledge as a part of practical
intelligence. Zeitschrift für Psychologie, 203, 319-334.
Walach, H., & Römer, H. (2000). Complementarity is a useful concept for consciousness studies: A
reminder. Neuroendocrinology Letters, 21, 221-232.
Walach, H., & Schmidt, S. (2005). Repairing Plato's life boat with Ockham's razor: The
important function of research in anomalies for mainstream science. Journal of
Consciousness Studies, 12, 52-70.
Walach, H., Schmidt, S., Dirhold, T., & Nosch, S. (2002). The effects of a caffeine placebo and
suggestion on blood pressure, heart rate, wellbeing and cognitive performance: Analysis of
difficulties with reproducing. International Journal of Psychophysiology, 43, 247-260.
Walach, H., Schmidt, S., Wiesch, S., & Bihr, N. (2001). The effects of subject and experimenter
expectation on blood pressure, heart rate, wellbeing and cognitive performance: A
failure to reproduce. European Psychologist, 6, 15-25.
Weizsäcker, E. U. v. (1986). Erstmaligkeit und Bestätigung als Komponenten der Pragmatischen
Information. In E. U. v. Weizsäcker (Ed.), Offene Systeme I: Beiträge zur Zeitstruktur von
Information, Entropie und Evolution (2nd rev. ed., pp. 82-113). Stuttgart: Klett-
Cotta.
Wiseman, R., Smith, M. & Milton, J. (1998). Can animals detect when their owners are returning
home? An experimental test of the 'psychic pet' phenomenon. British Journal of Psychology,
89, 453-462.
Wolstenholme, G. E. & Miller, E. C. (Eds.). (1956). CIBA Foundation symposium on extrasensory
perception. London: Churchill.
Zuckerman, H. & Merton, R. K. (1971). Patterns of evaluation in science: Institutionalisation,
structure and functions of the referee system. Minerva, 9, 66-100.
Figure Caption
Figure 1: Schematic description of the various functions of replications and the corresponding demands
for changes and constants in the four classes of variables describing a research situation. Changes =
classes of variables that should be at least partially changed in order to run this type of
replication; const. = these variables should be kept constant.
Figure 1

                                            function of replication
class  variables                            control for       control for    control for    verify
                                            sampling error    artifact       fraud          hypothesis

1      Primary information focus            const.            const.         const.         const.
       (immaterial)
1      Primary information focus            const.            const.         const.         changes
       (material realization)
2      Contextual background                const.            changes§       changes+       changes
3      Selection of participants            const.*           const.         const.         const.&
4      Constitution of dependent            const.            changes§       const.         changes
       variable

type of replication                         direct replication ------------------------    conceptual replication
scientific gain                             fact confirmation                              understanding
risk                                        low risk                                       high risk

* This means applying the same procedures to select participants from the population, which will result in a different
sample.
§ Changes here refer to building up the same set-up with different material and equipment. If a specific hypothesis
regarding the alleged artifact is available, this hypothesis will make clear which aspects of class 2 and/or 4 have to
be changed.
+ Changes here refer to the personnel involved in the study.
& It depends on the new experimental realization whether the participants can be drawn from the same population.
Footnotes
1
Of course, there are several good reasons for this agreement, but they are nevertheless debatable
and not fixed.
2
The sample should be different because the idea of a type I error relies on the assumption that
by chance a biased sample was drawn from the population or that by chance the
randomization produced significant baseline differences.
3
Function 4 makes quite similar demands to function 1 in terms of which instances of the several
classes are changed or kept constant. The crucial difference is that for function 1 a new sample
should be drawn from the same population, whereas for function 4 it should be drawn from a different one.
4
He cites as an example the Babylonians, who could predict the daily position of the planets by
mathematical procedures; these predictions nevertheless did not amount to a science, because the
motions of the planets were not intelligible to them.
5
As there are no two identical experiments, this problem also applies to direct replications,
although the possible sources of a failure to replicate are far more limited in this case.
6
In addition to ticking one of the two options, they also had the possibility of ticking both.
7
Such post-hoc reasoning is not limited to the question of whether a study is a replication or
not. Kerr (1998) introduces the term "hypothesizing after the results are known
(HARKing)" for cases where a post-hoc hypothesis, based on the results of a study, is presented as
an a priori hypothesis in the same study.