
How to become a Bayesian in eight easy steps: An annotated reading list

Reference as: Etz, A., Gronau, Q. F., Dablander, F., Edelsbrunner, P. A., & Baribault, B. (in press).
How to become a Bayesian in eight easy steps: An annotated reading list. Psychonomic Bulletin & Review.
Alexander Etz
University of California, Irvine
Quentin F. Gronau
University of Amsterdam
Fabian Dablander
University of Tübingen
Peter A. Edelsbrunner
ETH Zürich
Beth Baribault
University of California, Irvine
In this guide, we present a reading list to serve as a concise introduction to
Bayesian data analysis. The introduction is geared toward reviewers, editors,
and interested researchers who are new to Bayesian statistics. We provide
commentary for eight recommended sources, which together cover the the-
oretical and practical cornerstones of Bayesian statistics in psychology and
related sciences. The resources are presented in an incremental order, start-
ing with theoretical foundations and moving on to applied issues. In addi-
tion, we outline an additional 32 articles and books that can be consulted to
gain background knowledge about various theoretical specifics and Bayesian
approaches to frequently used models. Our goal is to offer researchers a
starting point for understanding the core tenets of Bayesian analysis, while
requiring a low level of time commitment. After consulting our guide, the
reader should understand how and why Bayesian methods work, and feel
able to evaluate their use in the behavioral and social sciences.
In recent decades, significant advances in computational software and hardware have
allowed Bayesian statistics to rise to greater prominence in psychology (Van de Schoot,
Winter, Ryan, Zondervan-Zwijnenburg, & Depaoli, in press). In the past few years, this rise
has accelerated as a result of increasingly vocal criticism of p-values in particular (Nickerson,
2000; Wagenmakers, 2007), and classical statistics in general (Trafimow & Marks, 2015).
When a formerly scarcely used statistical method rapidly becomes more common, editors
and peer reviewers are expected to master it readily, and to adequately evaluate and judge
manuscripts in which the method is applied. However, many researchers, reviewers, and
editors in psychology are still unfamiliar with Bayesian methods.
We believe that this is at least partly due to the perception that a high level of
difficulty is associated with proper use and interpretation of Bayesian statistics. Many
seminal texts in Bayesian statistics are dense, mathematically demanding, and assume
some background in mathematical statistics (e.g., Gelman et al., 2013). Even texts that
are geared toward psychologists (e.g., Lee & Wagenmakers, 2014; Kruschke, 2015), while
less mathematically difficult, require a radically different way of thinking than the classical
statistical methods most researchers are familiar with. Furthermore, transitioning to a
Bayesian framework requires a level of time commitment that is not feasible for many
researchers. More approachable sources that survey the core tenets and reasons for using
Bayesian methods exist, yet identifying these sources can prove difficult for researchers with
little or no previous exposure to Bayesian statistics.
In this guide, we provide a small number of primary sources that editors, reviewers,
and other interested researchers can study to gain a basic understanding of Bayesian statis-
tics. Each of these sources was selected for its balance of accessibility with coverage of
essential Bayesian topics. By focusing on interpretation, rather than implementation, the
guide is able to provide an introduction to core concepts, from Bayes’ theorem through to
Bayesian cognitive models, without getting mired in secondary details.
This guide is divided into two primary sections. The first, Theoretical sources, includes
commentaries on three articles and one book chapter that explain the core tenets of Bayesian
methods as well as their philosophical justification. The second, Applied sources, includes
commentaries on four articles that cover the most commonly used methods in Bayesian data
analysis at a primarily conceptual level. This section emphasizes issues of particular interest
to reviewers, such as basic standards for conducting and reporting Bayesian analyses.
We suggest that for each source, readers first review our commentary, then consult
the original source. The commentaries not only summarize the essential ideas discussed in
each source, but also give a sense of how those ideas fit into the bigger picture of Bayesian
statistics. This guide is part of a larger special issue in Psychonomic Bulletin & Review on
the topic of Bayesian inference that contains articles which elaborate on many of the same
points we discuss here, so we will periodically point to these as potential next steps for the
interested reader. For those who would like to delve further into the theory and practice of
Bayesian methods, the Appendix provides a number of supplemental sources that would be
of interest to researchers and reviewers. To facilitate readers’ selection of additional sources,
each source is briefly described and has been given a rating by the authors that reflects its
level of difficulty and general focus (i.e., theoretical versus applied; see Figure A1). It is
important to note that our reading list covers sources published up to the time of this
writing (August, 2016).
Overall, the guide is designed such that a researcher might be able to read all eight
of the highlighted articles1 and some supplemental readings within a week. After readers
acquaint themselves with these sources, they should be well-equipped both to interpret
existing research and to evaluate new research that relies on Bayesian methods.
1 Links to freely available versions of each article are provided in the References section.
Theoretical sources
In this section, we discuss the primary ideas underlying Bayesian inference in in-
creasing levels of depth. Our first source introduces Bayes’ theorem and demonstrates how
Bayesian statistics are based on a different conceptualization of probability than classical,
or frequentist, statistics (Lindley, 1993). These ideas are extended in our second source’s
discussion of Bayesian inference as a reallocation of credibility (Kruschke, 2015) between
possible states of nature. The third source demonstrates how the concepts established in
the previous sources lead to many practical benefits for experimental psychology (Dienes,
2011). The section concludes with an in-depth review of Bayesian hypothesis testing using
Bayes factors with an emphasis on this technique’s theoretical benefits (Rouder, Speckman,
Sun, Morey, & Iverson, 2009).
1. Conceptual introduction: What is Bayesian inference?
Source: Lindley (1993) — The analysis of experimental data: The appreciation of tea and wine
Lindley leads with a story in which renowned statistician Ronald A. Fisher is having
his colleague, Dr. Muriel Bristol, over for tea. When Fisher prepared the tea—as the story
goes—Dr. Bristol protested that Fisher had made the tea all wrong. She claims that tea
tastes better when milk is added first and infusion second,2 rather than the other way
around; she furthermore professes her ability to tell the difference. Fisher subsequently
challenged Dr. Bristol to prove her ability to discern the two methods of preparation in a
perceptual discrimination study. In Lindley’s telling of the story, which takes some liberties
with the actual design of the experiment in order to emphasize a point, Dr. Bristol correctly
identified 5 out of 6 cups where the tea was added either first or second. This result left
Fisher faced with the question: Was his colleague merely guessing, or could she really tell
the difference? Fisher then proceeded to develop his now classic approach in a sequence of
steps, recognizing at various points that tests that seem intuitively appealing actually lead
to absurdities, until he arrived at a method that consists of calculating the total probability
of the observed result plus the probability of any more extreme results possible under the
null hypothesis (i.e., the probability that she would correctly identify 5 or 6 cups by sheer
guessing). This probability is the p-value. If it is less than .05, then Fisher would declare
the result significant and reject the null hypothesis of guessing.
Lindley’s paper essentially continues Fisher’s work, showing that Fisher’s classic pro-
cedure is inadequate and itself leads to absurdities because it hinges upon the nonexistent
ability to define what other unobserved results would count as “more extreme” than the
actual observations. That is, if Fisher had set out to serve Dr. Bristol 6 cups (and only
6 cups) and she is correct 5 times, then we get a p-value of .1, which is not statistically
significant. According to Fisher, in this case we should not reject the null hypothesis that
Dr. Bristol is guessing. But had he set out to keep giving her additional cups until she was
correct 5 times, which incidentally required 6 cups, we get a p-value of .03, which is sta-
tistically significant. According to Fisher, we should now reject the null hypothesis. Even
2 As a historical note: Distinguishing milk-first from infusion-first tea preparation was not a particular
affectation of Dr. Bristol’s, but a cultural debate that has persisted for over three centuries (e.g., Orwell,
1946).
Figure 1 . A reproduction of Figure 2 from Lindley (1993). The left bar indicates the
probability that Dr. Bristol is guessing prior to the study (.8), if 5 right and 1 wrong are
observed (.59), and if 6 right and 0 wrong are observed (.23). The lines represent Lindley’s
corresponding beliefs about Dr. Bristol’s accuracy if she is not guessing.
though the data observed in both cases are exactly the same, we reach different conclusions
because our definition of “more extreme” results (that did not occur) changes depending
on which sampling plan we use. Absurdly, the p-value, and with it our conclusion about
Dr. Bristol’s ability, depends on how we think about results that might have occurred but
never actually did, and that in turn depends on how we planned the experiment (rather
than only on how it turned out).
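The two p-values in this example can be checked directly. The sketch below computes the fixed-design p-value exactly as described; for the sequential figure we assume a plan that stops at Dr. Bristol's first mistake, one stopping rule that reproduces Lindley's .03 (the rule is our illustrative assumption, not a claim about his exact design):

```python
from math import comb

# Fixed design: exactly 6 cups are served; Dr. Bristol gets 5 right.
# p-value = P(5 or 6 correct out of 6 | guessing, theta = .5)
p_fixed = sum(comb(6, k) * 0.5**6 for k in (5, 6))  # 7/64

# Sequential design (assumed here): cups are served until the first
# mistake; she was right 5 times before erring.
# p-value = P(at least 5 correct before the first error | guessing)
p_sequential = 0.5**5  # 1/32

print(round(p_fixed, 3), round(p_sequential, 3))  # 0.109 0.031
```

The same data thus yield p ≈ .1 or p ≈ .03 depending solely on the sampling plan, which is the absurdity Lindley highlights.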
Lindley’s Bayesian solution to this problem considers only the probability of observa-
tions actually obtained, avoiding the problem of defining more extreme, unobserved results.
The observations are used to assign a probability to each possible value of Dr. Bristol’s
success rate. Lindley’s Bayesian approach to evaluating Dr. Bristol’s ability to discrimi-
nate between the differently made teas starts by assigning a priori probabilities across the
range of values of her success rate. If it is reasonable to consider that Dr. Bristol is simply
guessing the outcome at random (i.e., her rate of success is .5), then one must assign an a
priori probability to this null hypothesis (see our Figure 1, and note the separate amount of
probability assigned to p=.5). The remaining probability is distributed among the range
of other plausible values of Dr. Bristol’s success rate (i.e., rates that do not assume that
she is guessing at random)3. Then the observations are used to update these probabilities
using Bayes’ rule (this is derived in detail in Etz & Vandekerckhove, this issue). If the ob-
servations better fit with the null hypothesis (pure guessing), then the probability assigned
to the null hypothesis will increase; if the data better fit the alternative hypothesis, then
the probability assigned to the alternative hypothesis will increase, and subsequently the
probability attached to the null hypothesis will decrease (note the decreasing probability
of the null hypothesis on the left axis of Figure 2). The factor by which the data shift the
balance of the hypotheses’ probabilities is the Bayes factor (Kass & Raftery, 1995; see also
Rouder et al., 2009, and Dienes, 2011, below).
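Readers who want to see this updating step in miniature can work it out with the tea data. The sketch below uses a point null at θ = .5 and, as an illustrative assumption, a uniform prior over the alternative success rates; Lindley's own figures (e.g., the posterior of .59 shown in our Figure 1) come from his more informative prior instead:

```python
from fractions import Fraction
from math import comb, factorial

n, k = 6, 5  # Dr. Bristol: 5 correct identifications out of 6 cups

# Probability of the data under the null hypothesis (guessing, theta = 1/2):
m0 = comb(n, k) * Fraction(1, 2)**n              # 6/64

# Under the alternative, average the likelihood over a uniform prior:
# integral of C(6,5) theta^5 (1 - theta) dtheta = C(6,5) * Beta(6, 2)
beta_6_2 = Fraction(factorial(5) * factorial(1), factorial(7))  # 1/42
m1 = comb(n, k) * beta_6_2                       # 1/7

bf01 = m0 / m1                                   # 21/32

# Reallocate an initial P(guessing) = .8 via Bayes' rule:
prior = Fraction(4, 5)
posterior = prior * bf01 / (prior * bf01 + (1 - prior))
print(float(bf01), float(posterior))             # 0.65625 0.7241...
```

Under these assumptions the data lower the probability of guessing from .8 to about .72; the direction of the shift matches Lindley's analysis even though the exact value depends on the prior chosen for the alternative.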
A key takeaway from this paper is that Lindley’s Bayesian approach depends only on
the observed data, so the results are interpretable regardless of whether the sampling plan
was rigid or flexible or even known at all. Another key point is that the Bayesian approach
is inherently comparative: Hypotheses are tested against one another and never in isolation.
Lindley further concludes that, since the posterior probability that the null is true will often
be higher than the p-value, the latter metric will discount null hypotheses more easily.
2. Bayesian credibility assessments
Source: Kruschke (2015, Chapter 2) — Introduction: Credibility, models, and parameters
“How often have I said to you that when all other θ yield P(x | θ) of 0, whatever
remains, however low its P(θ), must have P(θ | x) = 1?”
– Sherlock Holmes, paraphrased
In this book chapter, Kruschke explains the fundamental Bayesian principle of reallocation
of probability, or “credibility,” across possible states of nature. Kruschke uses an exam-
ple featuring Sherlock Holmes to demonstrate that the famous detective essentially used
Bayesian reasoning to solve his cases. Suppose that Holmes has determined that there ex-
ist only four different possible causes (A, B, C, and D) of a committed crime which, for
simplicity in the example, he holds to be equally credible at the outset. This translates to
equal prior probabilities for each of the four possible causes (i.e., a prior probability of 1/4
for each). Now suppose that Holmes gathers evidence that allows him to rule out cause
A with certainty. This development causes the probability assigned to A to drop to zero,
and the probability that used to be assigned to cause A to be then redistributed across the
other possible causes. Since the probabilities for the four alternatives need to sum to one,
the probability for each of the other causes is now equal to 1/3 (Figure 2.1, p. 17). What
Holmes has done is reallocate credibility across the different possible causes based on the
evidence he has gathered. His new state of knowledge is that only one of the three remaining
alternatives can be the cause of the crime and that they are all equally plausible. Holmes,
3 If the null hypothesis is not initially considered tenable, then we can proceed without assigning separate
probability to it and instead focus on estimating the parameters of interest (e.g., the taster’s accuracy in
distinguishing wines, as in Lindley’s second example; see Lindley’s Figure 1, and notice that the amount
of probability assigned to p = .5 is gone). Additionally, if a range of values of the parameter is considered
impossible—such as rates that are below chance—then this range may be given zero prior probability.
being a man of great intellect, is eventually able to completely rule out two of the remaining
three causes, leaving him with only one possible explanation—which has to be the cause of
the crime (as it now must have probability equal to 1), no matter how improbable it might
have seemed at the beginning of his investigation.
The reader might object that it is rather unrealistic to assume that data can be
gathered that allow a researcher to completely rule out contending hypotheses. In real
applications, psychological data are noisy, and outcomes are only probabilistically linked
to the underlying causes. In terms of reallocation of credibility, this means that possible
hypotheses can rarely be ruled out completely (i.e., reduced to zero probability); however,
their credibility can be greatly diminished, leading to a substantial increase in the credibility
of other possible hypotheses. Although a hypothesis has not been eliminated, something
has been learned: namely, that one or more of the candidate hypotheses have had their
probabilities reduced and are now less likely than the others.
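Both cases—decisive elimination and the gradual discounting typical of noisy data—are instances of the same renormalization step. A minimal sketch, with Holmes's four equally credible causes and likelihood values we choose purely for illustration:

```python
def reallocate(prior, likelihood):
    """Bayes' rule over a discrete set of hypotheses:
    posterior is proportional to prior times likelihood."""
    post = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(post.values())
    return {h: p / total for h, p in post.items()}

prior = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}

# Decisive evidence: cause A is ruled out entirely, so its credibility
# is reallocated equally among B, C, and D (each now 1/3).
decisive = reallocate(prior, {"A": 0.0, "B": 1.0, "C": 1.0, "D": 1.0})

# Noisy evidence (hypothetical likelihoods): A is merely diminished,
# not eliminated, while B and C gain credibility.
noisy = reallocate(prior, {"A": 0.1, "B": 0.4, "C": 0.4, "D": 0.1})
```

In the noisy case A retains some credibility, but most of the probability now sits on B and C, which is precisely the partial reallocation described above.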
In a statistical context, the possible hypotheses are parameter values in mathematical
models that serve to describe the observed data in a useful way. For example, a scientist
could assume that their observations are normally distributed and be interested in which
range of values for the mean is most credible. Sherlock Holmes only considered a set of
discrete possibilities, but in many cases it would be very restrictive to only allow a few alter-
natives (e.g., when estimating the mean of a normal distribution). In the Bayesian frame-
work one can easily consider an infinite continuum of possibilities, across which credibility
may still be reallocated. It is easy to extend this framework of reallocation of credibility to
hypothesis testing situations where one parameter value is seen as “special” and receives a
high amount of prior probability compared to the alternatives (as in Lindley’s tea example).
Kruschke (2015) serves as a good first introduction to Bayesian thinking, as it requires
only basic statistical knowledge (a natural follow-up is Kruschke & Liddell, this issue). In
this chapter, Kruschke also provides a concise introduction to mathematical models and
parameters, two core concepts which our other sources will build on. One final key takeaway
from this chapter is the idea of sequential updating from prior to posterior (Figure 2.1, p. 17)
as data are collected. As Dennis Lindley famously said: “Today’s posterior is tomorrow’s
prior” (Lindley, 1972, p. 2).
3. Implications of Bayesian statistics for experimental psychology
Source: Dienes (2011) — Bayesian versus orthodox statistics: Which side are you on?
Dienes explains several differences between the frequentist (which Dienes calls or-
thodox and we have called classical; we use these terms interchangeably) and Bayesian
paradigms which have practical implications for how experimental psychologists conduct ex-
periments, analyze data, and interpret results (a natural follow-up to the discussion in this
section is available in Dienes & McLatchie, this issue). Throughout the paper, Dienes also
discusses subjective (or context-dependent) Bayesian methods, which allow for inclusion of
relevant problem-specific knowledge into the formation of one’s statistical model.
The probabilities of data given theory and of theory given data. When
testing a theory, both the frequentist and Bayesian approaches use probability theory as
the basis for inference, yet in each framework, the interpretation of probability is different.
It is important to be aware of the implications of this difference in order to correctly in-
terpret frequentist and Bayesian analyses. One major contrast is a result of the fact that
frequentist statistics only allow for statements to be made about P(data | theory)4: Assum-
ing the theory is correct, the probability of observing the obtained (or more extreme) data is
evaluated. Dienes argues that often the probability of the data assuming a theory is correct
is not the probability the researcher is interested in. What researchers typically want to
know is P(theory | data): Given that the data were those obtained, what is the probability
that the theory is correct? At first glance, these two probabilities might appear similar, but
Dienes illustrates their fundamental difference with the following example: The probability
that a person is dead (i.e., data) given that a shark has bitten the person’s head off (i.e.,
theory) is 1. However, given that a person is dead, the probability that a shark has bitten
this person’s head off is very close to zero (see Senn, 2013, for an intuitive explanation of
this distinction). It is important to keep in mind that a p-value does not correspond to
P(theory | data); in fact, statements about this probability are only possible if one is willing
to attach prior probabilities (degrees of plausibility or credibility) to theories—which can
only be done in the Bayesian paradigm.
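Bayes' theorem makes the asymmetry in the shark example concrete. In the sketch below every number is hypothetical, chosen only to show that P(theory | data) can be minuscule even when P(data | theory) equals 1:

```python
# All numbers are hypothetical, purely to illustrate the asymmetry.
p_theory = 1e-9             # prior P(a shark bit the person's head off)
p_data_given_theory = 1.0   # P(dead | shark bit the person's head off)
p_data = 0.01               # marginal P(a given person is dead)

# Bayes' theorem: P(theory | data) = P(data | theory) * P(theory) / P(data)
p_theory_given_data = p_data_given_theory * p_theory / p_data  # ~1e-07
```

The two conditional probabilities differ by seven orders of magnitude here, which is why a p-value (a statement about data given theory) cannot be read as the probability that a theory is true.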
In the following sections, Dienes explains how the Bayesian approach is more liber-
ating than the frequentist approach with regard to the following concepts: stopping rules,
planned versus post hoc comparisons, and multiple testing. For those new to the Bayesian
paradigm, these proposals may seem counterintuitive at first, but Dienes provides clear and
accessible explanations for each.
Stopping rules. In the classical statistical paradigm, it is necessary to specify
in advance how the data will be collected. In practice, one usually has to specify how
many participants will be collected; stopping data collection early or continuing after the
pre-specified number of participants has been reached is not permitted. One reason why
collecting additional participants is not permitted in the typical frequentist paradigm is
that, given the null hypothesis is true, the p-value is not driven in a particular direction
as more observations are gathered. In fact, in many cases the distribution of the p-value
is uniform when the null hypothesis is true, meaning that every p-value is equally likely
under the null. This implies that even if there is no effect, a researcher is guaranteed to
obtain a statistically significant result if they simply continue to collect participants and
stop when the p-value is sufficiently low. In contrast, the Bayes factor, the most common
Bayesian method of hypothesis testing, will approach infinite support in favor of the null
hypothesis as more observations are collected if the null hypothesis is true. Furthermore,
since Bayesian inference obeys the likelihood principle, one is allowed to continue or stop
collecting participants at any time while maintaining the validity of one’s results (p. 276;
see also Cornfield, 1966, Rouder, 2014, and Royall, 2004, in the appended Further Reading
appendix).
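The claim that the Bayes factor accumulates support for a true null can be illustrated with the binomial setting from Lindley's example. The uniform prior on the alternative below is an illustrative choice (not the default priors discussed later), but it makes the marginal likelihood analytic:

```python
from math import comb

def bf01_binomial(n, k):
    """BF01 for H0: theta = .5 versus H1: theta ~ Uniform(0, 1).
    Under the uniform prior every outcome k has marginal probability
    1 / (n + 1), so BF01 = (n + 1) * C(n, k) * .5 ** n."""
    return (n + 1) * comb(n, k) * 0.5**n

# With data at chance level (k = n/2), support for the null keeps
# growing as observations accumulate -- unlike the p-value, whose
# distribution stays uniform under the null.
for n in (10, 100, 1000):
    print(n, bf01_binomial(n, n // 2))
```

The Bayes factor grows roughly with the square root of n here, so a researcher who keeps sampling under a true null amasses evidence for it rather than an eventual spurious rejection.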
Planned versus post hoc comparisons. In the classical hypothesis-testing ap-
proach, a distinction is made between planned and post hoc comparisons: It matters whether
the hypothesis was formulated before or after data collection. In contrast, Dienes argues
that adherence to the likelihood principle entails that a theory does not necessarily need
to precede the data when a Bayesian approach is adopted; since this temporal information
does not enter into the likelihood function for the data, the evidence for or against the
4 The conditional probability (P) of data given (|) theory.
theory will be the same no matter its temporal relation to the data.
Multiple testing. When conducting multiple tests in the classical approach, it is
important to correct for the number of tests performed (see Gelman & Loken,2014). Dienes
points out that within the Bayesian approach, the number of hypotheses tested does not
matter—it is not the number of tests that is important, but the evaluation of how accurately
each hypothesis predicts the observed data. Nevertheless, it is crucial to consider all relevant
evidence, including so-called “outliers,” because “cherry picking is wrong on all statistical
approaches” (Dienes, 2011, p. 280).
Context-dependent Bayes factors. The last part of the article addresses how
problem-specific knowledge may be incorporated in the calculation of the Bayes factor. As
is also explained in our next highlighted source (Rouder et al., 2009), there are two main
schools of Bayesian thought: default (or objective) Bayes and context-dependent (or sub-
jective) Bayes. In contrast to the default Bayes factors for general application that are
designed to have certain desirable mathematical properties (e.g., Jeffreys, 1961; Rouder
et al., 2009; Rouder & Morey, 2012; Rouder, Morey, Speckman, & Province, 2012; Ly,
Verhagen, & Wagenmakers, 2016), Dienes provides an online calculator5 that enables one
to obtain context-dependent Bayes factors that incorporate domain knowledge for several
commonly used statistical tests. In contrast to the default Bayes factors, which are typi-
cally designed to use standardized effect sizes, the context-dependent Bayes factors specify
prior distributions in terms of the raw effect size. Readers who are especially interested
in prior elicitation should see the appendix of Dienes’ article for a short review of how to
appropriately specify prior distributions that incorporate relevant theoretical information
(and Dienes, 2014, for more details and worked examples).
4. Structure and motivation of Bayes factors
Source: Rouder et al. (2009) — Bayesian t-tests for accepting and rejecting the null hypothesis
In many cases, a scientist’s primary interest is in showing evidence for an invariance,
rather than a difference. For example, researchers may want to conclude that experimental
and control groups do not differ in performance on a task (e.g., van Ravenzwaaij, Boekel,
Forstmann, Ratcliff, & Wagenmakers, 2014), that participants were performing at chance
(Dienes & Overgaard, 2015), or that two variables are unrelated (Rouder & Morey, 2012).
In classical statistics this is generally not possible as significance tests are asymmetric;
they can only serve to reject the null hypothesis and never to affirm it. One benefit of
Bayesian analysis is that inference is perfectly symmetric, meaning evidence can be obtained
that favors the null hypothesis as well as the alternative hypothesis (see Gallistel, 2009,
as listed in our Further Reading appendix). This is made possible by the use of Bayes
factors.6 The section covering the shortcomings of classical statistics (“Critiques of Inference
by Significance Tests”) can safely be skipped, but readers particularly interested in the
motivation of Bayesian inference are advised to read it.
6 Readers for whom Rouder and colleagues’ (2009) treatment is too technical could focus on Dienes’
conceptual ideas and motivations underlying the Bayes factor.
What is a Bayes factor?. The Bayes factor is a representation of the relative pre-
dictive success of two or more models, and it is a fundamental measure of relative evidence.
The way Bayesians quantify predictive success of a model is to calculate the probability of
the data given that model—also called the marginal likelihood or sometimes simply the evi-
dence. The ratio of two such probabilities is the Bayes factor. Rouder and colleagues (2009)
denote the probability of the data given some model, represented by Hi, as f(data | Hi).7
The Bayes factor for H0 versus H1 is simply the ratio of f(data | H0) and f(data | H1),
written B01 (or BF01), where the B (or BF) indicates a Bayes factor, and the subscript
indicates which two models are being compared (see p. 228). If the result of a study is
B01 = 10 then the data are ten times more probable under H0 than under H1. Researchers
should report the exact value of the Bayes factor since it is a continuous measure of ev-
idence, but various benchmarks have been suggested to help researchers interpret Bayes
factors, with values between 1 and 3, between 3 and 10, and greater than 10 generally
taken to indicate inconclusive, weak, and strong evidence, respectively (see Jeffreys, 1961;
Wagenmakers, 2007; Etz & Vandekerckhove, 2016), although different researchers may set
different benchmarks. Care is needed when interpreting Bayes factors against these bench-
marks, as they are not meant to be bright lines against which we judge a study’s success
(as opposed to how a statistical significance criterion is sometimes treated); the difference
between a Bayes factor of, say, 8 and 12 is more a difference of degree than of category.
Furthermore, Bayes factors near 1 indicate the data are uninformative, and should not be
interpreted as even mild evidence for either of the hypotheses under consideration.
Readers who are less comfortable with reading mathematical notation may skip over
most of the equations without too much loss of clarity. The takeaway is that to evaluate
which model is better supported by the data, we need to find out which model has done
the best job predicting the data we observe. To a Bayesian, the probability a model assigns
to the observed data constitutes its predictive success (see Morey, Romeijn, & Rouder,
2016); a model that assigns a high probability to the data relative to another model is
best supported by the data. The goal is then to find the probability a given model assigns
the data, f(data | Hi). Usually the null hypothesis specifies that the true parameter is a
particular value of interest (e.g., zero), so we can easily find f(data | H0). However, we
generally do not know the value of the parameter if the null model is false, so we do not
know what probability it assigns the data. To represent our uncertainty with regard to
the true value of the parameter if the null hypothesis is false, Bayesians specify a range
of plausible values that the parameter might take under the alternative hypothesis. All
of these parameter values are subsequently used in computing an average probability of
the data given the alternative hypothesis, f(data | H1) (for an intuitive illustration, see
Gallistel, 2009, as listed in our Further Reading appendix). If the prior distribution gives
substantial weight to parameter values that assign high probability to the data, then the
average probability the alternative hypothesis assigns to the data will be relatively high—the
model is effectively rewarded for its accurate predictions with a high value for f(data | H1).
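The averaging step can be made concrete with the tea-tasting data rather than the t-test setting of Rouder and colleagues; the uniform prior on θ below is an illustrative assumption, not their Cauchy prior on effect size:

```python
from math import comb

n, k = 6, 5  # the tea-tasting data: 5 correct out of 6 cups

def likelihood(theta):
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

# f(data | H0): the null hypothesis fixes theta at .5
m0 = likelihood(0.5)                       # 6/64

# f(data | H1): average the likelihood over the prior -- here a uniform
# prior on theta, approximated by a midpoint-rule grid
grid = [(i + 0.5) / 10_000 for i in range(10_000)]
m1 = sum(likelihood(t) for t in grid) / len(grid)   # ~1/7

bf01 = m0 / m1   # ~0.66: these data only mildly favor the alternative
```

Because the uniform prior places substantial weight on success rates near 5/6, the averaged probability f(data | H1) ends up higher than f(data | H0), and the Bayes factor reflects that.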
The role of priors. The form of the prior can have important consequences on
the resulting Bayes factor. As discussed in our third source (Dienes,2011), there are two
7 The probability (f) of the observed data given (|) hypothesis i (Hi), where i indicates one of the
candidate hypotheses (e.g., 0, 1, A, etc.). The null hypothesis is usually denoted H0 and the alternative
hypothesis is usually denoted either H1 or HA.
primary schools of Bayesian thought: default (objective) Bayes (Berger, 2006) and context-
dependent (subjective) Bayes (Goldstein et al., 2006; Rouder, Morey, & Wagenmakers,
2016). The default Bayesian tries to specify prior distributions that convey little information
while maintaining certain desirable properties. For example, one desirable property is that
changing the scale of measurement should not change the way the information is represented
in the prior, which is accomplished by using standardized effect sizes. Context-dependent
prior distributions are often used because they more accurately encode our prior information
about the effects under study, and can be represented with raw or standardized effect sizes,
but they do not necessarily have the same desirable mathematical properties (although
sometimes they can).
Choosing a prior distribution for the standardized effect size is relatively straightfor-
ward for the default Bayesian. One possibility is to use a normal distribution centered at 0
and with some standard deviation (i.e., spread) σ. If σ is too large, the Bayes factor will
always favor the null model, so such a choice would be unwise (see also DeGroot, 1982;
Robert, 2014). This happens because such a prior distribution assigns weight to very ex-
treme values of the effect size, when in reality, the effect is most often reasonably small (e.g.,
almost all psychological effects are smaller than Cohen’s d = 2). The model is penalized
for low predictive success. Setting σ to 1 is reasonable and common—this is called the
unit information prior. However, using a Cauchy distribution (which resembles a normal
distribution but with less central mass and fatter tails) has some better properties than the
unit information prior, and is now a common default prior on the alternative hypothesis,
giving rise to what is now called the default Bayes factor (see Rouder & Morey, 2012, for
more details; see also Wagenmakers, Love, et al., this issue, and Wagenmakers, Marsman,
et al., this issue). To use the Cauchy distribution, like the normal distribution, again one
must specify a scaling factor. If it is too large, the same problem as before occurs where
the null model will always be favored. Rouder and colleagues suggest a scale of 1, which
implies that the effect size has a prior probability of 50% to be between d=1and d= 1.
For some areas, such as social psychology, this is not reasonable, and the scale should be
reduced. However, slight changes to the scale often do not make much difference in the
qualitative conclusions one draws.
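The sensitivity of the Bayes factor to the prior's width can be sketched numerically. Assuming a normal approximation to the likelihood (the observed standardized effect is treated as normally distributed around the true effect), and using hypothetical values for the observed effect and its standard error, the marginal likelihood under each hypothesis has a closed form:

```python
import math

def norm_pdf(x, sd):
    """Density of a Normal(0, sd^2) distribution at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

d_hat, se = 0.3, 0.15  # hypothetical observed effect size and its standard error

def bf01(prior_sd):
    """Bayes factor for H0 (delta = 0) over H1 (delta ~ Normal(0, prior_sd^2))."""
    m0 = norm_pdf(d_hat, se)                        # marginal likelihood under H0
    m1 = norm_pdf(d_hat, math.hypot(prior_sd, se))  # marginal likelihood under H1
    return m0 / m1

for sd in (0.5, 1, 10, 100):
    print(f"prior sd = {sd:6}: BF01 = {bf01(sd):.2f}")
```

As the prior scale grows, BF01 grows without bound: the diffuse model wastes prior mass on effect sizes the data rule out, echoing the point made by DeGroot (1982) and Robert (2014).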
Readers are advised to pay close attention to the sections “Subjectivity in priors”
and “Bayes factors with small effects.” The former explains how one can tune the scale
of the default prior distribution to reflect more contextually relevant information while
maintaining the desirable properties attached to prior distributions of this form, a practice
that is a reasonable compromise between the default and context-dependent schools. The
latter shows why the Bayes factor will often show evidence in favor of the null hypothesis
if the observed effect is small and the prior distribution is relatively diffuse.
Applied sources
At this point, the essential concepts of Bayesian probability, Bayes’ theorem, and the
Bayes factor have been discussed in depth. In the following four sources, these concepts
are applied to real data analysis situations. Our first source provides a broad overview of
the most common methods of model comparison, including the Bayes factor, with a heavy
emphasis on its proper interpretation (Vandekerckhove, Matzke, & Wagenmakers,2015).
The next source begins by demonstrating Bayesian estimation techniques in the context
Psychonomic Bulletin & Review 10/28
of developmental research, then provides some guidelines for reporting Bayesian analyses
(van de Schoot et al.,2014). Our final two sources discuss issues in Bayesian cognitive
modeling, such as the selection of appropriate priors (Lee & Vanpaemel,this issue), and
the use of cognitive models for theory testing (Lee,2008).
Before moving on to our final four highlighted sources, it will be useful if readers
consider some differences in perspective among practitioners of Bayesian statistics. The
application of Bayesian methods is very much an active field of study, and as such, the
literature contains a multitude of deep, important, and diverse viewpoints on how data
analysis should be done, similar to the philosophical divides between Neyman–Pearson and
Fisher concerning proper application of classical statistics (see Lehmann,1993). The divide
between subjective Bayesians, who elect to use priors informed by theory, and objective
Bayesians, who instead prefer “uninformative” or default priors, has already been mentioned
throughout the Theoretical sources section above.
A second division of note exists between Bayesians who see a place for hypothesis
testing in science, and those who see statistical inference primarily as a problem of estima-
tion. The former believe statistical models can stand as useful surrogates for theoretical
positions, whose relative merits are subsequently compared using Bayes factors and other
such “scoring” metrics (as reviewed in Vandekerckhove et al.,2015, discussed below; for
additional examples, see Jeffreys,1961 and Rouder, Morey, Verhagen, Province, & Wagen-
makers,2016). The latter would rather delve deeply into a single model or analysis and
use point estimates and credible intervals of parameters as the basis for their theoretical
conclusions (as demonstrated in Lee,2008, discussed below; for additional examples, see
Gelman & Shalizi,2013 and McElreath,2016).8
Novice Bayesians may feel surprised that such wide divisions exist, as statistics (of
any persuasion) is often thought of as a set of prescriptive, immutable procedures that can
be only right or wrong. We contend that debates such as these should be expected due
to the wide variety of research questions—and diversity of contexts—to which Bayesian
methods are applied. As such, we believe that the existence of these divisions speaks to the
intellectual vibrancy of the field and its practitioners. We point out these differences here
so that readers might use this context to guide their continued reading.
5. Bayesian model comparison methods
Source: Vandekerckhove et al. (2015) — Model comparison and the principle of parsimony
John von Neumann famously said: “With four parameters I can fit an elephant,
and with five I can make him wiggle his trunk” (as quoted in Mayer, Khairy, & Howard,
2010, p. 648), pointing to the natural tension between model parsimony and goodness
of fit. The tension occurs because it is always possible to decrease the amount of error
between a model’s predictions and the observed data by simply adding more parameters
to the model. In the extreme case, any data set of N observations can be reproduced
perfectly by a model with N parameters. Such practices, termed overfitting, however,
result in poor generalization and greatly reduce the accuracy of out-of-sample predictions.
8 This divide in Bayesian statistics may be seen as a parallel to the recent discussions about use of classical
statistics in psychology (e.g., Cumming, 2014), where a greater push has been made to adopt an estimation
approach over null hypothesis significance testing (NHST). Discussions on the merits of hypothesis testing
have been running through all of statistics for over a century, with no end in sight.
Vandekerckhove and colleagues (2015) take this issue as a starting point to discuss various
criteria for model selection. How do we select a model that both fits the data well and
generalizes adequately to new data?
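The overfitting point can be made concrete with a small simulation of our own (not from the chapter): eight observations generated from a straight line plus small disturbances are fit with a two-parameter line and with an eight-parameter polynomial.

```python
import numpy as np

n = 8
x = np.linspace(0, 1, n)
y = 2 * x + 0.1 * (-1) ** np.arange(n)  # a straight line plus small alternating noise

line = np.polyfit(x, y, 1)      # 2 parameters
poly = np.polyfit(x, y, n - 1)  # 8 parameters: one per observation

# The saturated model reproduces every observation essentially perfectly ...
assert np.allclose(np.polyval(poly, x), y)

# ... but predicts terribly just outside the observed range (the truth at 1.5 is 3.0)
print("line prediction at 1.5:", np.polyval(line, 1.5))
print("polynomial prediction at 1.5:", np.polyval(poly, 1.5))
```

The saturated polynomial passes through every data point exactly, yet its out-of-sample prediction is wildly wrong, while the simple line generalizes well.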
Putting the problem in perspective, the authors discuss research on recognition mem-
ory that relies on multinomial processing trees, which are simple, but powerful, cognitive
models. Comparing these different models using only the likelihood term is ill-advised,
because the model with the highest number of parameters will—all other things being
equal—yield the best fit. As a first step to addressing this problem, Vandekerckhove et al.
(2015) discuss the popular Akaike information criterion (AIC) and Bayesian information
criterion (BIC).
Though derived from different philosophies (for an overview, see Aho, Derryberry, &
Peterson,2014), both AIC and BIC try to solve the trade-off between goodness-of-fit and
parsimony by combining the likelihood with a penalty for model complexity. However, this
penalty is solely a function of the number of parameters and thus neglects the functional
form of the model, which can be informative in its own right. As an example, the authors
mention Fechner’s law and Stevens’ law. The former is described by a simple logarithmic
function, which can only ever fit negatively accelerated data. Stevens’ law, however, is
described by a power function, which can account for both positively and negatively
accelerated data. Additionally, both models feature just a single parameter, nullifying the
benefit of the complexity penalty in each of the two aforementioned information criteria.
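Both criteria are simple functions of the maximized log-likelihood, the number of parameters k, and (for BIC) the sample size n. A minimal sketch with hypothetical log-likelihood values makes the text's point explicit: the penalty term sees only k, never the functional form.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion: -2 log L + 2k."""
    return -2 * log_lik + 2 * k

def bic(log_lik, k, n):
    """Bayesian information criterion: -2 log L + k log n."""
    return -2 * log_lik + k * math.log(n)

# Two hypothetical one-parameter models fit to the same n = 50 observations:
# with identical k, both criteria reduce to comparing raw fit, no matter how
# flexible each model's functional form happens to be.
print(aic(-120.3, k=1), aic(-118.9, k=1))
print(bic(-120.3, k=1, n=50), bic(-118.9, k=1, n=50))
```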
The Bayes factor yields a way out. It extends the simple likelihood ratio test by
integrating the likelihood with respect to the prior distribution, thus taking the predictive
success of the prior distribution into account (see also Gallistel,2009, in the Further Reading
appendix). Essentially, the Bayes factor is a likelihood ratio test averaged over all possible
parameter values for the model, using the prior distributions as weights: It is the natural
extension of the likelihood ratio test to a Bayesian framework. The net effect of this is to
penalize complex models. While a complex model can predict a wider range of possible
data points than a simple model can, each individual data point is less likely to be observed
under the complex model. This is reflected in the prior distribution being more spread
out in the complex model. By weighting the likelihood by the corresponding tiny prior
probabilities, the Bayes factor in favor of the complex model decreases. In this way, the
Bayes factor instantiates an automatic Ockham’s Razor (see also Myung & Pitt,1997, in
the appended Further Reading section).
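This averaging can be illustrated with a toy binomial example of our own (not taken from the chapter): a constrained model concentrates its prior for a success probability θ near .5, while a flexible model spreads its prior uniformly over [0, 1]. When the data land where both models predict, the constrained model earns the higher marginal likelihood, because the flexible model has diluted its prior mass over outcomes that never occurred.

```python
import math

def binom_pmf(k, n, t):
    """Binomial likelihood of k successes in n trials at rate t."""
    return math.comb(n, k) * t ** k * (1 - t) ** (n - k)

def beta_pdf(t, a, b):
    """Density of a Beta(a, b) distribution at t."""
    log_b = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t) - log_b)

def marginal(k, n, prior, grid=10_000):
    """Average the likelihood over the prior: p(k) = integral of Binom(k|n,t) * prior(t) dt."""
    ts = ((i + 0.5) / grid for i in range(grid))
    return sum(binom_pmf(k, n, t) * prior(t) for t in ts) / grid

n, k = 20, 10                                # data: 10 successes in 20 trials
flexible = lambda t: 1.0                     # theta ~ Uniform(0, 1)
constrained = lambda t: beta_pdf(t, 20, 20)  # theta concentrated near .5

bf = marginal(k, n, constrained) / marginal(k, n, flexible)
print(f"BF (constrained over flexible) = {bf:.2f}")
```

With 10 successes in 20 trials, the Bayes factor favors the constrained model by about 3 to 1; with 18 successes in 20 trials, the flexible model would win instead.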
However, the Bayes factor can be difficult to compute because it often involves inte-
gration over very many dimensions at once. Vandekerckhove and colleagues (2015) advocate
two methods to ease the computational burden: importance sampling and the Savage-Dickey
density ratio (see also Wagenmakers, Lodewyckx, Kuriyal, & Grasman, 2010, in our
Further reading appendix); additional common computational methods include the Laplace
approximation (Kass & Raftery,1995), bridge sampling (Meng & Wong,1996;Gronau et
al.,2017), and the encompassing prior approach (Hoijtink, Klugkist, & Boelen,2008). They
also provide code to estimate parameters in multinomial processing tree models and to com-
pute the Bayes factor to select among them. Overall, the chapter provides a good overview
of different methods used to tackle the tension between goodness-of-fit and parsimony in
a Bayesian framework. While it is more technical than the sources reviewed above, this
article can greatly influence how one thinks about models and methods for selecting among them.
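For the conjugate normal case, the Savage–Dickey density ratio mentioned above can be verified directly: the Bayes factor for a point null equals the posterior density of the effect at zero divided by its prior density at zero. A minimal sketch with hypothetical numbers (normal likelihood approximation, normal prior):

```python
import math

def norm_pdf(x, mean, sd):
    """Density of a Normal(mean, sd^2) distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

d_hat, se, prior_sd = 0.4, 0.2, 1.0  # hypothetical estimate, standard error, prior scale

# Conjugate posterior for the effect delta under H1: delta ~ Normal(0, prior_sd^2)
post_var = 1 / (1 / prior_sd ** 2 + 1 / se ** 2)
post_mean = post_var * d_hat / se ** 2

# Savage-Dickey: BF01 is the posterior density at delta = 0 over the prior density at 0
bf01_sd = norm_pdf(0, post_mean, math.sqrt(post_var)) / norm_pdf(0, 0, prior_sd)

# The same Bayes factor via the ratio of marginal likelihoods
bf01_ml = norm_pdf(d_hat, 0, se) / norm_pdf(d_hat, 0, math.hypot(prior_sd, se))

print(bf01_sd, bf01_ml)
```

The two routes, the density ratio and the ratio of marginal likelihoods, agree to numerical precision.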
6. Bayesian estimation
Source: van de Schoot et al. (2014) — A gentle introduction to Bayesian analysis: Appli-
cations to developmental research
This source approaches practical issues related to parameter estimation in the context
of developmental research. This setting offers a good basis for discussing the choice of priors
and how those choices influence the posterior estimates for parameters of interest. This is a
topic that matters to reviewers and editors alike: How does the choice of prior distributions
for focal parameters influence the statistical results and theoretical conclusions that are
obtained? The article discusses this issue on a basic and illustrative level.
At this point we feel it is important to note that the difference between hypothesis
testing and estimation in the Bayesian framework is much greater than it is in the frequentist
framework. In the frequentist framework there is often a one-to-one relationship between the
null hypothesis falling outside the sample estimate’s 95% confidence interval and rejection
of the null hypothesis with a significance test (e.g., when doing a t-test). This is not so
in the Bayesian framework; one cannot test a null hypothesis by simply checking if the
null value is inside or outside a credible interval. A detailed explanation of the reason for
this deserves more space than we can afford to give it here, but in short: When testing
hypotheses in the Bayesian framework one should calculate a model comparison metric.
See Rouder and Vandekerckhove (this issue) for an intuitive introduction to (and synthesis
of) the distinction between Bayesian estimation and testing.
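A small numerical illustration (hypothetical numbers, normal approximation) shows how the two can disagree: the 95% interval excludes zero, which would lead to rejection in the frequentist framework, yet the Bayes factor computed under a diffuse prior leans toward the null.

```python
import math

def norm_pdf(x, sd):
    """Density of a Normal(0, sd^2) distribution at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

d_hat, se = 0.21, 0.10  # hypothetical effect estimate and standard error (z = 2.1)

# 95% interval (with a flat prior, the credible interval matches the classical CI)
lo, hi = d_hat - 1.96 * se, d_hat + 1.96 * se
print(f"95% interval: [{lo:.3f}, {hi:.3f}]")  # excludes zero

# Bayes factor for delta = 0 against delta ~ Normal(0, 3^2), a very diffuse prior
bf01 = norm_pdf(d_hat, se) / norm_pdf(d_hat, math.hypot(3.0, se))
print(f"BF01 = {bf01:.2f}")  # greater than 1: the evidence leans toward the null
```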
Van de Schoot and colleagues (2014) begin by reviewing the main differences between
frequentist and Bayesian approaches. Most of this part can be skipped by readers who are
already comfortable with the basic terminology. The only newly introduced term is Markov
chain Monte Carlo (MCMC) sampling, which refers to the practice of drawing samples from
the posterior distribution instead of deriving the distribution analytically (which may not
be feasible for many models; see also van Ravenzwaaij, Cassey, & Brown,this issue and
Matzke, Boehm, & Vandekerckhove,this issue). After explaining this alternative approach
(p. 848), Bayesian estimation of focal parameters and the specification of prior distributions
are discussed with the aid of two case examples.
The first example concerns estimation of an ordinary mean value and the variance
of reading scores and serves to illustrate how different sources of information can be used
to inform the specification of prior distributions. The authors discuss how expert domain
knowledge (e.g., reading scores usually fall within a certain range), statistical considera-
tions (reading scores are normally distributed), and evidence from previous studies (results
obtained from samples from similar populations) may be jointly used to define adequate
priors for the mean and variance model parameters. The authors perform a prior sensitivity
analysis to show how using priors based on different considerations influences the obtained
results. Thus, the authors examine and discuss how the posterior distributions of the mean
and variance parameters are dependent on the prior distributions used.
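The flavor of such a sensitivity analysis can be sketched with a conjugate normal model for a mean; the reading-score numbers below are invented for illustration. The same data are combined with priors of differing informativeness and the posteriors are compared:

```python
import math

def posterior(prior_mean, prior_sd, ybar, sem):
    """Conjugate update for a normal mean with known sampling standard error."""
    precision = 1 / prior_sd ** 2 + 1 / sem ** 2  # precisions add
    mean = (prior_mean / prior_sd ** 2 + ybar / sem ** 2) / precision
    return mean, math.sqrt(1 / precision)

ybar, sem = 102.0, 2.0  # hypothetical sample mean of reading scores and its standard error

for label, pm, psd in [("diffuse", 100, 100), ("literature-based", 95, 5), ("narrow", 95, 1)]:
    m, s = posterior(pm, psd, ybar, sem)
    print(f"{label:>16} prior: posterior mean = {m:.1f}, sd = {s:.2f}")
```

With a diffuse prior the posterior mean essentially equals the sample mean; a narrow prior centered elsewhere pulls the estimate markedly toward its own center, which is exactly the dependence the authors probe.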
The second example focuses on a data set from research on the longitudinal reciprocal
associations between personality and relationships. The authors summarize a series of
previous studies and discuss how results from these studies may or may not inform prior
specifications for the latest obtained data set. Ultimately, strong theoretical considerations
are needed to decide whether data sets that were gathered using slightly different age groups
can be used to inform inferences about one another.
The authors fit a model with data across two time points and use it to discuss how
convergence of the MCMC estimator can be supported and checked. They then evaluate
overall model fit via a posterior predictive check. In this type of model check, data simulated
from the specified model are compared to the observed data. If the model is making
appropriate predictions, the simulated data and the observed data should appear similar.
The article concludes with a brief outline of guidelines for reporting Bayesian analyses and
results in a manuscript. Here, the authors emphasize the importance of the specification
of prior distributions and of convergence checks (if MCMC sampling is used) and briefly
outline how both might be reported. Finally, the authors discuss the use of default priors
and various options for conducting Bayesian analyses with common software packages (such
as Mplus and WinBUGS).
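The posterior predictive check described above can be sketched for a simple normal model; the data are simulated here and the model is deliberately well specified, so the check should pass:

```python
import numpy as np

rng = np.random.default_rng(2016)
y = rng.normal(0.8, 1.0, size=50)  # "observed" data, simulated for the sketch

# Posterior for the mean (known sd = 1, flat prior): Normal(ybar, 1/n)
draws = rng.normal(y.mean(), 1 / np.sqrt(len(y)), size=4000)

# Posterior predictive: one replicated data set per posterior draw
y_rep = rng.normal(draws[:, None], 1.0, size=(4000, len(y)))

# Compare a test statistic between replicated and observed data
ppp = np.mean(y_rep.mean(axis=1) >= y.mean())
print(f"posterior predictive p-value for the mean: {ppp:.2f}")
```

A posterior predictive p-value near .5 indicates that the observed statistic is typical of data the fitted model generates; values near 0 or 1 flag misfit.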
The examples in the article illustrate different considerations that should be taken into
account when choosing prior specifications, the consequences these choices can have on the
obtained results, and how to check whether and how the choice of priors influenced the resulting inferences.
7. Prior elicitation
Source: Lee and Vanpaemel (this issue) — Determining priors for cognitive models
Statistics does not operate in a vacuum, and often prior knowledge is available that
can inform one’s inferences. In contrast to classical statistics, Bayesian statistics allows one
to formalize and use this prior knowledge for analysis. The paper by Lee and Vanpaemel
(this issue) fills an important gap in the literature: What possibilities are there to formalize
and uncover prior knowledge?
The authors start by noting a fundamental point: Cognitive modeling is an extension
of general purpose statistical modeling (e.g., linear regression). Cognitive models are de-
signed to instantiate theory, and thus may need to use richer information and assumptions
than general purpose models (see also Franke,2016). A consequence of this is that the prior
distribution, just like the likelihood, should be seen as an integral part of the model. As
Jaynes (2003) put it: “If one fails to specify the prior information, a problem of inference
is just as ill-posed as if one had failed to specify the data” (p. 373).
What information can we use to specify a prior distribution? Because the parameters
in such a cognitive model usually have a direct psychological interpretation, theory may be
used to constrain parameter values. For example, a parameter interpreted as a probability
of correctly recalling a word must be between 0 and 1. To make this point clear, the
authors discuss three cognitive models and show how the parameters instantiate relevant
information about psychological processes. Lee and Vanpaemel also discuss cases in which
all of the theoretical content is carried by the prior, while the likelihood does not make any
strong assumptions. They also discuss the principle of transformation invariance: prior
distributions for parameters should be invariant to the scale on which they are measured
(e.g., measuring reaction time in seconds versus milliseconds).
Lee and Vanpaemel also discuss specific methods of prior specification. These include
the maximum entropy principle, the prior predictive distribution, and hierarchical modeling.
The prior predictive distribution is the model-implied distribution of the data, weighted with
respect to the prior. Recently, iterated learning methods have been employed to uncover
an implicit prior held by a group of participants. These methods can also be used to elicit
information that is subsequently formalized as a prior distribution. (For a more in-depth
discussion of hierarchical cognitive modeling, see Lee,2008, discussed below.)
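The prior predictive distribution is easy to simulate. In this hypothetical sketch, a recall probability gets a Beta prior and draws are pushed through a binomial likelihood, showing which data the model deems plausible before any are observed:

```python
import numpy as np

rng = np.random.default_rng(0)
n_words = 20
theta = rng.beta(3, 3, size=10_000)  # prior on the recall probability
k = rng.binomial(n_words, theta)     # prior predictive draws: words recalled

print("mean words recalled a priori:", k.mean())
print("90% prior predictive interval:", np.percentile(k, [5, 95]))
```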
In sum, the paper gives an excellent overview of why and how one can specify prior
distributions for cognitive models. Importantly, priors allow us to integrate domain-specific
knowledge, and thus build stronger theories (Platt,1964;Vanpaemel,2010). For more
information on specifying prior distributions for data-analytic statistical models rather than
cognitive models see Rouder, Morey, Verhagen, Swagman, and Wagenmakers (in press) and
Rouder, Engelhardt, McCabe, and Morey (2016).
8. Bayesian cognitive modeling
Source: Lee (2008) — Three case studies in the Bayesian analysis of cognitive models
Our final source (Lee,2008) further discusses cognitive modeling, a more tailored
approach within Bayesian methods. Often in psychology, a researcher will not only expect
to observe a particular effect, but will also propose a verbal theory of the cognitive process
underlying the expected effect. Cognitive models are used to formalize and test such verbal
theories in a precise, quantitative way. For instance, in a cognitive model, psychological
constructs, such as attention and bias, are expressed as model parameters. The proposed
psychological process is expressed as dependencies among parameters and observed data
(the “structure” of the model).
In peer-reviewed work, Bayesian cognitive models are often presented in visual form
as a graphical model. Model parameters are designated by nodes, where the shape, shad-
ing, and style of border of each node reflect various parameter characteristics. Dependencies
among parameters are depicted as arrows connecting the nodes. Lee gives an exceptionally
clear and concise description of how to read graphical models in his discussion of multidi-
mensional scaling (Lee,2008, p. 2).
After a model is constructed, the observed data are used to update the priors and
generate a set of posterior distributions. Because cognitive models are typically complex,
posterior distributions are almost always obtained through sampling methods (i.e., MCMC;
see van Ravenzwaaij, Cassey, & Brown, this issue), rather than through direct, often intractable, analytic derivation.
Lee demonstrates the construction and use of cognitive models through three case
studies. Specifically, he shows how three popular process models may be implemented in
a Bayesian framework. In each case, he begins by explaining the theoretical basis of each
model, then demonstrates how the verbal theory may be translated into a full set of prior
distributions and likelihoods. Finally, Lee discusses how results from each model may be
interpreted and used for inference.
Each case example showcases a unique advantage of implementing cognitive models
in a Bayesian framework (see also Bartlema, Voorspoels, Rutten, Tuerlinckx, & Vanpaemel,
this issue). For example, in his discussion of signal detection theory, Lee highlights how
Bayesian methods are able to account for individual differences easily (see also Rouder &
Lu,2005, in the Further reading appendix). Throughout, Lee emphasizes that Bayesian
cognitive models are useful because they allow the researcher to reach new theoretical
conclusions that would be difficult to obtain with non-Bayesian methods. Overall, this
source not only provides an approachable introduction to Bayesian cognitive models, but
also provides an excellent example of good reporting practices for research that employs
Bayesian cognitive models.
By focusing on interpretation, rather than implementation, we have sought to provide
a more accessible introduction to the core concepts and principles of Bayesian analysis than
may be found in introductions with a more applied focus. Ideally, readers who have read
through all eight of our highlighted sources, and perhaps some of the supplementary reading,
may now feel comfortable with the fundamental ideas in Bayesian data analysis, from basic
principles (Kruschke,2015;Lindley,1993) to prior distribution selection (Lee & Vanpaemel,
this issue), and with the interpretation of a variety of analyses, including Bayesian analogs
of classical statistical tests (e.g., t-tests; Rouder et al.,2009), estimation in a Bayesian
framework (van de Schoot et al.,2014), Bayes factors and other methods for hypothesis
testing (Dienes, 2011; Vandekerckhove et al., 2015), and Bayesian cognitive models (Lee, 2008).
Reviewers and editors unfamiliar with Bayesian methods may initially feel hesitant to
evaluate empirical articles in which such methods are applied (Wagenmakers, Love, et al.,
this issue). Ideally, the present article should help ameliorate this apprehension by offering
an accessible introduction to Bayesian methods that is focused on interpretation rather than
application. Thus, we hope to help minimize the amount of reviewer reticence caused by
authors’ choice of statistical framework.
Our overview was not aimed at comparing the advantages and disadvantages of
Bayesian and classical methods. However, some conceptual conveniences and analytic
strategies that are only possible or valid in the Bayesian framework will have become ev-
ident. For example, Bayesian methods allow for the easy implementation of hierarchical
models for complex data structures (Lee,2008), they allow multiple comparisons and flexi-
ble sampling rules during data collection without correction of inferential statistics (Dienes,
2011; see also Schönbrodt, Wagenmakers, Zehetleitner, & Perugini,2015, as listed in our
Further reading appendix, and also Schönbrodt & Wagenmakers, this issue), and they allow
inferences that many researchers in psychology are interested in but cannot obtain with
classical statistics, such as quantifying support for a null hypothesis (for a discussion, see
Wagenmakers, 2007). Thus, the inclusion of more research that uses Bayesian methods in
the psychological literature should be to the benefit of the entire field (Etz & Vandekerck-
hove,2016). In this article, we have provided an overview of sources that should allow a
novice to understand how Bayesian statistics allows for these benefits, even without prior
knowledge of Bayesian methods.
The authors would like to thank Jeff Rouder, E.-J. Wagenmakers, and Joachim Van-
dekerckhove for their helpful comments. AE and BB were supported by grant #1534472
from NSF’s Methods, Measurements, and Statistics panel. AE was further supported by the
National Science Foundation Graduate Research Fellowship Program (#DGE1321846).
Aho, K., Derryberry, D., & Peterson, T. (2014). Model selection for ecologists: The worldviews of AIC and BIC. Ecology, 95(3), 631–636.
Bartlema, A., Voorspoels, W., Rutten, F., Tuerlinckx, F., & Vanpaemel, W. (this issue). Sensitivity to the prototype in children with high-functioning autism spectrum disorder: An example of Bayesian cognitive psychometrics. Psychonomic Bulletin and Review.
Berger, J. O. (2006). The case for objective Bayesian analysis. Bayesian Analysis, 1(3), 385–402.
Berger, J. O., & Berry, D. A. (1988). Statistical analysis and the illusion of objectivity. American Scientist, 76(2), 159–165.
Berger, J. O., & Delampady, M. (1987). Testing precise hypotheses. Statistical Science, 2(3), 317–335.
Cornfield, J. (1966). Sequential trials, sequential analysis, and the likelihood principle. The American Statistician, 20, 18–23.
Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25(1), 7–29.
DeGroot, M. H. (1982). Lindley’s paradox: Comment. Journal of the American Statistical Association, 77, 336–339.
Dienes, Z. (2008). Understanding psychology as a science: An introduction to scientific and
statistical inference. Palgrave Macmillan.
Dienes, Z. (2011). Bayesian versus orthodox statistics: Which side are you on? Perspectives on Psychological Science, 6(3), 274–290.
Dienes, Z. (2014). Using Bayes to get the most out of non-significant results. Frontiers in Psychology, 5.
Dienes, Z., & McLatchie, N. (this issue). Four reasons to prefer Bayesian over orthodox
statistical analyses. Psychonomic Bulletin and Review.
Dienes, Z., & Overgaard, M. (2015). How Bayesian statistics are needed to determine whether mental states are unconscious. Behavioural methods in consciousness research, 199–220.
Edwards, W., Lindman, H., & Savage, L. J. (1963). Bayesian statistical inference for psychological research. Psychological Review, 70(3), 193–242.
Etz, A., & Vandekerckhove, J. (2016). A Bayesian perspective on the reproducibility project: Psychology. PLOS ONE, 11, e0149794. doi: 10.1371/journal.pone.0149794
Etz, A., & Vandekerckhove, J. (this issue). Introduction to Bayesian inference for psychol-
ogy. Psychonomic Bulletin and Review.
Etz, A., & Wagenmakers, E.-J. (in press). J. B. S. Haldane’s contribution to the Bayes
factor hypothesis test. Statistical Science.
Franke, M. (2016). Task types, link functions & probabilistic modeling in experimental
pragmatics. In F. Salfner & U. Sauerland (Eds.), Preproceedings of ‘trends in experi-
mental pragmatics’ (pp. 56–63).
Gallistel, C. (2009). The importance of proving the null. Psychological Review, 116(2), 439–453.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D., Vehtari, A., & Rubin, D. B. (2013).
Bayesian data analysis (Vol. 3). Chapman & Hall/CRC.
Gelman, A., & Loken, E. (2014). The statistical crisis in science. American Scientist, 102(6), 460.
Gelman, A., & Shalizi, C. R. (2013). Philosophy and the practice of Bayesian statistics. British Journal of Mathematical and Statistical Psychology, 66(1), 8–38. doi: 10.1111/j.2044-8317.2011.02037.x
Gigerenzer, G. (2004). Mindless statistics. The Journal of Socio-Economics, 33(5), 587–606.
Goldstein, M. (2006). Subjective Bayesian analysis: Principles and practice. Bayesian Analysis, 1(3), 403–420. doi: 10.1214/06-BA116
Gronau, Q. F., Sarafoglou, A., Matzke, D., Ly, A., Boehm, U., Marsman, M., . . . Stein-
groever, H. (2017). A tutorial on bridge sampling. arXiv preprint arXiv:1703.05984 .
Hoijtink, H., Klugkist, I., & Boelen, P. (2008). Bayesian evaluation of informative hypothe-
ses. Springer Science & Business Media.
Jaynes, E. T. (1986). Bayesian methods: General background. In J. H. Justice (Ed.), Maximum entropy and Bayesian methods in applied statistics (pp. 1–25). Cambridge University Press.
Jaynes, E. T. (2003). Probability theory: The logic of science. Cambridge university press.
Jeffreys, H. (1936). On some criticisms of the theory of probability. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 22(146), 337–359. doi: 10.1080/14786443608561691
Jeffreys, H. (1961). Theory of probability (3rd ed.). Oxford, UK: Oxford University Press.
Kaplan, D., & Depaoli, S. (2012). Bayesian structural equation modeling. In R. Hoyle (Ed.), Handbook of structural equation modeling (pp. 650–673). New York, NY: Guilford.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773–795.
Kruschke, J. K. (2015). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. Academic Press.
Kruschke, J. K., & Liddell, T. (this issue). Bayesian data analysis for newcomers. Psycho-
nomic Bulletin and Review.
Lee, M. D. (2008). Three case studies in the Bayesian analysis of cognitive models. Psychonomic Bulletin and Review, 15(1), 1–15.
Lee, M. D., & Vanpaemel, W. (this issue). Determining priors for cognitive models. Psychonomic Bulletin & Review.
Lee, M. D., & Wagenmakers, E. J. (2014). Bayesian cognitive modeling: A practical course.
Cambridge University Press.
Lehmann, E. (1993). The Fisher, Neyman–Pearson theories of testing hypotheses: One theory or two? Journal of the American Statistical Association, 88(424), 1242–1249.
Lindley, D. V. (1972). Bayesian statistics, a review. Philadelphia (PA): SIAM.
Lindley, D. V. (1993). The analysis of experimental data: The appreciation of tea and wine. Teaching Statistics, 15(1), 22–25. doi: 10.1111/j.1467-9639.1993.tb00252.x
Lindley, D. V. (2000). The philosophy of statistics. The Statistician, 49(3), 293–337.
Lindley, D. V. (2006). Understanding uncertainty. John Wiley & Sons.
Love, J., Selker, R., Marsman, M., Jamil, T., Dropmann, D., Verhagen, J., . . . Wagenmakers, E.-J. (2015). JASP [Computer software].
Ly, A., Verhagen, A. J., & Wagenmakers, E.-J. (2016). Harold Jeffreys’s default Bayes factor hypothesis tests: Explanation, extension, and application in psychology. Journal of Mathematical Psychology, 72, 19–32.
Matzke, D., Boehm, U., & Vandekerckhove, J. (this issue). Bayesian inference for psychol-
ogy, Part III: Parameter estimation in nonstandard models. Psychonomic Bulletin
and Review.
Mayer, J., Khairy, K., & Howard, J. (2010). Drawing an elephant with four complex parameters. American Journal of Physics, 78(6), 648–649.
McElreath, R. (2016). Statistical rethinking: A Bayesian course with examples in R and
Stan (Vol. 122). CRC Press.
Meng, X.-L., & Wong, W. H. (1996). Simulating ratios of normalizing constants via a
simple identity: a theoretical exploration. Statistica Sinica, 831–860.
Morey, R. D., Romeijn, J.-W., & Rouder, J. N. (2016). The philosophy of Bayes factors and the quantification of statistical evidence. Journal of Mathematical Psychology.
Myung, I. J., & Pitt, M. A. (1997). Applying Occam’s razor in modeling cognition: A Bayesian approach. Psychonomic Bulletin & Review, 4(1), 79–95. doi: 10.3758/BF03210778
Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5(2), 241–301. doi: 10.1037/1082-989X.5.2.241
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science.
Science,349 (6251), aac4716. doi: 10.1126/science.aac4716
Orwell, G. (1946). A nice cup of tea. Evening Standard, January.
Platt, J. R. (1964). Strong inference. Science,146 (3642), 347–353.
Robert, C. P. (2014). On the Jeffreys–Lindley paradox. Philosophy of Science, 81(2), 216–232.
Rouder, J. N. (2014). Optional stopping: No problem for Bayesians. Psychonomic Bulletin & Review, 21(2), 301–308.
Rouder, J. N., Engelhardt, C. R., McCabe, S., & Morey, R. D. (2016). Model comparison
in anova. Psychonomic Bulletin & Review,23 , 1779-1786.
Rouder, J. N., & Lu, J. (2005). An introduction to Bayesian hierarchical models with an
application in the theory of signal detection. Psychonomic Bul letin & Review,12 (4),
573–604. Retrieved from
Rouder, J. N., & Morey, R. D. (2012). Default Bayes factors for model selection in regression.
Multivariate Behavioral Research, 47(6), 877–903. Retrieved from http://tinyurl.com/rouder2012regression doi: 10.1080/00273171.2012.734737
Rouder, J. N., Morey, R. D., Speckman, P. L., & Province, J. M. (2012). Default Bayes
factors for ANOVA designs. Journal of Mathematical Psychology, 56(5), 356–374.
Rouder, J. N., Morey, R. D., Verhagen, J., Province, J. M., & Wagenmakers, E.-J. (2016).
Is there a free lunch in inference? Topics in Cognitive Science, 8, 520–547.
Rouder, J. N., Morey, R. D., Verhagen, J., Swagman, A. R., & Wagenmakers, E.-J. (in
press). Bayesian analysis of factorial designs. Psychological Methods.
Rouder, J. N., Morey, R. D., & Wagenmakers, E.-J. (2016). The interplay between
subjectivity, statistical practice, and psychological science. Collabra, 2(1).
Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian
t-tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin &
Review, 16(2), 225–237.
Rouder, J. N., & Vandekerckhove, J. (this issue). Bayesian inference for psychology, Part
IV: Parameter estimation and Bayes factors. Psychonomic Bulletin & Review.
Royall, R. (1997). Statistical evidence: A likelihood paradigm (Vol. 77). CRC Press.
Royall, R. (2004). The likelihood paradigm for statistical inference. In M. L. Taper &
S. R. Lele (Eds.), The nature of scientific evidence: Statistical, philosophical and
empirical considerations (pp. 119–152). The University of Chicago Press.
Schönbrodt, F. D., & Wagenmakers, E.-J. (this issue). Bayes factor design analysis:
Planning for compelling evidence. Psychonomic Bulletin & Review.
Schönbrodt, F. D., Wagenmakers, E.-J., Zehetleitner, M., & Perugini, M. (2015). Sequential
hypothesis testing with Bayes factors: Efficiently testing mean differences. Psychological
Methods. doi: 10.1037/met0000061
Senn, S. (2013). Invalid inversion. Significance, 10(2), 40–42.
Sorensen, T., Hohenstein, S., & Vasishth, S. (2016). Bayesian linear mixed models
using Stan: A tutorial for psychologists, linguists, and cognitive scientists. The
Quantitative Methods for Psychology, 12(3), 175–200. doi: 10.20982/tqmp.12.3.p175
Stone, J. V. (2013). Bayes’ rule: A tutorial introduction to Bayesian analysis. Sebtel Press.
Trafimow, D., & Marks, M. (2015). Editorial. Basic and Applied Social Psychology, 37(1),
1–2.
van Ravenzwaaij, D., Cassey, P., & Brown, S. (this issue). A simple introduction to Markov
chain Monte-Carlo sampling. Psychonomic Bulletin & Review.
Vandekerckhove, J., Matzke, D., & Wagenmakers, E.-J. (2015). Model comparison and the
principle of parsimony. In J. Busemeyer, J. Townsend, Z. J. Wang, & A. Eidels (Eds.),
Oxford Handbook of Computational and Mathematical Psychology (pp. 300–317).
Oxford University Press.
van de Schoot, R., Kaplan, D., Denissen, J., Asendorpf, J. B., Neyer, F. J., & Aken, M. A.
(2014). A gentle introduction to Bayesian analysis: Applications to developmental
research. Child Development, 85(3), 842–860.
Van de Schoot, R., Winter, S., Ryan, O., Zondervan-Zwijnenburg, M., & Depaoli, S. (in
press). A systematic review of Bayesian papers in psychology: The last 25 years.
Psychological Methods.
Vanpaemel, W. (2010). Prior sensitivity in theory testing: An apologia for the Bayes
factor. Journal of Mathematical Psychology, 54, 491–498.
van Ravenzwaaij, D., Boekel, W., Forstmann, B. U., Ratcliff, R., & Wagenmakers, E.-J.
(2014). Action video games do not improve the speed of information processing
in simple perceptual tasks. Journal of Experimental Psychology: General, 143(5),
1794–1805.
Verhagen, J., & Wagenmakers, E.-J. (2014). Bayesian tests to quantify the result of a
replication attempt. Journal of Experimental Psychology: General, 143(4), 1457–1475.
doi: 10.1037/a0036731
Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of p values.
Psychonomic Bulletin & Review, 14(5), 779–804.
Wagenmakers, E.-J., Lodewyckx, T., Kuriyal, H., & Grasman, R. (2010). Bayesian
hypothesis testing for psychologists: A tutorial on the Savage–Dickey method. Cognitive
Psychology, 60(3), 158–189. doi: 10.1016/j.cogpsych.2009.12.001
Wagenmakers, E.-J., Love, J., Marsman, M., Jamil, T., Ly, A., Verhagen, J., ... Morey,
R. D. (this issue). Bayesian inference for psychology, Part II: Example applications
with JASP. Psychonomic Bulletin & Review.
Wagenmakers, E.-J., Marsman, M., Jamil, T., Ly, A., Verhagen, J., Love, J., ... Morey,
R. (this issue). Bayesian inference for psychology, Part I: Theoretical advantages and
practical ramifications. Psychonomic Bulletin & Review.
Wagenmakers, E.-J., Morey, R. D., & Lee, M. (2016). Bayesian benefits for the pragmatic
researcher. Current Directions in Psychological Science, 25(3).
Wagenmakers, E.-J., Verhagen, J., & Ly, A. (2015). How to quantify the evidence for the
absence of a correlation. Behavior Research Methods, 1–14.
Wetzels, R., Matzke, D., Lee, M. D., Rouder, J. N., Iverson, G. J., & Wagenmakers, E.-J.
(2011). Statistical evidence in experimental psychology: An empirical comparison
using 855 t-tests. Perspectives on Psychological Science, 6(3), 291–298. doi:
10.1177/1745691611406923
Winkler, R. L. (2003). An introduction to Bayesian inference and decision (2nd ed.). New
York: Holt, Rinehart and Winston.
Further reading
In this Appendix, we provide a concise overview of 32 additional articles and books that
provide further discussion of various theoretical and applied topics in Bayesian inference.
For example, the list includes articles that editors and reviewers might consult as a reference
while reviewing manuscripts that apply advanced Bayesian methods such as structural
equation models (Kaplan & Depaoli, 2012), hierarchical models (Rouder & Lu, 2005), linear
mixed models (Sorensen, Hohenstein, & Vasishth, 2016), and design (i.e., power) analyses
(Schönbrodt et al., 2015). The list also includes books that may serve as accessible
introductory texts (e.g., Dienes, 2008) or as more advanced textbooks (e.g., Gelman et al.,
2013). To aid in readers’ selection of sources, we have summarized the associated focus and
difficulty ratings for each source in Figure A1.
Recommended articles
9. Cornfield (1966)— Sequential Trials, Sequential Analysis, and the Likelihood Prin-
ciple. Theoretical focus (3), moderate difficulty (5).
A short exposition of the difference between Bayesian and classical inference in se-
quential sampling problems.
10. Lindley (2000)— The Philosophy of Statistics. Theoretical focus (1), moderate
difficulty (5).
Dennis Lindley, a foundational Bayesian, outlines his philosophy of statistics, receives
commentary, and responds. An illuminating paper with equally illuminating commentaries.
11. Jaynes (1986)— Bayesian Methods: General Background. Theoretical focus (2),
low difficulty (2).
A brief history of Bayesian inference. The reader can stop after finishing the section
titled, “Is our logic open or closed,” because the further sections are somewhat dated
and not very relevant to psychologists.
12. Edwards, Lindman, and Savage (1963)— Bayesian Statistical Inference for Psy-
chological Research. Theoretical focus (2), high difficulty (9).
The article that first introduced Bayesian inference to psychologists. A challenging but
insightful and rewarding paper. Much of the more technical mathematical notation
can be skipped with minimal loss of understanding.
13. Rouder, Morey, and Wagenmakers (2016)— The Interplay between Subjectivity,
Statistical Practice, and Psychological Science. Theoretical focus (2), low difficulty.
All forms of statistical analysis, both Bayesian and frequentist, require some subjective
input (see also Berger & Berry,1988). In this article, the authors emphasize that
subjectivity is in fact desirable, and one of the benefits of the Bayesian approach is that
the inclusion of subjective elements is transparent and therefore open to discussion.
14. Myung and Pitt (1997)— Applying Occam’s Razor in Cognitive Modeling: A
Bayesian Approach. Balanced focus (5), high difficulty (9).
This paper brought Bayesian methods to greater prominence in modern psychology,
discussing the allure of Bayesian model comparison for non-nested models and providing
worked examples. The authors also provide a great discussion of the principle of
parsimony, so this paper serves as a good follow-up to our fifth highlighted source
(Vandekerckhove et al., 2015).
15. Wagenmakers, Morey, and Lee (2016)— Bayesian Benefits for the Pragmatic
Researcher. Applied focus (9), low difficulty (1).
Provides pragmatic arguments for the use of Bayesian inference with two examples
featuring fictional characters Eric Cartman and Adam Sandler. This paper is clear,
witty, and persuasive.
16. Rouder (2014)— Optional Stopping: No Problem for Bayesians. Balanced focus
(5), moderate difficulty (5).
Provides a simple illustration of why Bayesian inference is valid in the case of optional
stopping. A natural follow-up to our third highlighted source (Dienes, 2011).
17. Verhagen and Wagenmakers (2014)— Bayesian Tests to Quantify the Result of
a Replication Attempt. Balanced focus (4), high difficulty (7).
Outlines so-called “replication Bayes factors,” which use the original study’s estimated
posterior distribution as a prior distribution for the replication study’s Bayes factor.
Given the current discussion of how to estimate replicability (Open Science
Collaboration, 2015), this work is more relevant than ever. (See also Wagenmakers,
Verhagen, and Ly (2015) for a natural follow-up.)
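To convey the flavor of the idea, here is a minimal sketch in Python for a binomial task (our own toy setup and function names, not the exact procedure in the paper): the original study’s Beta posterior serves as the prior under H1 when testing the replication data.

```python
from math import lgamma, log, exp

def log_beta(a, b):
    # Logarithm of the Beta function B(a, b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def replication_bf10(k_orig, n_orig, k_rep, n_rep, theta0=0.5):
    # H1's prior for the replication is the original study's posterior,
    # Beta(1 + k_orig, 1 + n_orig - k_orig), assuming a flat Beta(1, 1) to start
    a, b = 1 + k_orig, 1 + n_orig - k_orig
    # Beta-binomial marginal likelihood of the replication counts under H1
    # (the binomial coefficient cancels from both marginals)
    log_m1 = log_beta(a + k_rep, b + (n_rep - k_rep)) - log_beta(a, b)
    # Likelihood under the point null H0: theta = theta0
    log_m0 = k_rep * log(theta0) + (n_rep - k_rep) * log(1 - theta0)
    return exp(log_m1 - log_m0)
```

With an original result of 40 successes in 50 trials, a consistent replication of 35/50 yields a Bayes factor well above 1, while a chance-level replication of 25/50 favors the null.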
18. Gigerenzer (2004)— Mindless Statistics. Theoretical focus (3), low difficulty (1).
This paper offers an enlightening and witty overview of the history and psychology
of statistical thinking. It contextualizes the need for Bayesian inference.
19. Ly et al. (2016)— Harold Jeffreys’s Default Bayes Factor Hypothesis Tests: Explanation,
Extension, and Application in Psychology. Theoretical focus (2), high difficulty.
A concise summary of the life, work, and thinking of Harold Jeffreys, inventor of
the Bayes factor (see also Etz & Wagenmakers, in press). The second part of the
paper explains the computations in detail for t-tests and correlations. The first part
is essential in grasping the motivation behind the Bayes factor.
20. Robert (2014)— On the Jeffreys–Lindley Paradox. Theoretical focus (3), moderate
difficulty (6).
Robert discusses the implications of the Jeffreys–Lindley paradox, so-called because
Bayesian and frequentist hypothesis tests can come to diametrically opposed conclusions from the
same data—even with infinitely large samples. The paper further outlines the need
for caution when using improper priors, and why they present difficulties for Bayesian
hypothesis tests. (For more on this topic, see DeGroot, 1982.)
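The paradox is easy to reproduce numerically. In the toy sketch below (our own setup, not Robert’s), the test statistic is held at z = 1.96, so the two-sided p-value stays near .05, while the Bayes factor for a normal mean with a unit-information N(0, 1) prior increasingly favors the null as n grows.

```python
from math import sqrt, pi, exp

def normal_pdf(x, var):
    # Density of a zero-mean normal with variance var, evaluated at x
    return exp(-x * x / (2 * var)) / sqrt(2 * pi * var)

def bf01_fixed_z(z, n):
    # Known sigma = 1; H0: mu = 0 vs. H1: mu ~ N(0, 1)
    xbar = z / sqrt(n)                # the sample mean that produces statistic z
    m0 = normal_pdf(xbar, 1 / n)      # marginal likelihood of xbar under H0
    m1 = normal_pdf(xbar, 1 + 1 / n)  # under H1: prior variance + sampling variance
    return m0 / m1

# With p fixed near .05, the evidence for the null grows without bound:
# n = 10, 100, and 10000 give BF01 of roughly 0.6, 1.5, and 15.
```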
21. Jeffreys (1936)— On Some Criticisms of the Theory of Probability. Theoretical
focus (1), high difficulty (8).
An early defense of probability theory’s role in scientific inference by one of the
founders of Bayesian inference as we know it today. The paper’s notation is somewhat
outdated and makes for rather slow reading, but Jeffreys’s writing is insightful.
22. Rouder, Morey, Verhagen, et al. (2016)— Is There a Free Lunch in Inference?
Theoretical focus (3), moderate difficulty (4).
A treatise on why making detailed assumptions about alternatives to the null hypoth-
esis is requisite for a satisfactory method of statistical inference. A good reference
for why Bayesians cannot do hypothesis testing by simply checking if a null value lies
inside or outside of a credible interval, and instead must calculate a Bayes factor to
evaluate the plausibility of a null model.
23. Berger and Delampady (1987)— Testing Precise Hypotheses. Theoretical focus
(1), high difficulty (9).
Explores the different conclusions to be drawn from hypothesis tests in the classi-
cal versus Bayesian frameworks. This is a resource for readers with more advanced
statistical training.
24. Wetzels et al. (2011)— Statistical Evidence in Experimental Psychology: An
Empirical Comparison using 855 t-tests. Applied focus (7), low difficulty (2).
Using 855 t-tests from the literature, the authors quantify how inferences based on p
values, effect sizes, and Bayes factors differ. An illuminating reference for understanding
the practical differences between various methods of inference.
25. Vanpaemel (2010)— Prior Sensitivity in Theory Testing: An Apologia for the
Bayes Factor. Theoretical focus (3), high difficulty (7).
The author defends Bayes factors against the common criticism that the inference is
sensitive to the specification of the prior. He asserts that this sensitivity is in fact valuable.
26. Royall (2004)— The Likelihood Paradigm for Statistical Inference. Theoretical
focus (2), moderate difficulty (5).
An accessible introduction to the likelihood principle and its relevance to inference.
Contrasts are made among different accounts of statistical evidence. A more complete
account is given in Royall (1997).
27. Gelman and Shalizi (2013)— Philosophy and the Practice of Bayesian Statistics.
Theoretical focus (2), high difficulty (7).
This is the centerpiece of an excellent special issue on the philosophy of Bayesian
inference. We recommend that discussion groups consider reading the entire special
issue (British Journal of Mathematical and Statistical Psychology, February, 2013), as
it promises intriguing and fundamental discussions about the nature of inference.
28. Wagenmakers et al. (2010)— Bayesian Hypothesis Testing for Psychologists: A
Tutorial on the Savage-Dickey Ratio. Applied focus (9), moderate difficulty (6).
Bayes factors are notoriously hard to calculate for many types of models. This article
introduces a useful computational trick known as the “Savage-Dickey Density Ratio,”
an alternative conception of the Bayes factor that makes many computations more
convenient. The Savage-Dickey ratio is a powerful visualization of the Bayes factor,
and is the primary graphical output of the Bayesian statistics software JASP (Love
et al., 2015).
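A minimal sketch of the trick in Python (our own toy example, not one from the paper): for a binomial test of H0: θ = .5 against H1: θ ~ Beta(1, 1), the Bayes factor in favor of the null is the posterior density at .5 divided by the prior density at .5, which matches the direct ratio of marginal likelihoods.

```python
from math import comb, lgamma, exp

def beta_pdf(x, a, b):
    # Density of a Beta(a, b) distribution at x
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm) * x ** (a - 1) * (1 - x) ** (b - 1)

def bf01_savage_dickey(k, n, theta0=0.5):
    # Posterior under H1 is Beta(1 + k, 1 + n - k); prior is Beta(1, 1)
    return beta_pdf(theta0, 1 + k, 1 + n - k) / beta_pdf(theta0, 1, 1)

def bf01_direct(k, n, theta0=0.5):
    # Direct ratio of marginal likelihoods, for comparison
    p_h0 = comb(n, k) * theta0 ** k * (1 - theta0) ** (n - k)
    p_h1 = 1 / (n + 1)  # integral of Binom(k | n, theta) over the uniform prior
    return p_h0 / p_h1
```

For 7 successes in 10 trials, both routes give BF01 ≈ 1.29, only weak evidence either way.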
29. Gallistel (2009)— The Importance of Proving the Null. Applied focus (7), low
difficulty (3).
The importance of null hypotheses is explored through three thoroughly worked exam-
ples. This paper provides valuable guidance for how one should approach a situation
in which it is theoretically desirable to accumulate evidence for a null hypothesis.
30. Rouder and Lu (2005)— An Introduction to Bayesian Hierarchical Models with
an Application in the Theory of Signal Detection. Applied focus (7), high difficulty.
This is a good introduction to hierarchical Bayesian inference for the more mathematically
inclined readers. It demonstrates the flexibility of hierarchical Bayesian inference
applied to signal detection theory, while also introducing augmented Gibbs sampling.
31. Sorensen et al. (2016)— Bayesian Linear Mixed Models Using Stan: A Tutorial
for Psychologists. Applied focus (9), moderate difficulty (4).
Using the software Stan, the authors give an accessible and clear introduction to
hierarchical linear modeling. Because both the paper and code are hosted on GitHub,
this article serves as a good example of open, reproducible research in a Bayesian
framework.
32. Schönbrodt et al. (2015)— Sequential Hypothesis Testing with Bayes Factors:
Efficiently Testing Mean Differences. Applied focus (8), low difficulty (3).
For Bayesians, power analysis is often an afterthought because sequential sampling
is encouraged, flexible, and convenient. This paper provides Bayes factor simulations
that give researchers an idea of how many participants they might need to collect to
achieve moderate levels of evidence from their studies.
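The flavor of such simulations can be conveyed in a few lines of Python (a toy binomial setup with a uniform prior, assumed by us rather than taken from the paper, which works with t-tests): repeatedly simulate experiments under an assumed true effect and record the sample size at which the Bayes factor first exceeds an evidence threshold.

```python
import random
from math import comb

def bf10(k, n):
    # H0: theta = .5 vs. H1: theta ~ Uniform(0, 1); beta-binomial marginal for H1
    return (1 / (n + 1)) / (comb(n, k) * 0.5 ** n)

def stopping_ns(theta=0.7, threshold=6, n_max=500, n_sims=200, seed=1):
    # For each simulated experiment, sample one observation at a time and
    # record the n at which BF10 first reaches the evidence threshold
    rng = random.Random(seed)
    ns = []
    for _ in range(n_sims):
        k = 0
        for n in range(1, n_max + 1):
            k += rng.random() < theta  # one Bernoulli(theta) observation
            if bf10(k, n) >= threshold:
                ns.append(n)
                break
    return ns
```

Quantiles of the returned stopping times suggest how many participants may be needed; with a true rate of .7 against a null of .5, the stopping n is typically a few dozen.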
33. Kaplan and Depaoli (2012)— Bayesian Structural Equation Modeling. Applied
focus (8), high difficulty (7).
One of few available practical sources on Bayesian structural equation modeling. The
article focuses on the Mplus software but also stands as a general source.
34. Rouder et al. (in press)— Bayesian Analysis of Factorial Designs. Balanced focus
(6), high difficulty (8).
Includes examples of how to set up Bayesian ANOVA models, which are some of the
more challenging Bayesian analyses to perform and report, as intuitive hierarchical
models. The appendix demonstrates how to use the BayesFactor R package and JASP
software for ANOVA. The relatively high difficulty rating is due to the large
amount of statistical notation.
Recommended books
35. Winkler (2003)— Introduction to Bayesian Inference and Decision. Balanced focus
(4), low difficulty (3).
As the title suggests, this is an accessible textbook that introduces the basic concepts
and theory underlying the Bayesian framework for both inference and decision-making.
The required math background is elementary algebra (i.e., no calculus is required).
36. McElreath (2016)— Statistical Rethinking: A Bayesian Course with Examples in
R and Stan. Balanced focus (6), moderate difficulty (4).
Not your traditional applied introductory statistics textbook. McElreath focuses on
education through simulation, with handy R code embedded throughout the text to
give readers a hands-on experience.
37. Lee and Wagenmakers (2014)— Bayesian Cognitive Modeling: A Practical
Course. Applied focus (7), moderate difficulty (4).
A textbook on Bayesian cognitive modeling methods that is in a similar vein to our
eighth highlighted source (Lee, 2008). It includes friendly introductions to core principles
of implementation and many case examples with accompanying MATLAB and
R code.
38. Lindley (2006)— Understanding Uncertainty. Theoretical focus (2), moderate dif-
ficulty (4).
An introduction to thinking about uncertainty and how it influences everyday life
and science. Lindley proposes that all types of uncertainty can be represented by
probabilities. A largely non-technical text, but a clear and concise introduction to the
general Bayesian perspective on decision making under uncertainty.
39. Dienes (2008)— Understanding Psychology as a Science: An Introduction to Sci-
entific and Statistical Inference. Theoretical focus (1), low difficulty (3).
A book that covers a mix of philosophy of science, psychology, and Bayesian inference.
It is a very accessible introduction to Bayesian statistics, and it very clearly contrasts
the different goals of Bayesian and classical inference.
40. Stone (2013)— Bayes’ Rule: A Tutorial Introduction to Bayesian Analysis. Bal-
anced focus (4), moderate difficulty (6).
In this short and clear introductory text, Stone explains Bayesian inference using
accessible examples and writes for readers with little mathematical background. Ac-
companying Python and MATLAB code is provided on the author’s website.
Figure A1. An overview of focus and difficulty ratings for all sources included in the present
paper. Sources discussed at length in the Theoretical sources and Applied sources sections
are presented in bold text. Sources listed in the appended Further reading appendix are
presented in light text. Source numbers representing books are italicized.
... Para incorporar la multiplicidad de variables generadas y generar parámetros comprensivos para interpretar el resultado de los análisis descritos, la porción final del ejercicio estadístico se llevará desde el paradigma estadístico Bayesiano (Buck y Litton, 1999). En términos generales, desde un cuerpo de conocimiento previo, cotejado con la nueva evidencia, se genera un porcentaje de probabilidad de distribuciones o relación entre las variables propuestas, considerando para esta estimación solo las observaciones pertenecientes a la muestra recuperada sin intentar definir relaciones no observadas (Etz et al. 2018). En nuestro caso, permitiría identificar por medio de modelamiento matemático la probabilidad de que para cada conjunto dado de trazas pertenezca a un segmento específico de posicionamiento regional, siendo en teoría posible identificar settings tafonómicos sensibles a atributos de posición geográfica a través del cálculo de la probabilidad real de observar un conjunto de variables en cualquiera de las variantes especificadas de posición (Araujo-Junior et al. 2017;Borrero, 2001aBorrero, , 2001bBelardi y Carballo, 2003). ...
... Los modelos bayesianos comienzan con creencias expresadas en distribuciones a priori, y se actualizan para describir distribuciones a posteriori, las cuales son la base para las interpretaciones. Su mayor virtud para nuestro caso es su capacidad de generar predicciones, mediante la relación entre la incertidumbre a priori con la distribución a posteriori (Buck y Litton, 1999;Etz et al. 2018). Esto implica construir tres valores específicos, la distribución de variables a priori, la verosimilitud (likelihood), y la distribución a posteriori (Buck y Litton, 1999). ...
... Este último paso provee un mecanismo coherente, iterativo y replicable (Etz et al. 2018) para obtener información en base al continuo de acumulaciones de conocimiento acerca de un fenómeno que involucra variables cuya interacción no comprendemos o no es posible conocer del todo debido a sesgos muestrales inevitables (como el reciclaje de material biológico sensu Behrensmeyer, Kidwell y Gastaldo, 2000). En arqueología, donde la jerarquización de la interdependencia entre variables provenientes de un rango amplio de disímiles líneas de evidencia es una labor implícita, el método bayesiano no es más que un marco de referencia explícito de combinar esas líneas de evidencia para ofrecer interpretaciones que reflejen. ...
Full-text available Las discusiones sobre la intensificación en el uso de recursos marítimos por las comunidades arcaicas que habitaron las costas arréicas del desierto de Atacama en la zona de Taltal, descansan mayoritariamente en interpretaciones del registro ictiológico y malacológico, ante la escasa presencia relativa de restos de mamíferos y aves en los sitios estudiados. Aún a pesar de que la evidencia general tiende a confirmar esta hipótesis de intensificación, poca atención se ha puesto en intentar explicar estas diferencias de representación de clases animales en los conjuntos zooarqueológicos. Considerando que estos sitios se insertan en uno de los ecosistemas más activos y dinámicos en términos de reciclaje biológico, mediada por numerosos agentes que intervienen, seleccionan y acumulan restos óseos. Ante esta incertidumbre de potencial sesgo en el registro de tetrápodos para los sitios en Taltal, se elaboró un estudio tafonómico regional a partir de un conjunto actualístico, el cual comparó múltiples variables del registro óseo depositado hoy en las costas desérticas con el proveniente de sitios arqueológicos desde los 5000 cal. A.P. en adelante en la misma zona, información que procesada por medio de un modelo clasificador bayesiano permitió constatar sesgos producto de alteración tafonómica y determinar con precisión aquellos sitios arqueológicos mas proclives a problemas interpretativos producto de dichos sesgos.
... NHST procedures do not allow to quantify evidence in favour of the null hypothesis: one can only ever reject a null hypothesis or withhold judgement, but never accept it 5 (Dienes, 2011;Etz, Gronau, Dablander, Edelsbrunner, & Baribault, 2017;Gigerenzer, 2004;Rouder, Speckman, Sun, Morey, & Iverson, 2009;Wagenmakers, 2007;Wetzels et al., 2011). However, within Bayesian inferential statistics, a measure is provided that can quantify whether the collected evidence is more consistent with the null hypothesis (H 0 ) or the alternative hypothesis (H 1 ): the Bayes factor (BF). ...
... Choosing a narrower distribution (i.e., a lower value of r), would result in a high level of similarity between H 1 and H 0 , making the tests uninformative. Conversely, choosing an unreasonably wide distribution would result in the BF favouring the null too heavily, by placing too much weight on extreme effect size values (Etz et al., 2017;Rouder et al., 2009;Wagenmakers et al., 2017). In addition to calculating the BF, robustness checks and sequential analyses (combined with a robustness analysis) of the BF were carried out and visualized in plots. ...
... This effect is very robust, since both parametrical and non-parametrical statistical procedures converged on similar results and were additionally confirmed by Bayes factor analyses. Importantly, by performing Bayesian factor analyses, we were not only able to quantify the amount of evidence in favour of the effect under scrutiny, but we could also quantify the evidence in favour of the null hypothesis, something that cannot be achieved by means of NHST (Dienes, 2011;Etz et al., 2017;Rouder et al., 2009;Wagenmakers, 2007). Although the evidence in favour of a bias was very strong in the cued conditions, the evidence in favour of no effect in the no cue condition seemed weaker. ...
Full-text available
In the literature, there is an ongoing debate regarding the nature of attentional orienting towards non-reportable exogenous cues. Some argue that even though bottom-up orienting can occur towards conscious stimuli, it is consistently modulated by endogenous factors in the case of unconscious stimuli. This would suggest that there may be no purely exogenous shifts of attention towards unconscious stimuli. In this thesis, we set out to provide compelling evidence for an automatic nature of attentional orienting towards non-reportable cues, independent from endogenous factors (e.g., attentional task set). To investigate this, an experiment employing the temporal order judgement (TOJ) paradigm was conducted, in which two line gratings of opposite orientation were presented on each side of a fixation, separated by various stimulus onset asynchronies (SOAs). Participants were required to report the orientation of the line grating that was presented first. In two-thirds of the trials, a non-reportable exogenous cue was presented on the opposite location of the first line grating, making it counterproductive to attend to the cue. Cue awareness was assessed in addition to performance on the TOJ task. Data were analysed using parametric and non-parametric procedures, supplemented by Bayes factor analyses. Results from these procedures converged in showing a robust bias towards the cued line gratings, suggesting that bottom-up orienting towards non-reportable exogenous cues occurs independently from attentional task set.
... Sample size planning based on power analysis is not relevant because we will use Bayesian estimation and hypothesis testing for statistical analysis (Etz et al., 2016;Wagenmakers et al., 2016). In this statistical framework, power is not conceptualized because hypothesis testing is not based on an inferential framework but on continuous evaluation of evidence (Schönbrodt and Wagenmakers, 2016). ...
Research Proposal
Full-text available
Scientific thinking is a predicate for scientific inquiry, and thus important to develop early in psychology students as potential future researchers. The present research is aimed at fathoming the contributions of formal and informal learning experiences to psychology students' development of scientific thinking during their 1st-year of study. We hypothesize that informal experiences are relevant beyond formal experiences. First-year psychology student cohorts from various European countries will be assessed at the beginning and again at the end of the second semester. Assessments of scientific thinking will include scientific reasoning skills, the understanding of basic statistics concepts, and epistemic cognition. Formal learning experiences will include engagement in academic activities which are guided by university authorities. Informal learning experiences will include non-compulsory, self-guided learning experiences. Formal and informal experiences will be assessed with a newly developed survey. As dispositional predictors, students' need for cognition and self-efficacy in psychological science will be assessed. In a structural equation model, students' learning experiences and personal dispositions will be examined as predictors of their development of scientific thinking. Commonalities and differences in predictive weights across universities will be tested. The project is aimed at contributing information for designing university environments to optimize the development of students' scientific thinking.
... This notably includes the Transparency and Openness Promotion (TOP) Guidelines (Nosek et al., 2015) ; Plan S and cOAlition S (Plan S, 2020) ; and the San Francisco Declaration of Researchers Assessment (DORA, 2020) . Self-organising initiatives have also produced practical guides to further facilitate the adoption of open research into existing workflows C. Allen & Mehler, 2019;Button et al., 2020;Crüwell et al., 2019;DeBruine & Barr, 2019;Etz et al., 2018;Kathawalla et al., 2020;Klein et al., 2018;McKiernan et al., 2016;Munafò et al., 2017;Sarabipour et al., 2019) . However, despite widespread support, wholesale adoption of open research remains elusive, with early career researchers in the psychological sciences being the notable exception (Abele-Brehm et al., 2019;Ali-Khan et al., 2017;Houtkoop et al., 2018) . ...
Full-text available
Increasingly, policies are being introduced to reward and recognise open research practices, while the adoption of such practices into research routines is being facilitated by many grassroots initiatives. However, despite this widespread endorsement and support, open research is yet to be widely adopted, with early career researchers being the notable exception. For open research to become the norm, initiatives should engage academics from all career stages, particularly senior academics (namely senior lecturers, readers, professors) given their routine involvement in determining the quality of research. Senior academics, however, face unique challenges in implementing policy change and supporting grassroots initiatives. Given that-like all researchers-senior academics are motivated by self-interest, this paper lays out three feasible steps that senior academics can take to improve the quality and productivity of their research, that also serve to engender open research. These steps include a) change hiring criteria, b) change how scholarly outputs are credited, and c) change to funding and publishing with open research. The guidance we provide is accompanied by live, crowd-sourced material for further reading.
... SDT models have been applied using both classical (Macmillan & Creelman, 2005) and Bayesian frameworks (Rouder & Lu, 2005). In this paper, we adopt the Bayesian framework (Etz et al., 2016;Lee & Wagenmakers, 2013). An important goal of Bayesian statistics is to determine the posterior distribution of the parameters. ...
Full-text available
Signal detection theory (SDT) is used to quantify people’s ability and bias in discriminating stimuli. The ability to detect a stimulus is often measured through confidence ratings. In SDT models, the use of confidence ratings necessitates the estimation of confidence category thresholds, a requirement that can easily result in models that are overly complex. As a parsimonious alternative, we propose a threshold SDT model that estimates these category thresholds using only two parameters. We fit the model to data from Pratte et al. (Journal of Experimental Psychology: Learning, Memory, and Cognition, 36, 224–232 2010) and illustrate its benefits over previous threshold SDT models.
Memory consolidation has been mainly investigated over extended periods, from hours to days. Recent studies focused on memory consolidation occurring within shorter periods, from seconds to minutes. Yet these studies focused on explicit sequence learning with fixed rest periods. Our study aimed to determine whether short rest periods enhance implicit probabilistic sequence learning and whether the length of rest duration influences such offline changes. Participants performed an implicit probabilistic sequence learning task throughout 45 blocks. Between blocks, participants were allowed to rest and then to continue the task at their own pace. The results show that probabilistic sequence knowledge decreased from pre- to post-rest periods, and this decrement was not related to the length of rest duration. These results suggest that probabilistic sequence knowledge decays during short rest periods and that such forgetting is not time-dependent. Overall, our findings highlight that ultra-fast consolidation differently affects distinct cognitive processes.
This series of studies applies classical experimental designs to eye-tracking measurement. The field of study is attention to sustainability-related information in tourism products. The data show that sustainability labels alone receive relatively little attention in a realistic environment. It therefore seems advisable to consider additional ways of relating sustainability information to consumers. Implicit information was shown to attract a greater share of attention than labels, so designing products that combine sustainability with experiential value for the customer seems a worthwhile alternative. Care should be taken with the price argument, because attention to prices rises as soon as sustainability information becomes available. The data do not suggest dispensing with ecolabels. They do suggest, however, that a change in the informational environment (i.e., directing consumers to sustainability issues) and the additional use of experience-related information would increase attention to sustainability information in tourism.
Objective: We aim to evaluate the impact of zonisamide (ZNS) compared to topiramate (TPM) on cognition in patients with epilepsy. Although the risk of cognitive side effects has been clearly demonstrated for TPM, comparable side effects of ZNS have been suggested, but evidence from studies is inconclusive. Methods: In this retrospective observational study, we analyzed patients' records from before and after introduction or withdrawal of ZNS vs TPM. Data were gathered during routine clinical care protocols. Standardized monitoring of executive functions (EpiTrack), verbal memory (short version of the Verbaler Lern- und Merkfähigkeitstest, VLMT), and subjective health (extended Adverse Events Profile; quality of life in epilepsy inventory, QOLIE-10) was performed in 73 patients when TPM (n = 45) or ZNS (n = 28) was introduced and 62 patients when TPM (n = 29) or ZNS (n = 33) was withdrawn. The data were analyzed using Bayesian statistics that quantify evidence for or against an effect through Bayes factors (BFs). Results: There was decisive evidence for a negative effect of adjunctive ZNS and TPM on executive function (BF = 965.08) and a positive effect of their withdrawal (BF = 429.51). The ZNS effect seemed smaller, although the difference was inconclusive. Verbal memory and subjective quality of life were not significantly affected. Subjectively, ZNS was connected to lower anxiety and fewer headaches, whereas TPM had a perceived effect on weight, fluent speech and comprehension, headaches, and balance. Significance: This is the first study to provide objective evidence for a considerable negative effect of ZNS treatment on executive function in a naturalistic treatment setting. Comparable to the well-known TPM effect, cognition worsens with adjunction and recovers with withdrawal of ZNS. However, the majority of patients do not show a significant negative effect, suggesting disparate susceptibilities to adverse events. The findings emphasize the need for routine monitoring of cognitive side effects to identify early on those patients who are negatively affected by new AEDs.
Truth tellers provide less detail in delayed than in immediate interviews (likely due to forgetting), whereas liars provide similar amounts of detail in immediate and delayed interviews (displaying a metacognitive stability bias effect). We examined whether liars' flawed metacognition after delays could be exploited by encouraging interviewees to provide more detail via a Model Statement. Truthful and deceptive participants were interviewed immediately (n = 78) or after a three-week delay (n = 78). Half the participants in each condition listened to a Model Statement before questioning. In the Immediate condition, truth tellers provided more details than liars. This pattern was unaffected by the Model Statement. In the Delayed condition, truth tellers and liars provided a similar amount of detail in the Model Statement-absent condition, whereas in the Model Statement-present condition, liars provided more details than truth tellers.
Assuring the design and maintenance of complex systems is itself a complex undertaking. Claims regarding the suitability of a design for a particular purpose need to be supported by evidence of design accomplishment and process compliance. The Goal Structuring Notation can be used to support argumentation of particular claims, including system safety and system assurance claims. While argument claims need to be supported by evidence, the strength of the evidence supporting a claim is often not systematically considered. Restructuring argument claims as a Bayesian Network supports systematically quantifying the likelihood of the veracity of evidence supporting the claim. Using an Object-based Bayesian Network allows this model to be rapidly expanded to consider multiple systems of systems. The application of this method to a simple Technical Integrity assurance argument is shown. Similarly, the technique can be used to support a safety argument.
In the psychological literature, there are two seemingly different approaches to inference: that from estimation of posterior intervals and that from Bayes factors. We provide an overview of each method and show that a salient difference is the choice of models. The two approaches as commonly practiced can be unified with a certain model specification, now popular in the statistics literature, called spike-and-slab priors. A spike-and-slab prior is a mixture of a null model, the spike, with an effect model, the slab. The estimate of the effect size here is a function of the Bayes factor, showing that estimation and model comparison can be unified. The salient difference is that common Bayes factor approaches provide for privileged consideration of theoretically useful parameter values, such as the value corresponding to the null hypothesis, while estimation approaches do not. Both approaches, either privileging the null or not, are useful depending on the goals of the analyst.
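The unification described above can be made concrete in a toy setting. The sketch below is an illustrative assumption, not the authors' code: it compares a point-null "spike" against a normal "slab" prior on a standardized effect for a normal mean with known variance, a case where both marginal likelihoods have closed forms, so the Bayes factor and the posterior probability of the null follow directly:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, sd):
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

def spike_and_slab(ybar, n, sigma=1.0, slab_sd=1.0, prior_spike=0.5):
    """Spike-and-slab comparison for a normal mean with known sigma.
    Spike: point null (mu = 0). Slab: N(0, slab_sd^2) prior on mu.
    Returns (BF01, posterior probability of the spike)."""
    se = sigma / sqrt(n)
    # Marginal likelihood of the sample mean under each component:
    m_spike = normal_pdf(ybar, 0.0, se)                       # null model
    m_slab = normal_pdf(ybar, 0.0, sqrt(slab_sd**2 + se**2))  # effect model
    bf01 = m_spike / m_slab
    post_spike = prior_spike * bf01 / (prior_spike * bf01 + 1 - prior_spike)
    return bf01, post_spike

# Hypothetical data: observed mean 0.3 from n = 25 observations
bf01, p0 = spike_and_slab(ybar=0.3, n=25, sigma=1.0, slab_sd=1.0)
```

The posterior probability of the spike is exactly the mixture weight that, combined with the slab's conditional posterior, yields the model-averaged effect-size estimate the abstract refers to.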
Bayesian parameter estimation and Bayesian hypothesis testing present attractive alternatives to classical inference using confidence intervals and p values. In part I of this series we outline ten prominent advantages of the Bayesian approach. Many of these advantages translate to concrete opportunities for pragmatic researchers. For instance, Bayesian hypothesis testing allows researchers to quantify evidence and monitor its progression as data come in, without needing to know the intention with which the data were collected. We end by countering several objections to Bayesian hypothesis testing. Part II of this series discusses JASP, a free and open source software program that makes it easy to conduct Bayesian estimation and testing for a range of popular statistical scenarios (Wagenmakers et al. this issue).
Bayesian hypothesis testing presents an attractive alternative to p value hypothesis testing. Part I of this series outlined several advantages of Bayesian hypothesis testing, including the ability to quantify evidence and the ability to monitor and update this evidence as data come in, without the need to know the intention with which the data were collected. Despite these and other practical advantages, Bayesian hypothesis tests are still reported relatively rarely. An important impediment to the widespread adoption of Bayesian tests is arguably the lack of user-friendly software for the run-of-the-mill statistical problems that confront psychologists for the analysis of almost every experiment: the t-test, ANOVA, correlation, regression, and contingency tables. In Part II of this series we introduce JASP, an open-source, cross-platform, user-friendly graphical software package that allows users to carry out Bayesian hypothesis tests for standard statistical problems. JASP is based in part on the Bayesian analyses implemented in Morey and Rouder's BayesFactor package for R. Armed with JASP, the practical advantages of Bayesian hypothesis testing are only a mouse click away.
The marginal likelihood plays an important role in many areas of Bayesian statistics such as parameter estimation, model comparison, and model averaging. In most applications, however, the marginal likelihood is not analytically tractable and must be approximated using numerical methods. Here we provide a tutorial on bridge sampling (Bennett, 1976; Meng & Wong, 1996), a reliable and relatively straightforward sampling method that allows researchers to obtain the marginal likelihood for models of varying complexity. First, we introduce bridge sampling and three related sampling methods using the beta-binomial model as a running example. We then apply bridge sampling to estimate the marginal likelihood for the Expectancy Valence (EV) model---a popular model for reinforcement learning. Our results indicate that bridge sampling provides accurate estimates for both a single participant and a hierarchical version of the EV model. We conclude that bridge sampling is an attractive method for mathematical psychologists who typically aim to approximate the marginal likelihood for a limited set of possibly high-dimensional models.
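For the beta-binomial running example mentioned above, the marginal likelihood is analytically tractable, which makes it a convenient benchmark. The sketch below shows the naive Monte Carlo estimator that such tutorials typically start from, not bridge sampling itself; the data and prior settings are illustrative assumptions:

```python
from math import comb, exp, lgamma
from random import betavariate, seed

def log_beta(a, b):
    """log of the Beta function via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def analytic_marginal(k, n, a=1.0, b=1.0):
    """Exact marginal likelihood of k successes in n trials
    under a Beta(a, b) prior on the success probability."""
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))

def mc_marginal(k, n, a=1.0, b=1.0, draws=100_000):
    """Naive Monte Carlo: average the likelihood over prior draws."""
    total = 0.0
    for _ in range(draws):
        theta = betavariate(a, b)
        total += comb(n, k) * theta**k * (1 - theta) ** (n - k)
    return total / draws

seed(1)
exact = analytic_marginal(k=7, n=10)   # uniform prior -> 1/(n + 1)
estimate = mc_marginal(k=7, n=10)
```

Under a uniform prior the exact answer is 1/(n + 1), and the Monte Carlo estimate converges to it; bridge sampling improves on this naive estimator precisely when prior draws rarely land where the likelihood is large, as happens in higher-dimensional models.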
Statistical methods aim to answer a variety of questions about observations. A simple example occurs when a fairly reliable test for a condition or substance, C, has given a positive result. Three important types of questions are: Should this observation lead me to believe that C is present? Does this observation justify my acting as if C were present? Is this observation evidence that C is present? This chapter distinguishes among these three questions in terms of the variables and principles that determine their answers. It then uses this framework to understand the scope and limitations of current methods for interpreting statistical data as evidence. By “statistical evidence,” we mean observations that are interpreted under a probability model. Questions of the third type, concerning the evidential interpretation of statistical data, are central to many applications of statistics in science. The chapter shows that for answering them, current statistical methods are seriously flawed. It looks for the source of the problems and proposes a solution based on the law of likelihood.
Statistical Rethinking: A Bayesian Course with Examples in R and Stan builds readers' knowledge of and confidence in statistical modeling. Reflecting the need for even minor programming in today's model-based statistics, the book pushes readers to perform step-by-step calculations that are usually automated. This unique computational approach ensures that readers understand enough of the details to make reasonable choices and interpretations in their own modeling work. The text presents generalized linear multilevel models from a Bayesian perspective, relying on a simple logical interpretation of Bayesian probability and maximum entropy. It covers everything from the basics of regression to multilevel models. The author also discusses measurement error, missing data, and Gaussian process models for spatial and network autocorrelation. By using complete R code examples throughout, this book provides a practical foundation for performing statistical inference. Designed for both PhD students and seasoned professionals in the natural and social sciences, it prepares them for more advanced or specialized statistical modeling. Web Resource: The book is accompanied by an R package (rethinking) that is available on the author's website and GitHub. The two core functions (map and map2stan) of this package allow a variety of statistical models to be constructed from standard model formulas.
We demonstrate the use of three popular Bayesian software packages that enable researchers to estimate parameters in a broad class of models that are commonly used in psychological research. We focus on WinBUGS, JAGS, and Stan, and show how they can be interfaced from R and MATLAB. We illustrate the use of the packages through two fully worked examples; the examples involve a simple univariate linear regression and fitting a multinomial processing tree model to data from a classic false-memory experiment. We conclude with a comparison of the strengths and weaknesses of the packages. Our example code, data, and this text are available via
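What WinBUGS, JAGS, and Stan automate is, at heart, Markov chain Monte Carlo sampling from a posterior. The hand-rolled sketch below is an illustrative assumption, not the authors' example code: a random-walk Metropolis sampler for the posterior of a normal mean with known standard deviation, using made-up data:

```python
from math import exp
from random import gauss, random, seed

def metropolis_normal_mean(data, prior_sd=10.0, sigma=1.0,
                           step=0.5, draws=20_000):
    """Random-walk Metropolis for the posterior of a normal mean mu,
    with known sigma and a N(0, prior_sd^2) prior on mu."""
    def log_post(mu):
        log_lik = sum(-0.5 * ((y - mu) / sigma) ** 2 for y in data)
        return log_lik - 0.5 * (mu / prior_sd) ** 2  # + log prior

    mu = 0.0
    samples = []
    for _ in range(draws):
        proposal = mu + gauss(0.0, step)
        # Accept with probability min(1, posterior ratio)
        if random() < exp(min(0.0, log_post(proposal) - log_post(mu))):
            mu = proposal
        samples.append(mu)
    return samples[draws // 4:]  # discard the first quarter as burn-in

seed(2)
data = [1.2, 0.8, 1.5, 0.9, 1.1, 1.3, 0.7, 1.0]  # hypothetical observations
posterior = metropolis_normal_mean(data)
post_mean = sum(posterior) / len(posterior)
```

The retained draws approximate the posterior distribution, so their mean approximates the posterior mean; the packages reviewed in the paper supply far more efficient samplers (e.g., Gibbs sampling, Hamiltonian Monte Carlo) plus convergence diagnostics, but the underlying logic is the same.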
This article explains the foundational concepts of Bayesian data analysis using virtually no mathematical notation. Bayesian ideas already match your intuitions from everyday reasoning and from traditional data analysis. Simple examples of Bayesian data analysis are presented that illustrate how the information delivered by a Bayesian analysis can be directly interpreted. Bayesian approaches to null-value assessment are discussed. The article clarifies misconceptions about Bayesian methods that newcomers might have acquired elsewhere. We discuss prior distributions and explain how they are not a liability but an important asset. We discuss the relation of Bayesian data analysis to Bayesian models of mind, and we briefly discuss what methodological problems Bayesian data analysis is not meant to solve. After you have read this article, you should have a clear sense of how Bayesian data analysis works and the sort of information it delivers, and why that information is so intuitive and useful for drawing conclusions from data.
We present a case study of hierarchical Bayesian explanatory cognitive psychometrics, examining information processing characteristics of individuals with high-functioning autism spectrum disorder (HFASD). On the basis of previously published data, we compare the classification behavior of a group