Research Dialogue
Median splits, Type II errors, and false-positive consumer psychology:
Don't fight the power
Gary H. McClelland, John G. Lynch, Jr., Julie R. Irwin, Stephen A. Spiller, Gavan J. Fitzsimons
Department of Psychology and Neuroscience, University of Colorado Boulder, United States
Leeds School of Business, University of Colorado Boulder, United States
McCombs School of Business, University of Texas at Austin, United States
Anderson School of Management, UCLA, United States
Fuqua School of Business, Duke University, United States
Received 30 April 2015; accepted 3 May 2015
Considerable prior statistical work has criticized replacing a continuously measured variable in a general linear model with a dichotomy based
on a median split of that variable. Iacobucci, Posovac, Kardes, Schneider, and Popovich (2015-this issue) defend the practice of "median splits"
using both conceptual arguments and simulations. We dispute their conceptual arguments, and we have identified technical errors in their
simulations that dramatically change the conclusions that follow from those simulations. We show that there are no real benefits to median splits,
and there are real costs in increases in Type II errors through loss of power and increases in Type I errors through false-positive consumer
psychology. We conclude that median splits remain a bad idea.
© 2015 Society for Consumer Psychology. Published by Elsevier Inc. All rights reserved.
Keywords: Median splits; Statistical power; False-positive psychology
Researchers can make Type I or Type II errors, rejecting a
true null hypothesis, or failing to reject a false null hypothesis.
In the same way, journals can make two kinds of errors,
rejecting a paper that is later concluded to be insightful or
publishing a paper that is later concluded not to be true. For
instance, Gans and Shepherd (1994) reviewed famous econom-
ics papers that were rejected multiple times before being
published and regarded as great. George Akerlof's (1970) "A
Market for Lemons" paper was rejected by the American
Economic Review, the Journal of Political Economy, and the
Review of Economic Studies. Two said it was trivial, the other
that it was too general to be true. Those journals made a Type II
error. Akerlof later won the Nobel Prize in economics for the
work. In other cases, a prestigious journal publishes a sensa-
tional result that seems too good to be true and is later dis-
credited, reflecting a Type I error. Prominent examples are cold
fusion claims by Fleischmann and Pons (1989) and Bem's
(2011) finding of correct prediction of events in the future
(i.e., ESP). Both were followed by numerous failures to
replicate, and in the case of Bem, detailed critiques of the
statistical analysis by the editor who had accepted the original
paper (Judd, Westfall, & Kenny, 2012).
The paper by Iacobucci, Posovac, Kardes, Schneider, and
Popovich (2015-this issue, hereafter IPKSP) may fall within the
latter category. These authors make conceptual arguments and
present statistical simulations about the consequences of median
splits of continuous independent variables in linear models. Later
Corresponding author at: Dept of Psychology and Neuroscience, 345 UCB,
University of Colorado Boulder, Boulder, CO 80309-0345, United States.
E-mail address: (G.H. McClelland).
1057-7408/© 2015 Society for Consumer Psychology. Published by Elsevier Inc. All rights reserved.
Please cite this article as: McClelland, G.H., et al., Median splits, Type II errors, and false-positive consumer psychology: Don't fight the power, Journal of Consumer
Psychology (2015),
in this commentary, we point out technical errors in their statistical
simulations. The actual programming code in Appendix A of
IPKSP does not match the description in the text of their paper, and
the result is that the simulations do not support the conclusions
IPKSP wish to draw. Consequently, the bulk of the contribution of
their paper must stand or fall on their conceptual arguments for the
appropriateness of median splits, which we argue are often
misguided. We first evaluate their conceptual arguments and
present conceptual arguments of our own, then present our
reanalysis and interpretation of their simulation results.
The topic of categorizing continuous predictor variables by
splitting them at their median has been covered extensively,
including in our own papers (e.g., Cohen, 1983; DeCoster,
Iselin, & Gallucci, 2009; Fitzsimons, 2008; Humphreys, 1978;
Humphreys & Fleishman, 1974; Irwin & McClelland, 2003;
MacCallum, Zhang, Preacher, & Rucker, 2002; Maxwell &
Delaney, 1993). We know of no statistical argument in favor of
median splits to counterbalance the chorus of statistical cri-
tiques against them. Because there is a danger that IPKSP may
convince researchers to use median splits, we briefly present
the arguments against their claims.
Our commentary will proceed as follows. First we will very
briefly present the core statistical reasons why median splits are
to be avoided. Second, we will review nonstatistical justifica-
tions for median splits presented by IPKSP, including the
argument that median splits are "conservative," and will show
that there are ready answers for those justifications. Then we
will discuss in more depth the statistical considerations for
when median splits affect Type II errors, adversely affecting
power. In our view, power is the most compelling reason to
avoid median splits. We will address the conservatism defense
in that section, where we will show that steps that lower the
power of reports of significant findings in a journal increase the
percent of published results that are Type I errors. Finally, we
will address the discrepancies between the actual programming
code in IPKSP's Appendix A and the descriptions in the body
of IPKSP's paper and show how those discrepancies invalidate
the conclusions drawn by IPKSP.
The statistical case against median splits in a nutshell
We highlight the statistical case against median splits in
a simple design with a dependent variable Y and a single
measured independent variable X. We later consider multiple
independent variables in our reanalysis of IPKSP's simulations.
Assume X is an indicator of some latent construct and that the
observed X is linearly related to the underlying construct. By
splitting the measured X at its median, one replaces X with a
categorical variable X′ (e.g., 1 = greater than median, 0 = less
than or equal to the median). There are four main consequences
of this substitution, discussed in detail below:
a. This substitution introduces random error in the measure of the
latent construct and all of the problems that adding error brings.
b. The analysis now is insensitive to the pattern of local
covariation between X and Y within groups defined by the
median split. All that matters is the mean difference.
c. This analysis involves a nonlinear transformation of the
original X to a step function of the original X on the
dependent variable Y. The use of a median split on X makes
it impossible to test a substantive theoretical claim of a step
function relation of latent X to dependent variable Y.
d. If one believes that there is a step function relation of latent
X to the dependent variable Y, the threshold of that function
is presumably general and not sample-dependent. A median
split is sample-dependent.
a. Errors in variables
Introducing random error has two interrelated negative con-
sequences. First, when there is a nonzero population correlation
between X and Y, the correlation between the median-split X′
and Y will be lower in expectation, though adding error can
make the correlation higher in a subset of samples. Also,
splitting at the median makes the measure of the latent construct
underlying X noisier. Expected effect size goes down, and
statistical power is a function of effect size.
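This attenuation is easy to verify by simulation. The following sketch (in Python with numpy; the sample size, seed, and true correlation of .5 are illustrative choices of ours, not IPKSP's) dichotomizes a normal X at its median and compares the resulting correlation with Y to the correlation using continuous X:

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho = 200_000, 0.5

# Bivariate normal (X, Y) with population correlation rho.
x = rng.standard_normal(n)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n)

# Replace X with a median-split indicator X' (1 = above the median).
x_split = (x > np.median(x)).astype(float)

r_cont = np.corrcoef(x, y)[0, 1]
r_split = np.corrcoef(x_split, y)[0, 1]

# The ratio of r-squared values converges to 2/pi, about .64.
print(r_cont, r_split, (r_split / r_cont) ** 2)
```

For normally distributed X, the median-split correlation is attenuated by a factor of about sqrt(2/π) ≈ .80, so the r-squared falls to about 2/π ≈ .64 of its continuous-X value.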
Adding random error to one's measure of X creates "errors
in variables" in regression models, a source of bias in estimated
(standardized) coefficients. Since multiple regression models
assume errorless measurement of the latent constructs under-
lying X, adding error via median split creates inconsistent
estimates of the standardized coefficient (i.e., estimates that do
not have expected value equal to the true parameter). We will
demonstrate that this practice is hazardous, not "conservative"
as IPKSP maintain. It is surprising to us that Iacobucci,
Saldanha, and Deng (2007) have argued so eloquently about
the negative consequences of ignoring errors in variables in
statistical mediation analysis, but in the current paper IPKSP
defend the deliberate adding of measurement error to an
independent variable.
b. Ignoring information about local within-group covariation
between X and Y
Consider a simple regression of Y on continuously measured
X, and a reanalysis of the same data replacing X with X′ defined
by a median split. The analysis using median splits is insensitive
to the pattern of local covariation between Y and the continuous
X within the above-median and below-median groups. The
analysis using the continuously measured X is sensitive to that
within-group covariation. As a thought experiment, imagine
holding constant the univariate distributions of X and Y above
and below the median, but scrambling the pairings of X and Y
within the subsets of points above and below the median.
Different scrambles produce widely different slopes of the
regression of Y on continuous X, some significant, some not,
but identical slopes of the regression of Y on X′. Thus, it is
untrue that it is uniformly conservative to use the median split.
In some cases the t statistics from the median split can be more
significant than the t statistics from regressing Y on continuous
X, and in most cases less significant. Such inconsistencies could
allow unscrupulous researchers to pick whichever outcome was
more favorable, as we discuss in more detail later.
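The thought experiment above can be made concrete in a few lines of code. In this sketch (our own simulated data; numpy assumed), Y values are repeatedly scrambled within the above-median and below-median halves: the slope from regressing Y on continuous X changes from scramble to scramble, while the median-split mean difference never moves:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.standard_normal(n)
y = 0.3 * x + rng.standard_normal(n)

above = x > np.median(x)

def slopes(yy):
    b_cont = np.polyfit(x, yy, 1)[0]                 # slope of Y on continuous X
    b_split = yy[above].mean() - yy[~above].mean()   # mean difference = split "effect"
    return b_cont, b_split

results = []
for _ in range(5):
    y_scrambled = y.copy()
    # Permute Y within each half: the univariate marginals and the
    # median-split analysis are untouched; only local X-Y pairings change.
    y_scrambled[above] = rng.permutation(y_scrambled[above])
    y_scrambled[~above] = rng.permutation(y_scrambled[~above])
    results.append(slopes(y_scrambled))

cont_slopes = [r[0] for r in results]
split_effects = [r[1] for r in results]
print(cont_slopes)    # varies across scrambles
print(split_effects)  # identical every time
```

The continuous analysis responds to the within-group covariation; the median-split analysis is blind to it.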
c. Nonlinear transformation of X implies a step-function form
of the X-Y relation
Median splits produce a nonlinear transformation of con-
tinuous X (that is linearly related to the latent X construct) to a
crude step function relating latent X to X′. Thus, if continuous
X was linearly related to Y, the use of a median split X′ is
equivalent to revising one's prediction to be that the continuous
X has a step function relation to Y, and that the step happens
right at the median. Using the dichotomized X′ measure instead
of the original X is the same as predicting that all values of X
below the median lead to the same value of Y and all values of
X above the median lead to a second value of Y.
Is this step function model an improvement or a distortion?
That depends on the true relationship between X and Y within
each group. As noted by DeCoster et al. (2009), if one believes
the theoretical relationship between X and Y is a step function,
it is inappropriate to dichotomize. With only two data points, it
is impossible to test the substantive theoretical claim that the
latent construct underlying X is categorical and binary. If la-
tent X is categorical, one should test the assumption via
polynomials, other forms of nonlinear regression, or latent class
models (DeCoster et al., 2009). With k discrete levels of a
continuum, it is possible to test the substantive threshold claim
that a) a dummy or contrast variable for whether a level is
above or below the threshold is highly significant and b) a set
of k-2 dummies or contrasts measuring variations in X above
or below the threshold do not explain significant remaining
variance (Brauer & McClelland, 2005; Keppel & Wickens,
2004, p. 104).
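As an illustration of such a threshold test, the sketch below (entirely our own simulated example; numpy assumed, with k = 5 levels and a true step between the second and third levels) fits a reduced model containing only the threshold dummy and a full model with dummies for all k levels, then computes the incremental F for the k − 2 extra degrees of freedom:

```python
import numpy as np

rng = np.random.default_rng(2)
k, n_per = 5, 200
levels = np.repeat(np.arange(k), n_per)
n = k * n_per
# True step function: Y jumps by 1 at level 2 (0-indexed), noise sd 1.
y = (levels >= 2) * 1.0 + rng.standard_normal(n)

def sse(design):
    # Residual sum of squares from a least-squares fit.
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return resid @ resid

ones = np.ones_like(y)
step = (levels >= 2).astype(float)

# Reduced model: intercept + step dummy (the substantive threshold claim).
X_reduced = np.column_stack([ones, step])
# Full model: intercept + k - 1 level dummies (arbitrary level means).
X_full = np.column_stack([ones] + [(levels == j).astype(float) for j in range(1, k)])

# Incremental F for the k - 2 extra df: variation in X beyond the step.
df_full = n - X_full.shape[1]
F_extra = ((sse(X_reduced) - sse(X_full)) / (k - 2)) / (sse(X_full) / df_full)

# F for adding the step dummy to an intercept-only model.
F_step = ((sse(ones[:, None]) - sse(X_reduced)) / 1) / (sse(X_reduced) / (n - 2))

print(F_step, F_extra)  # step term: large F; extra contrasts: F near 1
```

Under a true step function, the threshold dummy is overwhelmingly significant while the incremental F for the remaining contrasts hovers near 1; a median split permits neither test.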
d. Sample dependence of the claimed step function X-Y relation
Median splits represent a special case of replacing a con-
tinuous X with a dichotomized X′. Suppose that one had strong
theoretical reasons to believe that the function relating measured
X and Y was a step function with a fixed threshold. That
threshold would presumably be the same no matter whether the
sample of respondents in the study was drawn from a
subpopulation likely to be high or low on the original X.
Median splits impose a sample-dependent threshold. There is
no compelling theoretical argument underlying the implicitly
claimed cut-point in a particular sample of data, when the
cut-point is always the median of that particular sample. Spiller,
Fitzsimons, Lynch, and McClelland (2013, p. 282) make a
similar critique of relying on sample-dependent tests, citing
Frederick's (2005) Cognitive Reflection Test with scores
ranging from 0 to 3. Results from a Massachusetts Institute of
Technology (MIT) sample had M = 2.18, SD = .94, and results
from a University of Toledo sample had M = .57, SD = .87. A
"low" MIT score would be similar to a "high" University of
Toledo score.
IPKSP offer nonstatistical arguments for using median splits
in some situations and statistical arguments that turn on how
median splits affect Type I and Type II errors. We first consider
the nonstatistical arguments.
IPKSP's nonstatistical arguments for using median splits
Nonstatistical argument 1: Because median splits are popular
and appear in the best journals they should be seriously
considered as candidates for data analysis
We agree that median splits are popular. We would argue
that the popularity of the practice hurts science. In their
amusingly-titled chapter, "Chopped Liver? Ok. Chopped
Data? Not OK," Butts and Ng (2009) bemoan this fact: "it
should follow that if researchers are aware of the disadvan-
tages of using chopped data and regard the practice as poor
science, it should not occur with much frequency in articles
published in high-quality journals."
As an example of the pitfalls of the popularity argument,
Mani, Mullainathan, Shafir, and Zhao (2013a) published a
high-profile paper in Science concluding that poverty impedes
cognitive functioning. These authors reported experiments in
which participants thought about how they would cope with
various financial shocks and then their cognitive
functioning was measured. The key independent variables
were a) the size of the shock manipulated between subjects and
b) income. Income was measured continuously and subjected
to median splits. Across three laboratory experiments, the key
result was that measured cognitive functioning showed an
interaction between the size of the shock and income. Thinking
about coping with larger shocks inhibited subsequent cognitive
functioning in the low-income group but not the high-income
group. The same result did not obtain in a fourth study with
nonfinancial scenarios.
The next issue of Science printed a criticism of those
findings by Wicherts and Scholten (2013). They reported that
when the dichotomized indicators were replaced by the
original continuous variables, the critical interactions were
not significant at p < .05 in any of the three core studies:
p values were .084, .323, and .164. In a reply to Wicherts
and Scholten, Mani, Mullainathan, Shafir, and Zhao (2013b)
justified their use of median splits by citing papers published in
Science and other prestigious journals that also used median
splits. This "Officer, other drivers were speeding too" defense
is often tried but rarely persuasive, especially here when the
results of the (nonsignificant) continuous analyses were
known. Though Mani et al. further noted their effect reached
the .05 level if one pooled the three studies, we would guess
that the editor poured himself or herself a stiff drink the night
after reading Wicherts and Scholten's critique and the Mani
et al. reply. It is hard to imagine that Science or many less
prestigious journals would have published the paper had the
authors initially reported the correct analyses with a string of
three nonsignificant findings conventionally significant only
by meta-analysis at the end of the paper. The reader
considering the use of median splits should consider living
through a similarly deflating experience. Splitting the data at
the median resulted in an inaccurate sense of the magnitude of
the fragile and small interaction effect (in this case, an
interaction that required the goosing of a meta-analysis to
reach significance), and a publication that was unfortunately
subject to easy criticism.
Nonstatistical argument 2: Median splits are useful for the
expression of categorical latent constructs
IPKSP consider the argument that median splits are
appropriate when the underlying X is theoretically categorical.
"In fact, there are numerous constructs that, while being
measured on continuous rating scales, are conceptually more
discrete, viz dichotomous (MacCallum et al., 2002). For
example, locus of control (Srinivasan & Tikoo, 1992) is
usually discussed with an emphasis on internal versus exter-
nal, people are said to be low or high on self-monitoring
(Becherer & Richard, 1978), and people are said to be low or
high in their 'need for closure' (Silvera, Kardes, Harvey,
Cronley, & Houghton, 2005). Such personality typologies
abound: introversion and extraversion, gender identity, type
A and B personalities, liberal and conservative, and so forth.
When researchers think in terms of groups, or study par-
ticipants having relatively more or less of a characteristic, it is
natural that they would seek an analytical method that is
isomorphic, so the data treatment may optimally match the
construct conceptualization" (p. 3).
We have two responses to this argument. First, the examples
offered are older papers that treat a continuous variable cate-
gorically when the overwhelming majority of researchers sub-
sequently using the same scales consider these same constructs to
be continuous and not categorical (cf. Czellar, 2006; Disatnik &
Steinhart, 2015; Hoffman, Novak, & Schlosser, 2003; Judge &
Bono, 2001; Kardes, Fennis, Hirt, Tormala, & Bullington, 2007;
Shah, Kruglanski, & Thompson, 1998; Webster & Kruglanski,
1994). We believe that when an author uses language such as
"high and low" on some continuous construct, this terminology is
often a linguistic convenience rather than a claim that the construct
is categorical with categories of equal frequencies. For example,
readers should ask themselves if they really believe that authors
describing "liberals" and "conservatives" intend to imply only two
levels of liberalism-conservatism.
Second, as noted above, if one believes that the theoretical
XY relation is a categorical step function a) the threshold is
unlikely to be at a sample-dependent value like the median, and
b) a step function is a substantive theoretical claim that cannot
simply be assumed; it should be tested using the statistical
methods just mentioned. Median splits do not allow testing of
thresholds and in fact make it impossible to see any sort of
nonlinear relationship involving the variables that might be
theoretically characterized as thresholds. If the continuous data
are split into two categories then all that can be tested is the
difference between those two categories, i.e., a line. There
would be no way to tell from split data whether the original data
had a linear or nonlinear relationship with Y. Leaving the data
continuous allows for testing of whatever nonlinear relationship
the researcher would like to test.
Nonstatistical argument 3: Median splits and ANOVA are
easier to conduct and understand than regression/ANCOVA
IPKSP note that some say that dichotomization of a
continuous X makes the analysis easier to conduct and
interpret. Regression, ANOVA, and ANCOVA (the combina-
tion of regression and ANOVA) are of course identical at their
core and are simply different-appearing instantiations of a
general linear model. ANOVA may seem easier to conduct for
people trained long ago because before computer packages,
ANOVA was easier for students to compute by hand and with a
calculator. Given this constraint, it made some sense that
median splits were utilized to help researchers turn their
regression data into ANOVA-friendly data (Cohen, Cohen,
West, & Aiken, 2003). This reason for median splits is no
longer a good argument for splitting data. Graduate training in
regression and ANCOVA has become ubiquitous. We have
collectively trained literally hundreds of PhD students over the
years in these techniques, including many who have gone on to
publish in JCP and other top outlets for consumer research.
IPKSP go on to suggest that ANOVA is easier than regres-
sion to use, because regression results are "more difficult to
interpret because there are no post hoc tests specifying which
values of the independent variable are significantly different
from each other." We strongly disagree. If researchers want to
test at particular focal values of X for significance then they can
use spotlights (Fitzsimons, 2008; Irwin & McClelland, 2001;
Spiller et al., 2013). If there are no focal values, it can be useful
to report a "floodlight" analysis, reporting regions of the con-
tinuous X variable where the simple effect of manipulated Z is
significant (Spiller et al., 2013). Johnson and Neyman (1936)
originally proposed these tests, but these tests did not catch on
when statistical computations were by hand. Andrew Hayes'
ubiquitous PROCESS software now makes it trivial to find and
report these regions as a follow-up to finding significant inter-
actions between manipulated Z and continuously measured X.
Further, the argument that median split analyses are easier to
conduct, report, and read breaks down once one acknowledges
what the researchers must report to convince the reader that
the analyses are not misleading. As we describe later, the
researcher wishing to justify a median split of that covariate
must compute correlations between the variable to be split and
each factor and interaction. For a simple two-by-two ANOVA
design with a single covariate, the researcher would need to
compute three correlations, one with each factor and one with
the interaction. As we will show later in this paper, what
matters is the magnitude of these correlations, not their sta-
tistical significance as suggested by IPKSP. Our Fig. 3 later in
this paper shows that simply by chance, some of these corre-
lations would likely be large enough to cause serious bias. The
researcher would have to prove to the reader that substantive
and statistical conclusions do not differ between the median
split analysis and the fully reported corresponding ANCOVA
(cf. Simmons, Nelson, & Simonsohn, 2011). To us, this seems
more difficult for researchers and readers alike than simply
reporting the correct analysis using the continuously measured
independent variables.
Nonstatistical argument 4: Median splits are more parsimonious
IPKSP argue that it is "more parsimonious" to use a
2-category indicator of one's latent X defined by a median split
than to use the original continuous measure of X. The standard
definition of parsimonious is that nothing unnecessary is added,
that the simplest model or analysis technique is preferred.
Philosophers from Aristotle to Galileo to Newton have agreed.
Galileo remarked, "Nature does not multiply things unneces-
sarily; that she makes use of the easiest and simplest means for
producing her effects; that she does nothing in vain, and the
like" (Galileo, 1962, p. 397).
Adding an extra unnecessary step (requiring calculation of a
statistic, the median) to take the data away from its original
form is not more parsimonious. This use of the concept of
parsimony is not in line with the usual scientific use of the term.
Statistical considerations in the use of median splits: Effects
on Type II and Type I errors
Beyond their nonstatistical arguments, IPKSP make statis-
tical and quasi-statistical arguments about the consequences
of median splits. We articulate these arguments and present
counter-arguments below.
IPKSP statistical argument 1: Median splits are conservative
Type II errors and conservatism
Much of the IPKSP paper rests on the argument that the use
of median splits is conservative. As noted above, a primary
problem with median splits is that they add error, and thus on
average median splits reduce power. There is no way around
this fact, statistically, and lowering power with no compensat-
ing benefit would be considered to be a bad thing by most
researchers and all statisticians we know. IPKSP rebrand the
reduction of power as a benefit, labeling it "conservative."
Conservatism, in a statistical sense, simply means increasing
the chance of Type II errors and decreasing the chance of Type I
errors. Decreasing your alpha requirements to declare some-
thing significant (say, from .05 to .01) would make a test more
conservative, with the cost in increased Type II errors having
some offsetting benefit in fewer Type I errors. Splitting data is
not conservative in the same way: it increases the chance of both
types of errors because sometimes split data are significant when
the continuous data would not be. If researchers pick the method
that yields significance, then Type I errors will increase even as
splitting, overall, reduces power.
Median splits and false-positive consumer psychology
The fact that a given sample of data might have a significant
relationship between Y and X for X split at the median and not
for continuously measured X implies that there is a significant
risk of "false-positive" consumer psychology when authors are
licensed to analyze their data either way and report whichever
comes out to be more significant. In an influential article,
Simmons et al. (2011) noted how undisclosed flexibility in data
collection and analysis "allows presenting anything as signifi-
cant." They focused on "p-hacking" by topping up subjects in a
study until statistical significance is reached or collecting mul-
tiple covariates and adding them to the analysis of an experiment
in different combinations. Gelman and Loken (2014) argue
that this is producing a "statistical crisis in science" when
researchers' hypotheses can be tested in multiple ways from the
same data set and what is reported is what works out as most
supportive of their theorizing. We simulated the effects for
10,000 samples of N = 50 from a bivariate distribution with true
correlation of 0, tested at α = .05 for the continuous X-Y
correlation and then at α = .05 for the correlation between Y
and median-split X′. If one picks and chooses, in 8% of all
samples one or the other or both tests will be significant.
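A version of this simulation can be sketched as follows (numpy assumed; the two-tailed .05 critical value of |r| for N = 50 follows from t(48) = 2.0106):

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_sims = 50, 10_000

# Two-tailed .05 critical |r| for n = 50: t_crit(48) = 2.0106,
# r_crit = t / sqrt(df + t^2), about .279.
t_crit = 2.0106
r_crit = t_crit / np.sqrt(n - 2 + t_crit**2)

sig_cont = sig_split = sig_either = 0
for _ in range(n_sims):
    x = rng.standard_normal(n)
    y = rng.standard_normal(n)  # true correlation is zero
    x_split = (x > np.median(x)).astype(float)
    c = abs(np.corrcoef(x, y)[0, 1]) > r_crit
    s = abs(np.corrcoef(x_split, y)[0, 1]) > r_crit
    sig_cont += c
    sig_split += s
    sig_either += (c or s)

print(sig_cont / n_sims, sig_split / n_sims, sig_either / n_sims)
```

With these settings each test alone rejects about 5% of the time, but the pick-whichever-is-significant strategy rejects in roughly 8% of samples, inflating the nominal Type I error rate.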
As Gelman and Loken (2014) noted, it is not just unscru-
pulous researchers who fall into this trap. Well-meaning
researchers see multiple alternatives as reasonable and decide
a posteriori which seems most reasonable, with more
thinking about alternatives when things don't work out. We
are concerned that IPKSP risk giving researchers cover for
more undisclosed flexibility in data analysis. This allowance
just adds to the researcher degrees of freedom in data analysis
that fueled the Simmons et al. (2011) "false-positive
psychology" paper and the associated recent heightened con-
cern about findings in the social sciences that do not replicate.
Bayes theorem and effects of low power
Bayes theorem is the normatively appropriate model for
updating beliefs on the basis of evidence observed from sample
data. Bayes theorem shows that less belief shift is warranted
from a statistically significant finding the lower the power of
the study. IPKSP (p. 2) note that:
In his book, Statistics as Principled Argument, Abelson
(1995) repeatedly made the point that there are many
misconceptions about statistics, and we might argue that
misconceptions about median splits should be added to
Abelson's list.
We do not believe that Abelson would agree if he were still
alive. Abelson (1995) and Brinberg, Lynch, and Sawyer (1992)
both rely on Bayes theorem to make the point that reducing
power implies reducing the belief shift that is warranted from
observing a statistically significant result. Consider hypothesis
H that there is a relationship of a certain magnitude (say r =
.25) between X and Y in the population and the null hypothesis
H0 that there is no association between X and Y. The expected
prior odds of the relative likelihood of H and H0 are P(H) /
P(H0). Then one observes Datum D, a statistically significant
empirical association between X and Y. The likelihood of
observing D under hypothesis H is the statistical power of the
test, P(D|H). The likelihood of observing D under H0, the null
hypothesis, is one's Type I error rate alpha, P(D|H0). Bayes
theorem says that the updated posterior odds ratio of H and H0
is the prior odds ratio times the relative likelihood of
observing datum D given H versus H0. Specifically:

P(H|D) / P(H0|D) = [P(H) / P(H0)] × [P(D|H) / P(D|H0)]   (1)

Eq. (1) says that the greater the power relative to Type I error
rates, the greater the belief shift.
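Eq. (1) can be turned into a two-line calculation. In this sketch (equal prior odds are assumed purely for illustration), halving power halves the belief shift a significant result warrants:

```python
# Posterior odds of H versus H0 after observing a significant result D,
# per Eq. (1): prior odds times power / alpha.
def posterior_odds(prior_odds, power, alpha):
    return prior_odds * power / alpha

print(posterior_odds(1.0, 0.80, 0.05))  # well-powered test: odds of 16 to 1
print(posterior_odds(1.0, 0.40, 0.05))  # power halved by splitting: 8 to 1
```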
Abelson (1995) articulates his five "MAGIC" criteria for
persuading readers based on study findings: Magnitude,
Articulation, Generality, Interestingness, and Credibility. The
first of these is magnitude. Chapter 3 of his book is devoted
to the argument that results are more persuasive if they reflect
bigger effect sizes. Thus, reducing expected effect size by use
of median splits is not a decision to be "conservative" and
persuasive. It is a decision to be less persuasive.
We propose that the rhetorical impact of a research result is
a direct function of the raw effect size divided by the "cause
size," a ratio that we call causal efficacy. A large effect from a
small variation in the cause is the most impressive, whereas a
small effect arising from an apparently large causal ma-
nipulation is the most anticlimactic and disappointing.
[Abelson (1995), p. 48]
"Conservative" studies do not make for a conservative science
Thus far we have focused on how the researcher's choice to
use a median split degrades the persuasiveness of his or her
article. That's problematic, but arguably the author's own
choice. But consider the aggregate implications of these
arguments from the perspective of a journal editor. From a
pool of statistically significant results (all observing D rather
than its absence), some subset is made up of true effects and the
complementary subset is made up of Type I errors. The
proportion of published significant results that are Type I errors
is directly determined by the ratio P(D|H) / P(D|H0) = (1 − β) / α,
power divided by Type I error rate.
Ioannidis (2005) has pointed out that even in the absence of
p-hacking or any bias in reporting, the probability that a
statistically significant research finding is true is an increasing
function of power, and therefore, of effect size. Assume a world
where of all hypotheses researchers investigate, half are true
and half are actually null effects in the population. Further,
assume that papers are not published unless study findings are
significant at α= .05. Imagine two versions of that world, one
where power to detect real effects is .80 and another where it is
.40. When power is .80 and α = .05, the ratio of the likelihood of
finding significant results when the null is false to when it is
true is 16 to 1: for every 17 significant results reported, 16
are real. When power is .40 and α = .05, that ratio is 8 to 1:
for every 9 significant results reported, 8 are real. Editors
who countenance median splits are
making a choice to publish proportionately more Type I errors
in expectation relative to the number of results that reflect a true
effect in the population.
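The arithmetic in the two imagined worlds above is easy to check directly. A minimal sketch in Python (the function name is ours; the equal-priors assumption follows the text):

```python
def true_to_false_ratio(power, alpha, prior_true=0.5):
    """Expected ratio of true positives to false positives among
    significant results (cf. Ioannidis, 2005)."""
    true_positives = prior_true * power          # real effects found significant
    false_positives = (1 - prior_true) * alpha   # null effects found significant
    return true_positives / false_positives

# Half of tested hypotheses true, alpha = .05:
print(true_to_false_ratio(0.80, 0.05))  # 16 to 1: 16 of every 17 significant results are real
print(true_to_false_ratio(0.40, 0.05))  # 8 to 1: 8 of every 9 significant results are real
```

Halving power halves the ratio of real to spurious findings in the published record, even with no p-hacking at all.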
IPKSP statistical argument #2: The loss of power from median
splits is minimal and easily offset
When there is a linear relationship between Y and latent X,
the effect size (i.e., the r-squared value for the model) when
correlating Y with X via a median split is, for normally
distributed data, around .64 of the value when correlating Y
with continuously measured X. That is, the split data have 64%
(2/π) of the effect size that the original data had before
dichotomization. Irwin and McClelland (2003) show that the
damaging reduction in power persists even when the indepen-
dent variable is not normally distributed. Rather than reporting
the r-squared, IPKSP instead focus on the fact that the split
coefficient is 80% of the original coefficient, perhaps causing a
casual reader to underestimate the loss due to dichotomization.
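The 2/π attenuation of r-squared is easy to confirm by simulation. A minimal sketch (our own illustration, assuming a simple linear relation between Y and standard normal X):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5_000, 200
ratios = []
for _ in range(reps):
    x = rng.standard_normal(n)
    y = 0.5 * x + rng.standard_normal(n)           # true linear relation between Y and X
    split = np.where(x > np.median(x), 1.0, 0.0)   # median split of X
    r_cont = np.corrcoef(x, y)[0, 1]
    r_split = np.corrcoef(split, y)[0, 1]
    ratios.append(r_split**2 / r_cont**2)

print(np.mean(ratios))  # ≈ 2/π ≈ 0.64: the split retains only ~64% of the r-squared
```

The ratio hovers near 2/π regardless of the strength of the underlying relation, because corr(split, X) = √(2/π) for a normal X split at its median.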
One of the most disturbing aspects of IPKSP is the sug-
gestion that losing power is fine, because researchers can
simply increase sample size to make up for the loss. An
estimate for normally distributed covariates is that sample size
would need to be increased by a factor of π/2 ≈ 1.57. Increasing
sample size by 57% to accommodate a median split is both costly
and potentially unethical. IPKSP, making an argument that
median splits are acceptable, approvingly cite two studies from
the medical literature that used median splits in their analyses
(Kastrati et al., 2011; Lemanske et al., 2010). We believe that
these studies do not support IPKSP's point; rather, these studies
illustrate why just adding more participantsis an unwise
solution to the power loss caused by median splits. These were
medical experiments on actual patients, with true risks. Some
participants in Kastrati et al. died; some children in Lemanske
et al. were hospitalized with serious conditions. We believe it
would be unconscionable to increase sample size so that the
researchers could use median splits because regression is
somehow less convenient for them. Sadly, none of the split
variables in those two studies had statistically significant effects.
Admittedly, the stakes in consumer psychology experiments
are typically not that extreme. However, in our field as well,
there are ethical issues involved with routinely using 57% more
participants than necessary. Requiring people to run more
participants in their studies simply to avoid using multiple
regression instead of ANOVA wastes the time of volunteers in
course-related subject pools (cf. Smith, 1998), wastes money
when paying participants, and potentially depletes the common
resource of willingness to participate in surveys. In any case,
researchers owe it to the participants who have graciously
provided data to analyze those data using the best and most
powerful statistical methods. Losing power is bad, and de-
liberately losing power via median splits is neither effective nor
efficient use of research resources.
We have examined IPKSP's simulations and compared
them to the code shown in the Appendix A from their paper.
Our examination revealed serious problems with the simula-
tions. In some instances the code does not match its description
in the paper and in other instances the aggregated reporting of
the results substantially underestimates the deleterious effects
of median splits. We present the highlights of our analysis of
the simulations here and provide extensive details, including
revisions and extensions of the figures in an online technical
appendix. We consider the following important issues in their
simulation results.
A major problem with IPKSP's claim of support for
splitting a single continuous covariate in ANCOVA designs is
that no simulation results are presented for the effect of such
splitting on interactions. IPKSP's Fig. 3 purports to show the
effects on the interaction term, but their sampling from a
multivariate normal distribution completely precludes any
possibility of generating nonnull interactions. Aiken and
West (1991, p. 181) prove that if Y, X1, and X2 have a
multivariate normal distribution, then the coefficient for the
interaction term must be zero. They summarize (emphasis in
the original):
This result seems surprising. It says that when two predictors
X and Z and a criterion Y are multivariate normal, the
covariance between the product XZ and Y will be zero. Does
this mean that there is necessarily no interaction if X, Z, and
Y are multivariate normal? Yes. Turning the logic around, if
there exists an interaction between X and Z in the prediction
of Y, then, necessarily, the joint distribution of X, Z, and Y is
not multivariate normal.
How then do IPKSP provide their Fig. 3, which purports to
be the effect on the standardized partial regression coefficient
for the interaction term when one variable is split as a function
of the correlation between the independent variables? IPKSP
(p. 4) state for Study 1: "A multiple regression model was used
to analyze the three variables, and the estimates for β1, β2, and
β3 (for the interaction term) were obtained."
However, an examination of the SAS code provided by
IPKSP reveals they only estimated and recorded the additive
effect coefficients for the continuous and median split analyses,
along with some of the p-values. They neither recorded nor
aggregated results for the interaction coefficient. Had they done
so, they would have found that, inconsistent with their Fig. 3,
the mean coefficient for the interaction, whether in the
continuous or median split analysis, was zero.
A reader of IPKSP might believe that their Fig. 3 came from
the same simulation runs as their Figs. 1 and 2. However, the
code in IPKSP's Appendix A reveals that instead of computing
the interaction as the product of X1 and X2 in their original
simulations, IPKSP created an additional simulation in which
they sampled a third variable X3. For the continuous estimates
(upper curve in their Fig. 3), this third variable X3 is simply
labeled as an "interaction," although the mean correlation
between this "interaction" and the product of X1 and X2 is 0
when it should be 1. That is, rather than analyzing Y as a
function of X1, X2, and X1X2, IPKSP analyzed Y as a function
of X1, X2, and X3. For the split estimates (the lower curve in
their Fig. 3), SplitX1 was calculated by splitting continuous X1
into a {0, 1} variable. Rather than analyzing Y as a function of
SplitX1, X2, and SplitX1 × X2, IPKSP analyzed Y as a function
of SplitX1, X2, and SplitX1 × X3. Coefficient estimates for this
last term neither represent the true interaction nor can they be
meaningfully compared to the continuous estimates. Thus,
IPKSP's Fig. 3 does not depict results about interaction terms
and should be ignored entirely.
The simulations underlying IPKSP's Fig. 4 explicitly built
in a null effect for the interaction. Hence, IPKSP present no
information about the effects of median splits on the estimate of
true interactions. If they had simulated an actual interaction,
what would they have found? Busemeyer and Jones (1983)
showed that interactions are very fragile due to measurement
error in the independent variables and monotonic transforma-
tions. Median splits introduce unnecessary measurement error
and are a heavy-handed monotonic transformation. McClelland
and Judd (1993) show that even without those problems there
is precious little statistical power for detecting interactions
involving continuous variables, especially ones having normal
distributions. Mani et al. (2013a) provide an empirical example
of surprising effects on interactions caused by median splits.
IPKSP favorably cite Farewell, Tom, and Royston's (2004)
analysis of prostate cancer data, but even they warn that "this
example illustrates the potential pitfalls of trying to establish
interactions of treatment with a continuous variable by using
cutpoint analysis." IPKSP present no simulations or other
information to alleviate concerns about the many dire warnings
against using median splits when interactions might be involved.
Simulations versus derivations
The simulation results in IPKSP's Figs. 1 and 2 are
unnecessary because they are easily derivable. This observation
is not a criticism in itself. Instead, we use the derivations both
to extend their results to parameter values that IPKSP did
not consider and to provide a more detailed examination of
the effects of splitting a continuous variable. Cohen et al.
(2003, p. 68) present the basic formulas for computing
standardized partial regression coefficients from correlations:
βY1.2 = (rY1 − rY2·r12)/(1 − r12²)  and  βY2.1 = (rY2 − rY1·r12)/(1 − r12²)
It is well known that performing a median split of a
normally distributed variable reduces its squared correlation
with other variables to 2/π ≈ 0.64 of what it would have
originally been without splitting. That is, a model predicting Y
from the median split of X1 suffers a loss of explanatory power
of 36% compared to a model predicting Y from continuous X1.
Adding the factor √(2/π) to rY1 and r12 in the above equations
provides expected values for the standardized partial regression
coefficients when normally distributed X1, but not X2, is split at
its median.
This can be seen in IPKSP's Appendix A as call vnormal(x,,sigma,nn,); in
conjunction with the first three sets of modifications.
This can be seen in IPKSP's Appendix A as interact = x[,4];.
This can be seen in IPKSP's Appendix A as interact = x[,4]#x[,5];
regx = interc||x[,5]||x[,3]||interact;, where x[,5] represents SplitX1, x[,3]
represents X2, x[,4] represents X3, and interact is recalculated in the first
statement to be X3 × SplitX1.
Unrepresentative sampling
Having the exact formulas for the expected values allows a
re-examination of the research situations generated by IPKSP's
sampling strategy. IPKSP sampled from a multivariate normal
distribution varying r
, and r
each over the set of values
{0, 0.1, 0.3, 0.5, and 0.7} for a total of 125 conditions. We are
ignoring the sampling over n, as did they, because the formulas
provide expected values independent of n. Although the
factorial sampling of the correlation values used by IPKSP
may seem appealing, it has the disadvantage of creating many
unusual, unrepresentative combinations. For example, one of
their sampled conditions is rY1 = 0.7, rY2 = 0, r12 = 0.7.
Using the above formulas, the expected estimates in this case
are β1 = 1.37 (greater than 1.0 because of the collinearity)
and β2 = −0.96 (strongly negative despite the zero-order
correlation being zero). Although possible, this combination of
parameter values is rare and not representative of typical
research. Why are we interested in the effect of median splits in
such atypical conditions? In 54 (42%) of the 125 sampled
conditions, one or the other standardized partial regression
coefficient is negative, despite having non-negative zero-order
correlations. More importantly, as we shall illustrate later,
IPKSP's averaging across these positive and negative values
makes the effects of splitting appear more benign than the
reality revealed in the disaggregated results.
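The expected values for any sampled condition follow directly from the Cohen et al. (2003) formulas. A minimal sketch (the function names and the √(2/π) constant are ours):

```python
from math import sqrt, pi

SPLIT_FACTOR = sqrt(2 / pi)  # ≈ 0.80; attenuation from a median split of a normal variable

def betas(r_y1, r_y2, r12):
    """Standardized partial regression coefficients (Cohen et al., 2003, p. 68)."""
    denom = 1 - r12 ** 2
    return (r_y1 - r_y2 * r12) / denom, (r_y2 - r_y1 * r12) / denom

def betas_after_split_x1(r_y1, r_y2, r12):
    """Expected coefficients when normally distributed X1 is split at its median."""
    return betas(SPLIT_FACTOR * r_y1, r_y2, SPLIT_FACTOR * r12)

# IPKSP's unrepresentative sampled condition rY1 = 0.7, rY2 = 0, r12 = 0.7:
b1, b2 = betas(0.7, 0.0, 0.7)
print(round(b1, 2), round(b2, 2))  # 1.37 -0.96
```

The collinear condition immediately reproduces the anomalous expected estimates quoted in the text.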
Estimates of β1 in Study 1
IPKSP's Fig. 1 displays estimates of the standardized partial
regression coefficient for X1 with and without the median split.
Despite having performed their simulations over a set of five
values for the independent variable intercorrelation (0, 0.1, 0.3,
0.5, and 0.7), they report only three (0, 0.3, and 0.5). Our
derivations in the technical appendix show that the ratio of the
split estimate to the original estimate as well as the increment
in squared correlation depend only on the correlations of the
independent variables with each other and not with the
dependent variable. The graphs of these relationships in our
Fig. 1 show an increasingly rapid loss in parameter size and
explained variation as the intercorrelation increases. Even with
no correlation, splitting produces a sharp loss of 20% in the size
of the parameter estimate and an even larger loss of 36% of
increment of the squared correlation to about 64% of what it
would have been, as represented by the dashed line. These
initial penalties are stiff and these penalties rapidly increase for
both the parameter estimate and the increment in squared
correlation as the intercorrelation among independent variables
As an example, consider the plausible case where rY1 = 0.3,
rY2 = 0.3, and r12 = 0.3. The parameter estimate for the
= 0.3. The parameter estimate for the
continuous data is 0.23 and is reduced to 0.178 by splitting. The
very small increment in squared correlation of 0.048 for the
continuous analysis loses 38.5% of its value, falling to 0.03 with
splitting. Few researchers in the social sciences can afford a
minimum loss of 36% of their already small increments in
explained variation. It is useful to quantify just how sizable
these reductions are. When r12 = 0.3, splitting an independent
variable and using α= 0.05 for significance testing is
equivalent (in expectation) to doing the analysis of the
continuous variable but using α= 0.01. This is substantial
and unnecessary loss of power, which becomes rapidly worse
as the intercorrelation between independent variables increases
beyond 0.3.
Estimates of β2 in Study 1
IPKSP's Fig. 2 displays the standardized partial regression
coefficient for X2 with and without the median split of X1. The
aggregation across disparate conditions in their Fig. 2 presents
an unrealistically benign view of the effects of splitting one
variable on the estimate of the other variable. IPKSP's Fig. 2
shows in the aggregate what the authors refer to as a slight
"lifting" of the estimate of β2. However, we prove in the
technical appendix that whenever r12 < rY1/rY2,
splitting the first predictor increases the estimate of the
coefficient for the second predictor compared with what it
would have been without the median split. Conversely, the
opposite inequality implies that splitting the first predictor
decreases the estimate of the coefficient for the second
predictor. IPKSP's sampling scheme included more of the
former than the latter so the weighted average displayed in their
Fig. 2 shows a slight increase. Disaggregating results according
to the inequality reveals major distortions in the estimate of β2
as the predictor correlation increases.
Consider the special but realistic case for which the two
zero-order correlations are approximately equal; that is, rY1 = rY2.
Then the ratio of the two correlations is 1.0, and the inter-
correlation r12 is necessarily less than 1.0, so the effect of splitting
the first predictor will be to enhance the estimate of the second
predictor, with the enhancement increasing as the correlation
increases. Simultaneously, increasing predictor intercorrelation
Fig. 1. Ratio of split to continuous results for the parameter estimate of β1 (solid
curve) and its increment in squared correlation as a function of the predictor
intercorrelation. Dashed line represents no splitting of the continuous variable.
implies decreasing estimates of the first predictor's coefficient.
The combined effect is dramatic, as illustrated in the graph of the
ratio β2/β1 in our Fig. 2. The ratio of the true values is 1, but even
at its minimum, when the predictors are independent, the ratio when
the first predictor is split is 1.25. For correlations of 0.3, 0.5, and 0.7,
the ratio increases to 1.45, 1.71, and 2.32, respectively. Thus,
splitting one variable when predictor correlations are equal would
lead a researcher to misjudge the relative magnitude of the two predictors.
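Those ratios can be reproduced from the formulas. A sketch (function name is ours; it applies the √(2/π) attenuation to rY1 and r12 when X1 is split):

```python
from math import sqrt, pi

C = sqrt(2 / pi)  # attenuation factor from a median split of a normal variable

def beta_ratio_after_split(r12, r=0.3):
    """Ratio beta2/beta1 when r_Y1 = r_Y2 = r and X1 is split at its median.
    The true ratio is 1; r cancels, so the distortion depends only on r12."""
    denom = 1 - (C * r12) ** 2
    beta1 = (C * r - r * (C * r12)) / denom  # split X1: r_Y1 and r12 attenuated
    beta2 = (r - (C * r) * (C * r12)) / denom
    return beta2 / beta1

for r12 in (0.0, 0.3, 0.5, 0.7):
    print(r12, round(beta_ratio_after_split(r12), 2))  # 1.25, 1.45, 1.71, 2.32
```

Note that the common correlation r drops out of the ratio, so the distortion is entirely a function of the predictor intercorrelation.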
In the case where the predictor correlations with the depen-
dent variable are unequal, i.e., rY1 ≠ rY2, we prove in the
technical appendix that the estimate for the second coefficient
when the other is split becomes a weighted average of the
coefficients from the continuous analysis. That is:

β2|split X1 = w·β1 + (1 − w)·β2
The exact form of w is not as important as recognizing that
the estimate of β2 when splitting X1 is always a confounding
of the original two coefficients from the continuous analysis, and
the confounding works in the same way as poor experimental
design. As an example, consider the plausible case rY1 = 0.5,
rY2 = 0.3, r12 = 0.3. Then w = 0.18 and β2|split X1 =
0.18(0.45) + (1 − 0.18)(0.165) = 0.22.
In other words, splitting X1 increases the estimate of the
coefficient for X2 from 0.165 to 0.22 by averaging in 18% of
the coefficient for X1. This is substantial and unnecessary
confounding that should not be acceptable in scientific research.
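The weighted-average claim can be checked numerically for this example. A sketch (our own; w is recovered by solving the weighted-average definition):

```python
from math import sqrt, pi

C = sqrt(2 / pi)  # attenuation from a median split of a normal variable

def betas(r_y1, r_y2, r12):
    """Cohen et al. (2003, p. 68) standardized partial coefficients."""
    denom = 1 - r12 ** 2
    return (r_y1 - r_y2 * r12) / denom, (r_y2 - r_y1 * r12) / denom

r_y1, r_y2, r12 = 0.5, 0.3, 0.3
beta1, beta2 = betas(r_y1, r_y2, r12)              # continuous: ~0.45 and ~0.165
_, beta2_split = betas(C * r_y1, r_y2, C * r12)    # expected beta2 after splitting X1
w = (beta2_split - beta2) / (beta1 - beta2)        # implied weight on beta1
print(round(beta2_split, 2), round(w, 2))  # 0.22 0.18
```

Splitting X1 thus leaks 18% of X1's coefficient into the estimate for X2, exactly the confounding described above.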
Study 2
IPKSP's Fig. 4 reports simulation results for a very narrow
set of conditions: variable A is continuous and has an effect of
varying magnitude, whereas variable B, a two-level factor, and
the interaction A x B have null effects. IPKSP's simulations
reveal negligible effects on the average p-values for B and A x
B when the continuous variable A is split at its median.
However, there are problems with the simulations and how they
are presented in their Fig. 4.
First, statistical power is of more interest than average
p-values for evaluating analysis procedures. If effect sizes are
small there is little hope of finding significant results whether or
not a variable is split, and if effect sizes are large one can find
significant results even if the data are abused by splitting. Of
greater interest is the power for typical effect and sample sizes.
As we have done above, consider an effect size of r = 0.3
and a sample size of n = 100. We used a simulation in R
equivalent to the SAS code provided by IPKSP to find mean
p-values of 0.032 for the continuous analysis of A and 0.082
when A is split at its median.
More importantly, the power for
the continuous analysis equals 0.86, greater than the minimum
power of 0.80 recommended by Cohen, whereas the power
when A is split is only 0.68, below Cohen's recommendation.
Given a real effect, we expect the continuous analysis to
(correctly) produce a significant effect in about 20% more
samples than the split analysis. Thus, it is at the
moderate effect and sample sizes most likely to be encountered
in consumer psychological research that research is most likely
to be adversely affected by the conservatism of median splits.
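These power figures can also be approximated analytically via Fisher's z transformation; a sketch (our own approximation, not IPKSP's simulation, so the split value comes out at about 0.67 rather than the simulated 0.68):

```python
from math import atanh, erf, pi, sqrt

def normal_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def corr_power(r, n, z_crit=1.96):
    """Approximate power of a two-sided test of a correlation via Fisher's z."""
    return normal_cdf(sqrt(n - 3) * atanh(r) - z_crit)

r, n = 0.3, 100
print(round(corr_power(r, n), 2))                 # ≈ 0.86 for the continuous analysis
print(round(corr_power(sqrt(2 / pi) * r, n), 2))  # ≈ 0.67 after a median split
```

The √(2/π) attenuation of the correlation is all it takes to drop a well-powered design below Cohen's 0.80 benchmark.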
Second, and more importantly, the context underlying their
Fig. 4 sets a low bar (i.e., whether there are any changes in
p-values for null effects) in an unrepresentative research
context. It would be quite unusual, in an ANCOVA design
with a single continuous covariate A, for the two-level factor B
to have no anticipated effect on the outcome variable Y. Note
also that the prior results, showing distortions and major
decreases in the parameter estimates and effect sizes when
predictors are correlated, preclude the use of two continuous
covariates because they are likely to be correlated to some
degree. IPKSP's analysis showed that splitting a continuous
variable A at its median could have substantial effects on the
parameter estimate for the other variable, in this case B,
when A and B were correlated. Nothing in that analysis required
that B be continuous so any correlation between A and B risks
distorting the analysis of B and the A x B interaction, likely
artificially enhancing them.
Even though with random assignment to conditions a
researcher would expect a zero population correlation between
A and B, problems might still arise because it is not the
population correlation but the actual correlation in the sample of
data that determines whether there will be distortion in the
estimates of B. It is irrelevant whether this correlation in the
sample is statistically significant because any sample correlation
will distort the parameter estimates. Our Fig. 3 displays the
sampling distributions for the correlation coefficient for sample
sizes of 50 (shallowest), 100, and 200 (steepest) when the
population correlation is zero. Clearly, there is ample
We were unable to reproduce IPKSP's translation of correlations to the
mean differences in standard deviations used as indices for the graphs in their
Fig. 4. Instead of using those mean differences, we report our results in terms of
the correlations used to generate variable A.
This value is lower than reported in IPKSP's Fig. 4, which appears to be
about 0.11. Running their SAS code also produced a value of about 0.08,
so it appears the value in Fig. 4 is a transcription error.
Fig. 2. Ratio of estimated coefficients when splitting X1 at its median (solid line)
versus leaving X1 continuous, when rY1 = rY2.
opportunity for there to be a sample correlation that would cause
problematic distortions in the estimates of the ANOVA factors
when a continuous covariate is split at its median. Because, as
shown above, the bias in the other estimates is a weighted
average of the estimates from a continuous analysis, the danger
is greater the larger the effect size of A. Hence, a recipe for an
unscrupulous researcher wanting to enhance the effect size of
factor B is to find a strong covariate A and split it at its median.
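The width of those null sampling distributions is easy to quantify by simulation: under a zero population correlation, the standard deviation of the sample r is approximately 1/√(n − 1). A quick sketch (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
for n in (50, 100, 200):
    # Sample correlations between two independent standard normal variables
    rs = [np.corrcoef(rng.standard_normal(n), rng.standard_normal(n))[0, 1]
          for _ in range(2000)]
    # Simulated sd of r vs. the 1/sqrt(n-1) approximation
    print(n, round(float(np.std(rs)), 3), round(float(1 / np.sqrt(n - 1)), 3))
```

At n = 50, sample correlations of ±0.14 are routine even when the population correlation is exactly zero, which is ample room for the confounding described above.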
Contrary to the arguments of IPKSP, there is no compelling
reason to split continuous data at their median. Splitting a
continuous variable at its median introduces random error to the
independent variable by creating a sample-dependent step
function relating Y to latent X.
IPKSP argue that in some contexts it is appropriate to use
median splits because they are popular. We describe the
harrowing negative consequences of relying on their popularity.
IPKSP argue that analyses using median splits are easier to
conduct, report, and understand than analyses using the original
metric. We describe how a full accounting of the necessary
conditions to safely use a median split is more onerous (and
includes conducting and reporting the continuous analysis).
IPKSP argue that median splits are useful to test categorical
latent constructs. Yet their own examples are continua, not
categories, and if there were a substantive claim of such
theory-derived thresholds, the functional form of such analyses
would require testing.
Most critically, IPKSP argue that median splits are not
problematic because they are conservative,that is, because
they merely make it more difficult to detect a true effect. Yet
authors who choose to reduce statistical power by using median
splits reduce the persuasive impact of their own findings
(Abelson, 1995). Further, by licensing authors to use median
splits or continuous analyses at their discretion, IPKSP open the
door for an additional researcher degree of freedom and
cherry-picking of the more favorable result (Gelman & Loken,
2014; Simmons et al., 2011). It is easy to show analytically that
even without such cherry-picking, the lower the power of
statistically significant findings in the literature base, the higher
the proportion of results in the literature that will be false or
overstated (Ioannidis, 2005). Cavalierly lowering power by
median splits creates a less reliable literature base.
Finally, the publication of IPKSP's article depends on their
simulations. These simulations are flawed. The text of IPKSP
describes the simulations as bearing on models with interac-
tions, but coding errors in the simulation of the interaction of a
categorical and a continuous independent variable indicate that
readers are learning about the effects of median splits in a
model with three additive independent variables. IPKSP did not
simulate the interaction. Further, the simulations needlessly
aggregate across different situations with different effects. By
pooling across multiple simulation conditions, IPKSP combine
cases where the effect is underestimated with those where it is
overestimated, leading to a misleading overall result of not too
bad.This error can be shown analytically.
According to IPKSP, their main contribution is giving the
green light to researchers who wish "to conduct a median split
on one of their continuous measures to be used as one of the
factors in an orthogonal experimental design, such as a
factorial, and then use ANOVA to model and communicate
results" (p. 11). We see no such green light, and many red flags.
IPKSP state unequivocally that "there is no material risk to
science posed by median splits pertaining to Type II errors"
(p. 4). We hope that we have made clear in this commentary why
we could not disagree more with this conclusion.
Appendix A. Mathematical Derivations
Mathematical derivations for this article can be found online.

References

Abelson, R. P. (1995). Statistics as principled argument. New York: Psychology Press.
Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage Publications.
Akerlof, G. A. (1970). The market for "lemons": Quality uncertainty and the market mechanism. The Quarterly Journal of Economics, 84(3), 488–500.
Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100, 407–425.
Brauer, M., & McClelland, G. H. (2005). L'utilisation des contrastes dans l'analyse des données: Comment tester les hypothèses spécifiques dans la recherche en psychologie? L'Année Psychologique, 105(2), 273–305.
Brinberg, D., Lynch, J. G., Jr., & Sawyer, A. G. (1992). Hypothesized and confounded explanations in theory tests: A Bayesian analysis. Journal of Consumer Research, 19(2), 139–154.
Busemeyer, J. R., & Jones, L. E. (1983). Analysis of multiplicative combination rules when the causal variables are measured with error. Psychological Bulletin, 93(3), 549–562.
Butts, M. M., & Ng, T. W. (2009). Chopped liver? OK. Chopped data? Not OK. In C. E. Lance, & R. J. Vandenberg (Eds.), Statistical and methodological myths and urban legends: Doctrine, verity and fable in the organizational and social sciences (pp. 361–386). New York: Taylor & Francis.
Cohen, J. (1983). The cost of dichotomization. Applied Psychological Measurement, 7(3), 249–253.
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, New Jersey: Lawrence Erlbaum Associates.
Fig. 3. Sampling distributions for correlation coefficient r for sample sizes of 50
(shallowest), 100, and 200 (steepest).
Czellar, S. (2006). Self-presentational effects in the Implicit Association Test.
Journal of Consumer Psychology,16(1), 92100.
DeCoster, J., Iselin, A. R., & Gallucci, M. (2009). A conceptual and empirical
examination of justifications for dichotomization. Psychological Methods,
14(4), 349366.
Disatnik, D., & Steinhart, Y. (2015). Need for cognitive closure, risk aversion,
uncertainty changes, and their effect on investment decisions. Journal of
Marketing Research,52(June), 349359.
Farewell, V. T., Tom, B. D. M., & Royston, P. (2004). The impact of
dichotomization on the efficiency of testing for an interaction effect in
exponential family models. Journal of the American Statistical Association,
99(467), 822831.
Fitzsimons, G. J. (2008). Editorial: Death to dichotomizing. Journal of
Consumer Research,35(1), 58.
Fleischmann, M., & Pons, S. (1989). Electrochemically induced nuclear fusion
of deuterium. Journal of Electroanalytical Chemistry and Interfacial
Electrochemistry,261(2), 301308.
Frederick, S. (2005). Cognitive reflection and decision making. The Journal of
Economic Perspectives,19(4), 2542.
Galileo, G. (1962). Dialogue Concerning the Two Chief World Systems.
Translated by S. Drake, Foreword by Albert Einstein. Berkeley: University
of California Press.
Gans, J. S., & Shepherd, G. B. (1994). How are the mighty fallen: Rejected classic
articles by leading economists. The Journal of Economic Perspectives,8(1),
Gelman, A., & Loken, E. (2014). The statistical crisis in science. American
Scientist,102(6), 460.
Hoffman, D. L., Novak, T. P., & Schlosser, A. E. (2003). Locus of control, web
use, and consumer internet regulation. Journal of Public Policy and
Marketing,22(1), 4157.
Humphreys, L. G. (1978). Doing research the hard way: Substituting analysis of
variance for a problem in correlational analysis. Journal of Educational
Psychology,70(6), 873876.
Humphreys, L. G., & Fleishman, A. (1974). Pseudo-orthogonal and other
analysis of variance designs involving individual-difference variables.
Journal of Educational Psychology,66(4), 464472.
Iacobucci, D., Posovac, S. S., Kardes, F. R., Schneider, M. J., & Popovich, D.
L. (2015). Toward a more nuanced understanding of the statistical
properties of a median split. Journal of Consumer Psychology.http://dx. (this issue).
Iacobucci, D., Saldanha, N., & Deng, X. (2007). A meditation on mediation:
Evidence that structural equations models perform better than regressions.
Journal of Consumer Psychology,17(2), 139153.
Ioannidis, J. P. (2005). Why most published research findings are false. PLoS
Medicine,2(8), 696701.
Irwin, J. R., & McClelland, G. H. (2001). Misleading heuristics and moderated
multiple regression models. Journal of Marketing Research,38(1), 100109.
Irwin, J. R., & McClelland, G. H. (2003). Negative consequences of
dichotomizing continuous predictor variables. Journal of Marketing
Research,40(August), 366371.
Johnson, P. O., & Neyman, J. (1936). Tests of certain linear hypotheses and
their application to some educational problems. Statistical Research
Judd, C. M., Westfall, J., & Kenny, D. A. (2012). Treating stimuli as a random
factor in social psychology: A new and comprehensive solution to a
pervasive but largely ignored problem. Journal of Personality and Social
Psychology,103(1), 5469.
Judge, T. A., & Bono, J. E. (2001). Relationship of core self-evaluations
traitsself-esteem, generalized self-efficacy, locus of control, and emotion-
al stabilitywith job satisfaction and job performance: A meta-analysis.
Journal of Applied Psychology,86(1), 80.
Kardes, F. R., Fennis, B. M., Hirt, E. R., Tormala, Z. L., & Bullington, B. (2007). The role of the need for cognitive closure in the effectiveness of the disrupt-then-reframe influence technique. Journal of Consumer Research, 34(3), 377–385.
Kastrati, A., Neumann, F.-J., Schulz, S., Massberg, S., Byrne, R. A., Ferenc, M., et al. (2011). Abciximab and heparin versus bivalirudin for non-ST elevation myocardial infarction. The New England Journal of Medicine, 365(21), 1980–1989.
Keppel, G., & Wickens, T. D. (2004). Design and analysis: A researcher's
handbook (4th ed.). Berkeley: University of California Press.
Lemanske, R. F., Mauger, D. T., Sorkness, C. A., Jackson, D. J., Boehmer, S. J., Martinez, F. D., et al. (2010). Step-up therapy for children with uncontrolled asthma receiving inhaled corticosteroids. The New England Journal of Medicine, 362(11), 975–985.
MacCallum, R. C., Zhang, S., Preacher, K. J., & Rucker, D. D. (2002). On the practice of dichotomization of quantitative variables. Psychological Methods, 7(1), 19–40.
Mani, A., Mullainathan, S., Shafir, E., & Zhao, J. (2013a). Poverty impedes cognitive function. Science, 341(6149), 976–980.
Mani, A., Mullainathan, S., Shafir, E., & Zhao, J. (2013b). Response to comment on "Poverty impedes cognitive function." Science, 342(6163), 1169.
Maxwell, S. E., & Delaney, H. D. (1993). Bivariate median splits and spurious statistical significance. Psychological Bulletin, 113(1), 181–190.
McClelland, G. H., & Judd, C. M. (1993). Statistical difficulties of detecting interactions and moderator effects. Psychological Bulletin, 114, 376–390.
Shah, J. Y., Kruglanski, A. W., & Thompson, E. P. (1998). Membership has its (epistemic) rewards: Need for closure effects on in-group bias. Journal of Personality and Social Psychology, 75(2), 383.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366.
Smith, N. C. (1998). Presidential session summary: Ethics in consumer research. In J. W. Alba, & J. W. Hutchinson (Eds.), Advances in consumer research, Volume 25 (pp. 6–8). Provo, UT: Association for Consumer Research.
Spiller, S. A., Fitzsimons, G. J., Lynch, J. G., Jr., & McClelland, G. H. (2013). Spotlights, floodlights, and the magic number zero: Simple effects tests in moderated regression. Journal of Marketing Research, 50(2), 277–288.
Webster, D. M., & Kruglanski, A. W. (1994). Individual differences in need for cognitive closure. Journal of Personality and Social Psychology, 67(6), 1049.
Wicherts, J. M., & Scholten, A. Z. (2013). Comment on "Poverty impedes cognitive function." Science, 342(6163), 1169.
G.H. McClelland et al. / Journal of Consumer Psychology (2015)
Please cite this article as: McClelland, G.H., et al., Median splits, Type II errors, and false-positive consumer psychology: Don't fight the power, Journal of Consumer Psychology (2015).