The unbearable asymmetry of bullshit

Abstract
In this essay, I discuss the problem of plausible-sounding bullshit in science, and I describe one particularly insidious method for producing it. Because it takes so much more energy to refute bullshit than it does to create it, and because bullshit can be so damaging to the integrity of empirical research as well as to the policies that are based upon such research, I suggest that addressing this issue should be a high priority for publication ethics.
Earp BD. The unbearable asymmetry of bullshit. HealthWatch Newsletter 2016;101:4-5
THE UNBEARABLE ASYMMETRY OF BULLSHIT
In this piece, Brian Earp discusses the problem of plausible-sounding bullshit in science, and describes one particularly insidious method for producing it. Because, he says, it takes so much more energy to refute bullshit than it does to create it, and because the result can be so damaging to the integrity of empirical research as well as to the policies that are based upon such research, Earp suggests that addressing this issue should be a high priority for publication ethics.
SCIENCE and medicine have done a lot for the world. Diseases have been eradicated, rockets have been sent to the moon, and convincing, causal explanations have been given for a whole range of formerly inscrutable phenomena. Notwithstanding recent concerns about sloppy research, small sample sizes, and challenges in replicating major findings1-3—concerns I share and which I have written about at length4-10—I still believe that the scientific method is the best available tool for getting at empirical truth.11 Or to put it a slightly different way (if I may paraphrase Winston Churchill’s famous remark about democracy): it is perhaps the worst tool, except for all the rest.

In other words, science is flawed. And scientists are people too. While it is true that most scientists—at least the ones I know and work with—are hell-bent on getting things right, they are not therefore immune from human foibles. If they want to keep their jobs, at least, they must contend with a perverse ‘publish or perish’ incentive structure that tends to reward flashy findings and high-volume ‘productivity’ over painstaking, reliable research.12 On top of that, they have reputations to defend, egos to protect, and grants to pursue. They get tired. They get overwhelmed. They don’t always check their references, or even read what they cite.13 They have cognitive and emotional limitations, not to mention biases, like everyone else.14-16
At the same time, as the psychologist Gary Marcus has recently put it,17 “it is facile to dismiss science itself. The most careful scientists, and the best science journalists, realize that all science is provisional. There will always be things that we haven’t figured out yet, and even some that we get wrong.” But science is not just about conclusions, he argues, which are occasionally (or even frequently)1 incorrect. Instead, “It’s about a methodology for investigation, which includes, at its core, a relentless drive towards questioning that which came before.” You can both “love science,” he concludes, “and question it.”
I agree with Marcus. In fact, I agree with him so much that I
would like to go a step further: if you love science, you had better
question it, and question it well, so it can live up to its potential.
And it is with that in mind that I bring up the subject of bullshit.
There is a veritable truckload of bullshit in science.* When I say bullshit, I mean arguments, data, publications, or even the official policies of scientific organizations that give every impression of being perfectly reasonable—of being well-supported by the highest quality of evidence, and so forth—but which don’t hold up when you scrutinize the details. Bullshit has the veneer of truth-like plausibility. It looks good. It sounds right. But when you get right down to it, it stinks.
* There is a lot of non-bullshit in science, too!
THERE ARE many ways to produce scientific bullshit.18 One way is to assert that something has been ‘proven’, ‘shown’, or ‘found’, and then cite, in support of this assertion, a study that has actually been heavily critiqued (fairly and in good faith, let us say, although that is not always the case, as we soon shall see) without acknowledging any of the published criticisms of the study or otherwise grappling with its inherent limitations.19

Another way is to refer to evidence as being of ‘high quality’ simply because it comes from an in-principle relatively strong study design, like a randomized control trial, without checking the specific materials that were used in the study to confirm that they were fit for purpose.20 There is also the problem of taking data that were generated in one environment and applying them to a completely different environment (without showing, or in some cases even attempting to show, that the two environments are analogous in the right way).21 There are other examples I have explored in other contexts,18 and many of them are fairly well-known.
But there is one example I have only recently come across, and
of which I have not yet seen any serious discussion. I am referring
to a certain sustained, long-term publication strategy, apparently
deliberately carried out (although motivations can be hard to pin
down), that results in a stupefying, and in my view dangerous,
paper-pile of scientific bullshit. It can be hard to detect, at first, with
an untrained eye—you have to know your specific area of research
extremely well to begin to see it—but once you do catch on, it
becomes impossible to un-see.
I don’t know what to call this insidious tactic (although I will describe it in just a moment). But I can identify its end result, which I suspect researchers of every stripe will be able to recognize from their own sub-disciplines: it is the hyper-partisan and polarized,22-23 but by all outward appearances, dispassionate and objective, ‘systematic review’ of a controversial subject.
To explain how this tactic works, I am going to make up a hypothetical researcher who engages in it, and walk you through his ‘process’, step by step. Let’s call this hypothetical researcher Lord Voldemort. While everything I am about to say is based on actual events, and on the real-life behavior of actual researchers, I will not be citing any specific cases (to avoid the drama). Moreover, we should be very careful not to confuse Lord Voldemort for any particular individual. He is an amalgam of researchers who do this; he is fictional.
In this story, Lord Voldemort is a prolific proponent of a certain controversial medical procedure, call it X, which many have argued is both risky and unethical. It is unclear whether Lord Voldemort has a financial stake in X, or some other potential conflict of interest. But in any event he is free to press his own opinion. The problem is that Lord Voldemort doesn’t play fair. In fact, he is so intent on defending this hypothetical intervention that he will stop at nothing to flood the literature with arguments and data that appear to weigh decisively in its favor.
As the first step in his long-term strategy, he scans various scholarly databases. If he sees any report of an empirical study that does not put X in an unmitigatedly positive light, he dashes off a letter-to-the-editor attacking the report on whatever imaginable grounds. Sometimes he makes a fair point—after all, most studies do have limitations (see above)—but often what he raises is a quibble, couched in the language of an exposé.
These letters are not typically peer-reviewed (which is not to say that peer review is an especially effective quality control mechanism);24-25 instead, in most cases, they get a cursory once-over by an editor who is not a specialist in the area. Since journals tend to print the letters they receive unless they are clearly incoherent or in some way obviously out of line (and since Lord Voldemort has mastered the art of using ‘objective’ sounding scientific rhetoric26 to mask objectively weak arguments and data), they end up becoming a part of the published record with every appearance of being legitimate critiques.
The subterfuge does not end there.
The next step is for our anti-hero to write a ‘systematic review’ at the end of the year (or, really, whenever he gets around to it). In it, He Who Shall Not Be Named predictably rejects all of the studies that do not support his position as being ‘fatally flawed,’ or as having been ‘refuted by experts’—namely, by himself and his close collaborators, typically citing their own contestable critiques—while at the same time he fails to find any flaws whatsoever in studies that make his pet procedure seem on balance beneficial.

The result of this artful exercise is a heavily skewed benefit-to-risk ratio in favor of X, which can now be cited by unsuspecting third-parties. Unless you know what Lord Voldemort is up to, that is, you won’t notice that the math has been rigged.
SO WHY doesn’t somebody put a stop to all this? As a matter of fact, many have tried. More than once, the Lord Voldemorts of the world have been called out for their underhanded tactics, typically in the ‘author reply’ pieces rebutting their initial attacks. But rarely are these ripostes—constrained as they are by conventionally minuscule word limits, and buried as they are in some corner of the Internet—noticed, much less cited in the wider literature. Certainly, they are far less visible than the ‘systematic reviews’ churned out by Lord Voldemort and his ilk, which constitute a sort of ‘Gish Gallop’ that can be hard to defeat.
The term ‘Gish Gallop’ is a useful one to know. It was coined by the science educator Eugenie Scott in the 1990s to describe the debating strategy of one Duane Gish.27 Gish was an American biochemist turned Young Earth creationist, who often invited mainstream evolutionary scientists to spar with him in public venues. In its original context, it meant to “spew forth torrents of error that the evolutionist hasn’t a prayer of refuting in the format of a debate.” It also referred to Gish’s apparent tendency to simply ignore objections raised by his opponents.
A similar phenomenon can play out in debates in medicine. In the case of Lord Voldemort, the trick is to unleash so many fallacies, misrepresentations of evidence, and other misleading or erroneous statements—at such a pace, and with such little regard for the norms of careful scholarship and/or charitable academic discourse—that your opponents, who do, perhaps, feel bound by such norms, and who have better things to do with their time than to write rebuttals to each of your papers, face a dilemma. Either they can ignore you, or they can put their own research priorities on hold to try to combat the worst of your offenses.

It’s a lose-lose situation. Ignore you, and you win by default. Engage you, and you win like the pig in the proverb who enjoys hanging out in the mud.
As the programmer Alberto Brandolini is reputed to have said:28 “The amount of energy necessary to refute bullshit is an order of magnitude bigger than to produce it.” This is the unbearable asymmetry of bullshit I mentioned in my title, and it poses a serious problem for research integrity. Developing a strategy for overcoming it, I suggest, should be a top priority for publication ethics.
Brian D Earp
Visiting Scholar, The Hastings Center Bioethics Research Institute (Garrison, NY), and Research Associate, University of Oxford
A modified version of this essay was published in the online magazine Quillette on February 15, 2016. Please note that the article as it
appears here is the ‘original’ (i.e., the final and definitive version), and should therefore be referred to in case of any discrepancies.
The author thanks Morgan Firestein and Diane O’Leary for feedback on an earlier draft of this manuscript.
References
1. Ioannidis JP. Why most published research findings are false. PLoS Medicine 2005;2(8):e124
2. Button KS et al. Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience 2013;14(5):365-376
3. Open Science Collaboration. Estimating the reproducibility of psychological science. Science 2015;349(6251):aac4716
4. Earp BD, Trafimow D. Replication, falsification, and the crisis of confidence in social psychology. Frontiers in Psychology 2015;6(621):1-11
5. Earp BD et al. Out, damned spot: can the “Macbeth Effect” be replicated? Basic and Applied Social Psychology 2014;36(1):91-98
6. Earp BD. Psychology is not in crisis? Depends on what you mean by “crisis.” Huffington Post, 2 Sept 2015 http://www.huffingtonpost.com/brian-earp/psychology-is-not-in-crisis_b_8077522.html
7. Earp BD, Everett JAC. How to fix psychology’s replication crisis. Chronicle of Higher Education, 25 Oct 2015 http://chronicle.com/article/How-to-Fix-Psychology-s/233857
8. Earp BD. Open review of the draft paper, “Replication initiatives will not salvage the trustworthiness of psychology” by James C Coyne. BMC Psychology 2016 [in press] https://www.academia.edu/21711738/Open_review_of_the_draft_paper_entitled_Replication_initiatives_will_not_salvage_the_trustworthiness_of_psychology_by_James_C._Coyne
9. Everett JAC, Earp BD. A tragedy of the (academic) commons: interpreting the replication crisis in psychology as a social dilemma for early-career researchers. Frontiers in Psychology 2015;6(1152):1-4
10. Trafimow D, Earp BD. Badly specified theories are not responsible for the replication crisis in psychology. Theory & Psychology 2016 [in press] https://www.academia.edu/18975122/Badly_specified_theories_are_not_responsible_for_the_replication_crisis_in_social_psychology
11. Earp BD. Can science tell us what’s objectively true? The New Collection 2011;6(1):1-9
12. Nosek BA et al. Scientific utopia II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science 2012;7(6):615-631
13. Rekdal OB. Academic urban legends. Social Studies of Science 2014;44(4):638-654
14. Peterson D. The baby factory: difficult research objects, disciplinary standards, and the production of statistical significance. Socius 2016 [in press] http://srd.sagepub.com/content/2/2378023115625071.full
15. Duarte JL et al. Political diversity will improve social psychological science. Behavioral and Brain Sciences 2015 [in press] http://emilkirkegaard.dk/en/wp-content/uploads/Political-Diversity-Will-Improve-Social-Psychological-Science-1.pdf
16. Ball P. The trouble with scientists. Nautilus, 14 May 2015 http://nautil.us/issue/24/error/the-trouble-with-scientists
17. Marcus G. Science and its skeptics. The New Yorker, 6 Nov 2013 http://www.newyorker.com/tech/elements/science-and-its-skeptics
18. Earp BD. Mental shortcuts [unabridged version]. The Hastings Center Report 2016 [in press] https://www.researchgate.net/publication/292148550_Mental_shortcuts_unabridged
19. Ioannidis JP. Limitations are not properly acknowledged in the scientific literature. Journal of Clinical Epidemiology 2007;60(4):324-329
20. Earp BD. Sex and circumcision. American Journal of Bioethics 2015;15(2):43-45
21. Bundick S. Promoting infant male circumcision to reduce transmission of HIV: A flawed policy for the US. Health and Human Rights Journal Blog, 31 Aug 2009 http://www.hhrjournal.org/2009/08/promoting-infant-male-circumcision-to-reduce-transmission-of-hiv-a-flawed-policy-for-the-us/
22. Ploug T, Holm S. Conflict of interest disclosure and the polarisation of scientific communities. Journal of Medical Ethics 2015;41(4):356-358
23. Earp BD. Addressing polarisation in science. Journal of Medical Ethics 2015;41(9):782-784
24. Smith R. Peer review: a flawed process at the heart of science and journals. Journal of the Royal Society of Medicine 2006;99(4):178-182
25. Smith R. Classical peer review: an empty gun. Breast Cancer Research 2010;12(S4):1-4
26. Roland MC. Publish and perish: hedging and fraud in scientific discourse. EMBO Reports 2007;8(5):424-428
27. Scott E. Debates and the globetrotters. The Talk Origins Archive. 1994 http://www.talkorigins.org/faqs/debating/globetrotters.html
28. Brandolini A. The bullshit asymmetry principle. Lecture delivered at XP2014 in Rome and at ALE2014 in Krakow. 2014 http://www.slideshare.net/ziobrando/bulshit-asymmetry-principle-lightning-talk