Keep your enemies close: Adversarial collaborations will improve behavioral science
Cory J Clark
University of Pennsylvania
Thomas Costello
Emory University
Gregory Mitchell
University of Virginia
Philip E Tetlock
University of Pennsylvania
**Target article forthcoming in Journal of Applied Research in Memory and Cognition
© 2022, American Psychological Association. This paper is not the copy of record and may
not exactly replicate the final, authoritative version of the article. Please do not copy or cite
without authors' permission. The final article will be available, upon publication, via its
DOI: 10.1037/mac0000004
Corresponding author:
Name: Cory Clark
Address: 425 S. University Ave, Stephen A. Levin Bldg.
Philadelphia, PA, 19104-6241
Behavioral scientists enjoy vast methodological freedom in how they operationalize theoretical
constructs. This freedom may promote creativity in designing laboratory paradigms that shed
light on real-world phenomena, but it also enables questionable research practices that undercut
our collective credibility. Open Science norms impose some discipline but cannot constrain
cherry-picking operational definitions that insulate preferred theories from rejection. All too
often scholars conduct performative research to score points instead of engaging each other’s
strongest arguments, a pattern that allows contradictory claims to fester unresolved for decades.
Adversarial collaborations, which call on disputants to co-develop tests of competing
hypotheses, are an efficient method of improving our science’s capacity for self-correction and
of promoting intellectual competition that exposes false claims. Although individual researchers
are often initially reluctant to participate, the research community would be better served by
institutionalizing adversarial collaboration into its peer review process.
Keywords: motivated cognition, metascience, adversarial collaboration, research methods,
science reform
Human societies have benefited immensely from scientific progress, enabled by
institutional innovations over the last few centuries that have made science an efficient truth-
discovery enterprise (Pinker, 2018; Strevens, 2020). Scientists compete to discover new
phenomena and test causal hypotheses systematically in experiments. Peer review filters the flow
of scientific reports to improve the ratio of signal to noise in the public record (Shema, 2014).
And a recent innovation, Open Science, promotes transparency, reducing questionable research
practices (QRPs) and allowing third-party scholars to verify findings more easily (e.g.,
McKiernan et al., 2016; Spellman et al., 2017). Empirical insights can then be applied to solve
practical problems and improve global well-being more efficiently than trial-and-error at the
policy level.
But the behavioral sciences are still young. Numerous obstacles to shortening the path
toward truth remain. We are in no position for complacency, as evidenced by the continuous
flow of reports that prominent findings do not replicate, promising interventions do not work,
and that scholars have used deceptive techniques to exaggerate or fabricate results (Camerer et
al., 2018; Ebersole et al., 2020; Ioannidis, 2012; Nosek et al., 2021; Open Science Collaboration,
2015; Simmons et al., 2011; Simmons & Simonsohn, 2017; Simonsohn et al., 2014; Singal,
2021; Vazire, 2018). Although Open Science constrains some deceptive techniques, scholars still
have vast freedom to operationalize variables and fashion methodological procedures to confirm
desired hypotheses, particularly in fields with many defensible ways of operationalizing
variables and testing hypotheses (Flake & Fried, 2020).
This latitude allows rival scholars, who claim to be investigating the same phenomenon
(e.g., aggression, inequality, jealousy, bias), to invent and rely on distinctive methods of
hypothesis testing that confirm their contradictory hypotheses. Cohorts of rival scholars often
talk past one another and dismiss alternative approaches, showing scant interest in finding
common ground with critics (Tetlock & Levi, 1982; Tetlock & Manstead, 1985; Costello et al.,
2021). They develop auxiliary hypotheses to explain away opponents’ findings, often rendering
their own hypothesis unfalsifiable (Lakatos, 1970) and causing conceptually contradictory
theories to become empirically indistinguishable (Tetlock & Levi, 1982). The numerous flaws of
the peer review system (e.g., opacity and lack of accountability, the singular authority of editors
to select reviewers and make publication decisions) make it easy for incumbent scholars, who
often have professional stakes in the research they evaluate, to squelch dissent by inventing
arbitrary post-hoc rationalizations and critiques (Abramowitz et al., 1975; Ernst & Resch, 1994;
Godlee et al., 1998; Koehler, 1993; Mahoney, 1977; Okike et al., 2016; Tomkins et al., 2017).
In principle, a core tenet of science is falsificationism: only after “bending over
backwards” to prove ourselves wrong can we be confident in our hypotheses’ verisimilitude
(Feynman, 1974; Lilienfeld, 2010; Mayo, 2018; Popper, 1935/2002). In practice, though,
scholars mostly work to confirm their hypotheses and to design the best methodological
strategies for doing so (Skitka, 2020). Consequently, scholarly controversies can rage on for
years or decades, with little to no convergence, each side convinced it is winning, or indeed, has
already won. Ambiguity abounds, creating unnecessary and unproductive fractures among the
scientific community and its consumers (e.g., policy makers) and delaying scientific and human progress.
Here, we highlight a better path forward. Disagreeing scholars should work together (and
with neutral third parties) to design mutually-agreed-on tests of competing hypotheses (rather
than tailor tests likely to yield supportive evidence with likeminded collaborators). Kahneman
(2003, 2011) calls such efforts adversarial collaborations, and scholars should engage in them
regularly. Adversarial collaborations require scholars to precisely define terms, identify core
disputes, commit to conditions of falsifiability, and put hypotheses to agreed-on rigorous tests,
thereby stimulating direct competition of ideas and accelerating quality-based natural selection
of science.
We start by laying out our psychological and epistemic assumptions by describing how
the goals of human reasoning intersect with the institutional goals of science and society.
Although certain norms and incentives in science leverage natural drivers of human cognition to
improve public knowledge and policy, countervailing forces make it possible, too often, for
scientists to lapse into self-justifying modes of thinking in which the social goal of appearing
right eclipses the epistemic goal of being right. We then explain why it is in our collective
scientific interest to implement, incentivize, and institutionalize adversarial collaborations.
The Goals of Human Cognition
Our starting assumption is that human cognition, like all animal cognition, evolved to
promote fitness (Cosmides, 1989). Humans recruit, assimilate, and organize information in ways
that help them survive and reproduce. Recurrent adaptive challenges (e.g., obtaining
nourishment, avoiding danger, seeking acceptance and status within one’s immediate social
group, and balancing risks and rewards in uncertain environments) lead to at least three core
motivational drivers of human reasoning: accuracy goals, social goals, and error-balancing goals.
People function like intuitive scientists when they pursue accuracy goals (Boudry &
Vlerick, 2014; De Cruz et al., 2011). In many circumstances (e.g., appraising danger, obtaining
nourishment), correct beliefs promote fitness-enhancing decisions, and so people pursue good
information (Baumeister et al., 2018) and strive to hold accurate beliefs (Anglin, 2019; Tappin et
al., 2020; Vlasceanu et al., 2021), especially when accuracy is obtainable and consequential for
fitness. Indeed, the scientific enterprise is a testament to humans’ commitment to pursue more
accurate information. And our success as a species can be attributed in significant part to our
ability to understand and manipulate environments to minimize threats to survival (everyday
technical triumphs like learning to heat and insulate dwellings safely and discovering which
medicines ward off illness).
But sometimes social goals supersede accuracy goals (Clark et al., 2019), and people
reason to pursue belonging and status. Religious beliefs provide a straightforward example.
Believing in non-existent metaphysical entities has little fitness impact, but contradicting the
beliefs of one’s inner circle can cut off social opportunities and even lead to imprisonment or
death. Religious accuracy is evolutionarily inconsequential, but heresy can get you killed. Thus,
human reasoning should favor socially advantageous beliefs over socially costly ones
(sometimes regardless of their accuracy; e.g., Kunda, 1990). Socially motivated reasoning has
been demonstrated many times (Clark & Winegard, 2020; Haidt, 2001, 2012), for example: (1)
people seek favor by more generously evaluating in-group over outgroup members (e.g.,
Christenson & Kriner, 2017; Claassen & Ensley, 2016; Cohen, 2003; Hawkins & Nosek, 2012;
Kahan et al., 2012), (2) people exaggerate their own social value and downplay their weaknesses
(e.g., Alicke & Govorun, 2005; Brown, 1986; Hoorens, 1993; Sedikides et al., 2003), (3) people
avoid information that challenges in-group views and seek out confirmatory information (e.g.,
DeMarree et al., 2017; Frimer et al., 2017; Stroud, 2008, 2010), and (4) people are credulous
toward information that reinforces in-group beliefs and skeptical of information that challenges
them (e.g., Campbell & Kay, 2014; Ditto et al., 2019a, 2019b; Gampa et al., 2019; Kahan et al.,
2017; Lord et al., 1979; Taber & Lodge, 2006). All this suggests that sometimes reasoning is
motivated more by social goals (e.g., rising through the ranks of one’s coalition) than by accuracy goals.
But the world is complex. People must balance accuracy goals and social goals by
weighing (consciously or not) their relative risks and rewards. Such trade-offs are central to
error management theory (Haselton & Buss, 2000). Consider the tendency for men to
overestimate women’s sexual interest. The belief “this woman likes me” carries social risk—if
wrong, the man might be rejected. But the reward for being correct is higher. In contrast, the
belief “this woman does not like me” carries little social risk, but if incorrect, the man misses out
on a mating opportunity. The false negative error (missing out on a mate) is costlier than the
false positive error (embarrassment), and so reasoning errs on the side of overestimating
women’s sexual interest. Systematic inaccuracy can have fitness advantages. This means that
biases in human reasoning (a preference for certain conclusions) can be instrumentally rational
(Weber, 1968), but insofar as biases deviate from pure pursuit of accuracy, biases are
epistemically irrational.
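The error-management reasoning above can be made concrete with a small expected-cost calculation. The sketch below is illustrative only: the function names are ours, and the cost values are hypothetical numbers chosen for the example, not estimates from Haselton and Buss (2000). It assumes each error type has a fixed cost and that a correct judgment costs nothing.

```python
def expected_cost(act: bool, p_interest: float,
                  cost_false_positive: float,
                  cost_false_negative: float) -> float:
    """Expected cost of acting, or not acting, on the belief 'she is interested'."""
    if act:
        # Acting only incurs a cost when interest is absent (false positive).
        return (1.0 - p_interest) * cost_false_positive
    # Holding back only incurs a cost when interest is present (false negative).
    return p_interest * cost_false_negative


def break_even_probability(cost_false_positive: float,
                           cost_false_negative: float) -> float:
    """Probability of interest above which acting has the lower expected cost."""
    # Solve (1 - p) * c_fp = p * c_fn for p.
    return cost_false_positive / (cost_false_positive + cost_false_negative)


# Hypothetical costs: a missed opportunity is nine times as costly as a rejection.
C_FP, C_FN = 1.0, 9.0
threshold = break_even_probability(C_FP, C_FN)
print(f"Acting is cheaper whenever P(interest) exceeds {threshold:.2f}")
```

Under these assumed costs, acting on the belief has the lower expected cost whenever the probability of interest exceeds 0.10, so a reasoner who minimizes expected cost will act on the belief even when it is probably false. This is the sense in which a systematic inaccuracy can be instrumentally rational.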
Other cognitive constraints also shape reasoning and belief formation. Heuristics help us
make fast, low-effort decisions but at an accuracy price, though how large is much debated
(Gigerenzer & Gaissmaier, 2011; Tversky & Kahneman, 1974). Even when people are primarily
concerned with accuracy, various non-social biases can interfere with this pursuit (Pinker, 2021),
such as the tendency to seek confirmatory over disconfirmatory evidence in hypothesis testing
(Wason & Johnson-Laird, 1972; Mynatt et al., 1977) or to ignore base rates when estimating
probabilities (Kahneman, 2003; Tversky & Kahneman, 1981). Social goals can exacerbate or
even reverse these tendencies: for example, when conclusions are undesirable (Dawson et al.,
2002) or have social significance (Cosmides & Tooby, 1992), people are likelier to search for
disconfirming evidence, sometimes to the point of excessive skepticism (Ditto & Lopez, 1992;
Taber & Lodge, 2006).
In general, human reasoning should lead to dubious conclusions when accuracy has little
relation to fitness (e.g., broad philosophical-religious beliefs, abstract moral-political viewpoints,
superstitious practices), and when social penalties are high for holding discordant beliefs (Clark
et al., 2015; Ditto et al., 2009; Tetlock, 2003). However, even when accuracy directly impacts
survival (e.g., “will this vaccine reduce my risk of life-threatening illness?”), social goals
sometimes still triumph if the information is too ambiguous or challenging (Kopko et al., 2011;
Munro et al., 2010a, 2010b). When one senses the truth is unknowable, deferring to one’s social
group makes adaptive sense (Fernbach & Light, 2020).
Scientists are Humans
We take it as axiomatic that scientists are constrained by the same cognitive biases,
limitations, and tradeoff calculations as mere mortals (Bowes et al., 2020; Clark et al., 2021a;
Clark & Tetlock, 2021; Clark & Winegard, 2020; Duarte et al., 2015; Faust, 1984; Haidt, 2020;
Lilienfeld et al., 2020; Mahoney, 1976; Proctor & Capaldi, 2012; Redding, 2001; Ritchie, 2020;
Tetlock, 2020; Winegard & Clark, 2020; although see also Lai, 2020; Van Bavel et al., 2020).
One sign that scientists engage in socially motivated research is the replication crisis and
subsequent discovery of widespread p-hacking and other QRPs (Camerer et al., 2018; Ebersole
et al., 2020; Flake & Fried, 2020; Ioannidis, 2012; Nosek et al., 2021; Open Science
Collaboration, 2015; Simmons et al., 2011; Simmons & Simonsohn, 2017; Simonsohn et al.,
2014; Singal, 2021; Vazire, 2018). Since 2012, the field has been rattled by a surge of non-
replications of oft-cited findings, including growth mindset (Bahník & Vranka, 2017; Rienzo et
al., 2015; Sisk et al., 2018; Stoet & Geary, 2012), power posing (Jonas et al., 2017; Simmons &
Simonsohn, 2017), ego depletion (Hagger et al., 2016), priming (Pashler et al., 2012; Shanks et
al., 2013; Steele, 2014), the influence of incidental disgust on moral evaluations (Landy &
Goodwin, 2015; Jylkkä et al., 2020), the Mozart effect (Pietschnig et al., 2010), mortality
salience effects (Klein et al., 2019; Sætrevik & Sjåstad, 2019), the relation between ovulatory
phase and numerous outcomes (Bleske-Rechek et al., 2011; Hahn et al., 2020; Thomas et al.,
2021; Wood et al., 2014) and the influence of analytic thinking on religious belief (Sanchez et
al., 2017). Numerous in-depth investigations have uncovered questionable analytic techniques
scholars use to generate publication-worthy findings, including running multiple studies and only
writing up the impressive findings, playing the statistical significance lottery by including
multiple dependent variables and only reporting those that “worked,” and flat-out fraud by
fabricating data or dropping participants from datafiles for erroneous reasons (Blanton &
Mitchell, 2011; Simonsohn et al., 2021), among other tactics. Original authors often seem
reluctant to change their minds after their work fails to replicate, making original authors and
failed replicators suitable teams for adversarial collaborations (Koole & Lakens, 2012).
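The "statistical significance lottery" mentioned above follows from simple arithmetic. The sketch below shows how the chance of at least one spurious "significant" result grows with the number of dependent variables tested. It rests on simplifying assumptions that are ours, not the cited authors': every null hypothesis is true, the tests are independent, and each uses the same threshold alpha. The function name is hypothetical.

```python
def familywise_false_positive_rate(n_outcomes: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n independent null tests falls below alpha."""
    # Each test avoids a false positive with probability (1 - alpha);
    # all n tests avoid one with probability (1 - alpha) ** n.
    return 1.0 - (1.0 - alpha) ** n_outcomes


for k in (1, 5, 10, 20):
    rate = familywise_false_positive_rate(k)
    print(f"{k:>2} dependent variables: {rate:.3f}")
```

With alpha = .05, the rate is about .23 for five outcomes, .40 for ten, and .64 for twenty. Correlated outcomes, which are typical in practice, would raise the rate more slowly, so these figures are best read as an upper-end illustration.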
Incorruptible truth-seekers entirely committed to accuracy should, by definition, never
mislead their colleagues, students, or society at large. But partial truth-seekers, who also have
social motives such as getting impressive jobs and prestigious awards, would do these kinds of
things (Jussim et al., 2019). This does not mean scientists have bad intentions, only that they are
people, with limited cognitive resources, social concerns, and career aspirations. The replication
crisis revealed high numbers of false positives: scholars were biased against the null
hypothesis because rejecting it garnered them social benefits. We suspect many of these people
genuinely believed in their scholarship and were fooled, or at least fooled enough, by their
own faulty research practices (e.g., Simonsohn et al., 2021).
There are numerous other indicators of socially motivated reasoning among scholars.
Recent reports indicate substantial self-censorship among academics (Clark et al., 2021b;
Kaufmann, 2021). Scholars, like other humans, are vulnerable to peer pressures, and at
minimum, these influence which empirical beliefs they will discuss openly, if not the empirical
beliefs they hold. Many scholars admit to discriminating against researchers for political reasons
(e.g., Honeycutt & Freberg, 2017; Inbar & Lammers, 2012; Peters et al., 2020), suggesting
political concerns influence their judgments and that they are complicit in creating a high peer-
pressure social environment. This ideological bias may explain why research tends to portray
conservatives more negatively than liberals (Eitan et al., 2018; Tetlock, 2012) and why liberal
scholars find themselves at more prestigious institutions than their less liberal but similarly
productive peers (Rothman et al., 2005). Peer reviewers evaluate research more favorably when
findings support rather than challenge their own theoretical orientations and political views
(Abramowitz et al., 1975; Ernst & Resch, 1994; Koehler, 1993; Mahoney, 1977), ethics
committees evaluate identical research proposals differently depending on the hypothesis (Ceci
et al., 1992), and a recent survey of the Society for Experimental Social Psychology found
evidence of political resistance to certain evolutionary theories (e.g., Buss & von Hippel, 2018;
von Hippel & Buss, 2017).
Scholars, being human, (1) conduct their research in ways not optimally designed to
pursue truth but rather to confirm preferred hypotheses, (2) create social pressures and conform
to social pressures within their own discipline, and (3) evaluate information in ways that
privilege certain conclusions.
Behavioral Sciences and Socially Motivated Research
We suspect the behavioral sciences are more vulnerable than other scientific disciplines
to socially motivated distortions. First, accuracy consequences for behavioral
scientists are comparatively low. Although many applied behavioral science findings impact
society (e.g., personality and cognitive assessment; non-medicinal mental health interventions;
education interventions; criminal justice interventions; economic policy; research bearing on
legal matters), the behavioral scientists generating the faulty knowledge seldom pay a serious
price (e.g., they are rarely sued for false promises). The consequences of being wrong are mostly
limited to embarrassment.
Second, the behavioral sciences study provocative topics: moral-political issues bearing
on status, relationships, and distributions of resources. Social concerns loom even larger for
applied behavioral science research because its consequences are more visible to the broader
community. Not only might scholars themselves have social desires that influence their
conclusions, but society likely puts special pressures on behavioral scientists to draw socially
desirable conclusions, for example, that beauty is in the eye of the beholder (Widemo & Sæther,
1999), that extreme wealth is bad (Doyle & Stiglitz, 2014), or that intelligence is not that
important (Duckworth, 2006). Only 1-2% of people can be in the top 1-2% of physical
attractiveness or socioeconomic status or intelligence, and so such conclusions would please
most people (see e.g., Ward et al., 2021). And because humans are motivated to strive for self-
improvement, society might urge behavioral scientists to design self-help interventions that
supposedly improve difficult-to-measure, complexly determined life outcomes such as
happiness, energy, concentration, motivation, success, and self-esteem. Behavioral scientists who
give people the information they desire are rewarded with book deals, media appearances, talks,
and other forms of status, whereas research debunking such findings is often ignored
(Honeycutt & Jussim, 2020).
Third, the behavioral sciences often deal with variables that are abstract, multiply caused,
difficult to measure and manipulate, and open to countless operationalizations (Flake & Fried,
2020). And many psychological constructs cannot be directly observed, so operationalizations
can easily drift away from the original constructs of interest (e.g., Kovera & Evelo, 2021).
Different investigators’ decisions about how to operationalize variables (Schweinsberg et al.,
2021) and analyze data (Silberzahn et al., 2018) can lead to starkly different conclusions (but
also see Auspurg & Brüderl, 2021).
Sorting out causal relations becomes difficult, sometimes impossible. Effect sizes are
frequently so small it is hard to know whether they have any real-world significance (but see
Funder & Ozer, 2019). And it is uncommon for findings to be replicated across diverse
populations, so generalizability is often unknown. Unlike physical laws, human cultures
change: what was true 10 years ago may no longer be true today (see, e.g., reversals in certain
gender biases [Card et al., 2021]). And behavioral science conclusions and constructs often
embed value-laden assumptions. For example, the use of base rates in decision-making might be
labeled an immoral bias among some scholars and rational by others (Tetlock et al., 2000).
In short, the behavioral sciences are a perfect storm for socially motivated research: the
accuracy consequences are typically low because scholars are rarely held accountable for false
claims, yet the social consequences are high because of the tight connection of much work to
issues of public concern, and the topics of study often involve social constructs and latent
variables that lend themselves to alternative operationalizations that cannot be easily compared
to ground truth for validity. The behavioral sciences arguably require more stringent
accountability than sciences that directly influence life or death (e.g., medical research), sciences
that have little relevance to social issues (e.g., astrophysics), and sciences that deal with variables
with well-established operational definitions (e.g., organic chemistry) and that study phenomena
in extremely controlled environments (e.g., particle accelerators).
The Goals of Society and Science
Many organizations today prioritize epistemic goals for technocratic reasons: discovering
the truth is seen as the cost-effective route to solving problems. So, numerous procedures are
already in place to incentivize accuracy (e.g., fact-checking in journalism, liability in medicine,
evidentiary support in courtrooms, peer review in science). But these epistemic-accountability
systems are imperfect because they are controlled by imperfect human beings, and because of
the impossibility of anticipating the shocks and temptations to which the systems will be subjected.
There are various philosophical and sociological perspectives on strategies for promoting
scientific progress (e.g., Lakatos, 1970; Merton, 1973; Popper, 1935/2002; Rauch, 2021), but
most scholars agree that the goal of science is to build knowledge about empirical reality and
pursue truth by testing predictions and explanations against data. Even those who argue that a
higher goal of science is to improve human flourishing still prioritize truth insofar as
interventions have a better chance of success when grounded in facts. Unlike individual
scientists, science, the institution, prioritizes accuracy and enforces strategies for testing
claims, like experimentation and repeated observation, that not long ago were unfamiliar to
humanity (Strevens, 2020). But there are myriad flaws and countervailing forces that have
enabled the behavioral sciences to produce a great deal of false knowledge, obstruct true
knowledge, and fail to make progress in resolving contradictory claims (Clark et al., 2021a).
Current Norms and Procedures and Their Shortcomings
Current scientific norms recognize that scientists are people and strive to align social
goals with accuracy goals, for example, by portioning prestige to scientists who develop
productive theories. By implicitly acknowledging that scientists want impressive titles and
accolades, science has co-opted human status striving for truth striving. In some disciplines, it
has done so successfully. Science and society have benefited from discoveries and innovations in
countless ways (Pinker, 2018; Strevens, 2020).
But current norms also incentivize scholars to exaggerate the importance of their
findings; they allow scholars to craft predictor and outcome variables in ways that reinforce their
hypotheses and to tell attractive but unwarranted causal stories about correlational data; they
tolerate tendentious narratives applied to ambiguous or complicated evidence; they expect
scholars to work almost exclusively with teams of likeminded scholars and abide scholars’
apparent refusals to work with adversaries and get to the bottom of disagreements; they
encourage sequential debate rather than concurrent dialogue; they are unable to prevent
strawman characterizations of intellectual opponents; and they rarely call on scholars to earnestly
engage with their opponent’s strongest arguments or subject their pet theories to risky tests
(Clark & Winegard, 2020; Kahneman, 2003; Jussim et al., 2019; Meehl, 1978).
Peer review problems. Science has checks and balances to prevent scientists from
baselessly declaring themselves authorities on various matters, acknowledging that it may not be
possible to align the goals of individual humans with those of science entirely. In peer review,
which emerged in the Enlightenment and was firmly institutionalized in the 20th century (Shema,
2014), subject matter experts review manuscripts, point out flaws, and decide on publishability.
This process likely improves the quality of the literature because authors must satisfy at least a
few peers, and mini-crowdsourcing expertise likely generates insights that individuals missed
(Surowiecki, 2004; van Gelder et al., 2020). But it has numerous flaws, including interrater
reliability that is barely above chance (Bornmann et al., Forscher et al., 2019; Lee et al., 2012),
the singular authority of editors to select the reviewers (often with awareness of reviewers’
predilections), and its failure to prevent the replication crisis. Indeed, peer review may be
partially to blame for the replication crisis. Editors and reviewers accepted low-power,
small-sample designs, did not insist on transparency in data analysis (Davis et al., 2018), and
tolerated refusals to share data (Wicherts et al., 2006). Peer review is also largely to blame for
widespread publication biases because reviewers are likelier to reject nonsignificant findings
(Atkinson et al., 1982; Emerson et al., 2010; Franco et al., 2014), and authors would often prefer
to publish their non-significant findings if it were possible (Tsou et al., 2014).
Reviewers tend to be more critical of scholarship that challenges their own work than
scholarship that supports it (Ernst & Resch, 1994). Reviewers also suffer from prestige bias,
giving more favorable evaluations when prestigious authors’ names and institutions are known
than when unknown (Okike et al., 2016; Tomkins et al., 2017), and other author identity biases
(Godlee et al., 1998). And a 2003 study found that 55% of authors had been asked to referee a
manuscript that they were not competent to review, with 37% submitting a review despite their
incompetence (Bedeian, 2003).
Frequently, scholars have expertise in an area because they have published many papers
supporting or challenging a theory and have risen to prominence on that basis. Accordingly,
reviewers often have a professional interest in being lenient or critical when evaluating papers in
their wheelhouse (Ernst & Resch, 1994). If editors seek “balance” by selecting both proponents
and opponents of a given theory to serve as reviewers (perhaps the most hopeful explanation for
the low interrater reliability), this creates ambiguous feedback, and editors are left to “choose
sides,” perhaps swayed by their own predilections. Consequently, individual editors, with their
idiosyncratic flaws and motivations, have much discretion over which data and arguments make
it into top journals.
It remains largely unknown how much peer review improves research quality relative to
other options (Elson et al., 2020) and whether the effect size of this improvement justifies the
massive time costs, inefficiencies, and knowledge delays generated by the process. But given the
pervasiveness of false positive findings and contradictory claims, the behavioral sciences need
more effective quality control procedures.
Open science and its limitations. New norms have emerged in the last decade that
constrain scholars’ freedom to exaggerate the significance of their own work. Open Science
practices, such as preregistration of hypotheses, methods, and analyses, reduce researcher
degrees of freedom in the analysis stage and scholars’ ability to claim, after they have seen
results, that they predicted those results a priori (Kerr, 1998). By requiring scholars to report
methods in sufficient detail for exact replication by other scholars and to share data and analysis
code publicly, the Open Science movement disincentivizes shady statistical and reporting
practices by increasing the threat of detection. It has become harder for scholars to selectively
omit or alter incongruous findings. And the focus on replicability improves the reliability of
scientific findings: we can be more confident that particular methods produce particular results.
These developments mark a sea change. But they do not address the potent threat to
scientific progress on which we focus here: scientists’ freedom to operationalize theoretical
constructs in ways that load the dice in favor of the theory ostensibly being tested. This
permissiveness threatens the validity of scientific conclusions. Even if particular methods
reliably produce particular results, we do not know whether the theoretical inferences from those
results are valid. To take an extreme example, embedding conservative values such as “hard
work” in “symbolic racism” scales (Kinder & Sears, 1981) made it easy to show that
conservatism is a form of racism (Sniderman & Tetlock, 1986). But the empirical validity of this
claim is suspect because it teeters on tautology. The patterns of attitude-behavior correlations
reveal a messier story. On the one hand, low scorers on the scales have the strongest race-based
preferences, and in favor of Black people over White people (Wright et al., 2021), and
research increasingly shows that conservatives treat people of different races more similarly than
do liberals (Clark et al., 2020). On the other hand, liberalism is also associated with stronger
desires to live in racially diverse communities (e.g., Motyl et al., 2020) and stronger
commitments to racial equality (Pew, 2021), highlighting how conclusions can vary as
operationalizations vary.
Studies have also claimed that their measures of anti-Black implicit racial bias predict
anti-Black behavioral discrimination: “As physicians’ prowhite implicit bias increased, so did
their likelihood of treating white patients and not treating black patients with thrombolysis”
(Green et al., 2007, p. 1231) and “those who revealed stronger negative attitudes toward Blacks
(vs Whites) on the IAT had more negative social interactions with a Black (vs a White)
experimenter” (McConnell & Leibold, 2001, p. 435). However, these statements obscure the fact
that the observed discrimination was against the majority group by those with a stronger
preference for African Americans (Blanton et al., 2009; Dawson & Arkes, 2009). Low scorers
were more likely to treat the Black than the White patient and had more positive interactions
with the Black than the White experimenter, whereas high scorers demonstrated no racial
preference. Questionable interpretations of reliable findings are not uncommon (Clark &
Tetlock, 2021; Clark & Winegard, 2020; Mitchell & Tetlock, 2009; Purser & Harper, 2020). And
given that scholars often publish conclusions that contradict other published conclusions, we can
be confident that many reliable results produce conclusions of dubious validity.
The current scientific climate, even with Open Science practices, perpetuates reliable
but invalid claims. Advancing scientific debates can be unsettling for scientists accustomed to
performative displays that have little power to gauge the explanatory merits of competing views
but leave face-saving interpretive wiggle room for almost everyone. If scholars holding
contradictory views engaged in true competitions, fewer would emerge full victors (though many
could still be correct within better specified boundary conditions). Science advances by ruling
out incorrect views, yet few scientists contribute to this advance by admitting their own doubts or
mistakes. Normalizing adversarial collaborations could change this.
Adversarial Collaboration: A Gold Standard for Scientific Dispute Resolution
Adversarial collaborations are a method of encouraging scholars who disagree to work
together to resolve their scientific disputes. As conceived by Kahneman (2011), adversarial
collaborations call on scholars to: (1) understand and articulate their opponents’ perspective so
well that each side feels fairly characterized; (2) work together to design mutually agreed upon
studies that have potential to adjudicate competing hypotheses and that they agree, ex ante, could
change their minds; and (3) jointly publish the results, regardless of the outcome. Each
collaborator serves as a check on the other to ensure that methods are not rigged; studies, not
file-drawered; and interpretations, duly circumspect.
Adversarial collaborations are appropriate when scholars disagree over whether a
phenomenon exists, in what contexts it exists, or over how best to explain it. In the business-as-
usual scenario, a scholar would write a critical commentary or conduct a follow-up study to
refute a published claim; in the adversarial-collaboration scenario, the scholar invites the authors
of the original article to work together to clarify the disagreement and resolve it, either
empirically or conceptually. The key in most adversarial collaborations is the collection of new
data, with mutually agreed upon methods for testing which theory makes better predictions. This
process facilitates what Platt (1960) termed strong inference and allows the parties to escape the
traps of selective attention to different findings and selective interpretation of the same findings
that often stalemate scientific debates and that can escalate into accusations of cherry-picking
and bad faith. For classic cases, see the exchanges about childhood abuse and repressed
memories (Alpert et al., 1998a, 1998b; Ornstein et al., 1998a, 1998b) or exchanges about the
usefulness of implicit bias for understanding and eliminating discrimination in organizations
(Jost et al., 2009a, 2009b; Tetlock & Mitchell, 2009a, 2009b, 2009c).
Guidelines for Participation
Although scholars sometimes successfully innovate their own ways of running
adversarial collaborations, we recommend the guidelines in Table 1, which draw on the work of
Mellers and colleagues (2001, p. 270) and Clark and Tetlock (2021, pp. 21-22).
Table 1
Guidelines for Participating in Adversarial Collaborations
1. Consider the temperaments of potential adversaries. Some scholars may be able to
participate in adversarial collaborations more successfully than others (e.g., successful adversarial
collaboration may be associated with higher intellectual humility [Bowes et al., 2020; 2021], open-
mindedness, and agreeableness, and with lower dogmatism, neuroticism, narcissism, and ideological
extremism [van Prooijen & Krouwel, 2019; Zmigrod et al., 2020]). For many scientific disputes,
different “sides” are supported by numerous scholars, and so it may be useful to select an adversary
among them who seems capable of carrying out an adversarial collaboration successfully.
2. Involve a trusted, neutral third-party colleague to be a moderator. The moderator should be
mutually agreed upon by all adversaries and will coordinate the effort, referee disagreements, and
collect and analyze the data and write up the results. The data should remain under the control of the
moderator throughout the project. At the outset, the adversaries and the moderator should agree that
the moderator will pursue publication even if one or more adversaries refuses to cooperate and drops
out. (This should also disincentivize “dropping out” because the paper will be published anyway,
and the scholar simply misses out on co-authorship.)
3. An initial discussion should identify a clearly defined disagreement. Both sides should be
able to articulate their own perspective in concrete terms as well as the strongest version of their
adversary’s perspective and the disagreement in terms all parties agree with. This discussion should
leave all parties feeling understood, not caricatured. The moderator should take notes of all
discussions; this allows for records that remind adversaries of their earlier statements and
4. Agree on the details of an initial study designed to subject the opposing claims to an
informative empirical test. The participants should seek to identify results that would change their
mind, at least to some extent, and should explicitly anticipate their interpretations of outcomes that
would be inconsistent with their theoretical expectations.
5. Strive for achievable, incremental progress. Accept in advance that the initial study will be
inconclusive. Allow each side to propose additional experiments to exploit the fount of hindsight
wisdom that commonly becomes available when disliked results are obtained. Additional studies
should be planned jointly, with the moderator resolving disagreements as they occur.
6. Be flexible with collaborators. There is rarely one way to answer a question, so if there is
resistance to one approach, simply move on to a new one. If one study goes awry (i.e., one or more
collaborators are not convinced by the findings), figure out why and fix the ambiguities for the next
7. Take advantage of preregistration. Preregistering an adversarial collaboration can help lock
both scholars into a research plan, which will minimize scholars’ ability to renege if unfavorable
results are found.
8. If significant disagreements remain after all data are collected, write individual discussion
sections. The length of these discussions should be determined in advance and monitored by the
moderator.
These guidelines anticipate ways in which adversarial collaborations can fail and aim to
pre-empt them. Although some adversarial collaborations proceed harmoniously (e.g., Fiske,
2017), in which case these precautions are overkill, we think it wise to prepare for the worst and
hope for the best. Disputes with high symbolic or policy stakes, such as the controversies over
the accuracy of suppressed memories (e.g., Karon & Widener, 1998; Loftus, 2005; Pendergrast,
1999), the consequences of affirmative action (e.g., Crosby et al., 2006; Sander, 2004), the
causes of gender gaps in STEM (e.g., Cheryan et al., 2017; Williams & Ceci, 2015), racial
disparities in police use of force (e.g., Cesario et al., 2019; Geller et al., 2020; Hollis & Jennings,
2018), or the influence of implicit bias (e.g., Jost et al., 2009a; Tetlock & Mitchell, 2009b), can
quickly become contentious.
How They Will Improve Empirical Accuracy
Adversarial collaborations can highlight perverse scientific norms, and motivate
change, in a host of ways.
More competition of ideas. Evolutionary epistemology studies the process by which
knowledge is generated through a competition of ideas with selection based on survival of the
truest (Bradie & Harms, 2020). Whereas current norms allow numerous contradictory
hypotheses to co-exist with little convergence over time, adversarial collaboration requires the
two to compete until one wins or both come out modified. Adversarial collaboration provides a
harsher competitive environment with clearer terms of battle, allowing for more rapid and
efficient quality-based ideational selection. Bad ideas will die faster, and good ideas will elevate
with greater clarity and become more refined with each round of collaboration. This will reduce
ambiguity and contradiction in the published literature, equipping other scholars to make better,
more productive hypotheses themselves.
Checks and balances and higher standards. Adversarial collaborations constrain
researcher degrees of freedom throughout the entire research process, from initial framing of
questions to write-ups of discussion sections. Because methodological designs must be approved
by all parties, parties subject their own hypotheses to a genuinely stringent test: one that their
opponent expects them to fail. Scholars are prevented from rigging the methods in their favor
and designing predictor variables that are confounded with outcome variables; adversaries
would not allow it. Ultimately, by holding one another to the same set of (high) standards,
adversaries will design tests that are fairer, more rigorous, and better able to adjudicate between
the competing hypotheses (Kahneman, 2003).
These checks and balances work more efficiently than peer review, in which flaws are
pointed out when the project is finished, and by over-worked, under-compensated reviewers. The
power asymmetry between authors and reviewers allows harsh reviewers to obstruct a paper
even when they may have approved of the methodological design before seeing the results.
Adversarial collaborations hold authors and hostile reviewers to the same evidentiary standards
and require that all parties commit to those standards ex ante. Registered reports, in which
journals approve of study methods prior to seeing the results, accomplish something similar, but
adversarial collaborations go further, incorporating the critic into the research process from start
to finish. And the requirement that both parties commit to publishing the results eliminates the
file-drawer option, which should reduce publication biases in the literature.
Open exchange and allowing data to resolve disputes. Academic debates at
conferences, in journals, and on social media often come down to clever argumentation.
Scholars deploy data, but often Scholar A claims those data are devastating to Scholar B, and
Scholar B disagrees. Strawman arguments, ad hominem attacks, motte-and-bailey mix ups, and
red herrings are common tactics, and often successful (Pinker, 2021). Adversarial
collaborations short-circuit these repressive tactics and allow disagreements to mature in the
sunlight of open exchange.
Creating epistemic accountability and clarifying disagreements. Adversarial
collaborations require scholars to clarify their own positions and their disagreements. Whereas
current norms incentivize scholars to exaggerate the scope and importance of their hypotheses
(Jussim et al., 2019), adversarial collaborations do the opposite. Scholars, knowing they will
soon be accountable to strong empirical tests, with few retreat options, are incentivized to be pre-
emptively self-critical about limits and boundary conditions (Lerner & Tetlock, 1999).
Adversaries must be able to articulate their opponent’s position to their opponent’s satisfaction
and identify points of actual disagreement, not just perceived disagreement. And they must distill
their disputes into testable propositions. This requires scholars to clear up the ambiguities in their
own thinking and specify data patterns that could falsify their hypotheses or at least modify
them.
In our own adversarial collaborations, we have been surprised by three things:
1. Disagreements are harder to articulate than expected.
2. Disagreements are smaller and more nuanced than expected.
3. Adversaries begin to merge in their perspectives before data collection even begins.
We suspect these surprises occur because scholars often start off with exaggerated views
of their opponent’s perspectives and then moderate their own perspectives when they engage
with the actual, not an imagined, opponent. These conversations alone are critical for finding
common ground. Indeed, in an adversarial review by Kahneman and Klein (2009), two scholars
who were leaders of clashing theoretical camps on intuitive judgment and expertise were
surprised to discover how minor some of their disagreements were.
Even failures will be useful. If an adversarial collaboration fails (e.g., the scholars
cannot agree on their disagreement or to methods, or the results cause a falling-out), these
failures can still be informative. Failure to agree on the disagreement could be a sign there is no
disagreement: the adversaries are using the same vague language to describe two different
phenomena. Similarly, a failure to agree on methods could suggest a lack of conceptual clarity,
and if each side forwarded more precise claims, the disagreement would dissolve (Cowan et al.,
2020). Generally, if proponents and opponents tend to use different methods between groups and
similar methods within groups, this suggests that proponents and opponents know which
methods “work” for their preferred hypotheses and use them precisely for that reason.
Disagreeing scholars might not disagree on which methods produce which results, but on what
those results mean (questions of validity, not reliability), an indication that those methods
produce ambiguous results and are generally not useful for adjudicating the debate. Scholars can
then identify those ambiguities to design better methods (e.g., Is the metric confounded with
something or otherwise imprecise? Are there alternate explanations that need to be ruled out? Is
there a missing moderator?).
Good for Science, and Good for the Scientist
There are many reasons the benefits of adversarial collaborations can far outweigh their
(admittedly higher-than-average) costs for individual scientists, and these benefits could increase
substantially if institutions incentivized them properly. We suspect there are three big barriers to
participation: fear of not confirming one’s hypothesis, concerns about time and effort, and
aversion to interpersonal conflict. These challenges are not insurmountable for most scholars.
Participating in adversarial collaborations might seem risky. Often scholars become
known for a particular idea, one that got them their job, on a Best Seller list, or millions of TED
talk views, not to mention dozens of publications, involving numerous colleagues and protégés.
To put that idea at risk of disconfirmation may seem like a risk of being labeled a charlatan. But
it often will be in scholars’ best interest to put their own theories to more rigorous tests. In the
worst-case scenario that a theory is completely incorrect, over time, it will fail to make
successful predictions or to deliver the expected impacts, and eventually, someone else will point
out the error (and get credit for doing so). Not rigorously testing it only delays the inevitable.
Scholars who join in dismantling their theoretical framework and publicly change their mind
contribute far more to scientific progress than those who hunker down and stay the course, and
science should recognize and reward that.
But adversarial collaborations will rarely reveal that “Scholar A is 100% right and
Scholar B 100% wrong.” More likely, they will reveal boundary conditions or a moderate
position that falls between the two. And we suspect scholars will enjoy some reputation benefits
for participating in adversarial collaborations: they signal that one is more interested in
contributing good information than in saving face. The costs are especially low and benefits high
for early career researchers, who have not yet tied their reputations to theories and may use
adversarial collaborations to avoid investments in counterproductive research areas.
Insofar as adversarial collaborators hold each other to higher standards than a scholar
would hold himself or herself, adversarial collaborations may save time at the review stage:
flaws will have been identified and corrected before they occur. But still, designing and carrying
out a study with an adversary will almost inevitably take more time and effort than an average
study. We consider this to be an investment in higher quality work, similar to how registered
reports and meta-analyses take additional time but tend to produce more reliable findings than
average studies. More reliable findings that scholars can depend upon to formulate their own
hypotheses should garner more citations in the long run and have impact for a longer period, just
as Open Science practices benefit scholars with higher citations, media attention, and job and
funding opportunities (McKiernan et al., 2016), and so adversarial collaborations likely make
better long-term career investments.
Perhaps the greatest psychological barrier to adversarial collaborations is aversion to
interpersonal conflict (Ulbig & Funk, 1999). Empirical disagreements can lead to acrimony and
cause long-term awkwardness at conferences and in other professional activities. There is no
simple solution here, but such cases should become the exception rather than the rule (e.g., Fiske,
2017) if scholars follow guidelines, such as avoiding adversaries with reputations as
temperamental or dogmatic. Adversarial collaborations also provide opportunities to resolve
conflict, build relationships, and produce more enlightening exchanges than traditional
commentaries with their all-too-common snark (Kahneman, 2003). In our experience, scholars
have been cordial and accommodating, and early conversations have been intellectually
invigorating. Scholars can increase the odds of this outcome by being courteous themselves.
Some scholars may be too dogmatic or risk averse to participate in adversarial
collaborations, but we suspect (and hope) that most scientists are open-minded enough to work
with colleagues with different views. Normalizing adversarial collaborations would help reveal
which scholars balk, allowing the rest of the scientific community to adjust their confidence in
associated works. However, some topics may be so flammable that few scholars will be willing
to engage. This would be unfortunate, but no worse than the current climate in which scholars
avoid studying topics with high controversy potential (Clark et al., 2021b).
Adversarial collaborations may not be necessary or helpful for research questions where
there are virtually no competing hypotheses, either because there are no hypotheses (e.g.,
exploratory work) or because the phenomenon under investigation is established beyond
reasonable doubt. Outside of these cases, we see no empirical question that is open to competing
plausible interpretations that would not benefit from an adversarial approach.
The Past, Present, and Future of Adversarial Collaboration
In one of the earliest adversarial collaborations, Latham and colleagues (1998) tested
whether setting goals leads to higher goal commitment than having goals assigned. Although the
collaborators did not converge on all points, they did agree on a variety of moderators that likely
explained their different sets of prior results. That same year, Gilovich, Medvec, and Kahneman
(1998) worked together to resolve an earlier dispute regarding regrets for action and inaction.
The former two had argued that action regrets start off intense but fade quickly whereas inaction
regrets linger longer and thus hurt more in the long run (Gilovich & Medvec, 1995). Kahneman
(1995) disagreed, arguing that there are different kinds of regret and that inaction regrets are
nostalgic and so not particularly painful compared to hotter and more intense action regrets.
They discovered both sides were partially right (and partially wrong): action regrets did elicit
primarily hot emotions and inaction regrets were sometimes wistful and sometimes more painful.
The first paper to call itself an adversarial collaboration involved competing
explanations for conjunction fallacies offered by Kahneman and Hertwig (Mellers et al., 2001).
Kahneman proposed that the conjunction fallacy is better viewed as a judgmental error rooted in
over-reliance on simple heuristics, whereas Hertwig proposed that supposedly fallacious
judgments were actually rational responses to conversational norms activated by the presentation
of questions about sets of possibilities (e.g., Linda is a bank teller) and subsets of possibilities
(e.g., Linda is a bank teller and a feminist) (Mellers et al., 2001). Together, they worked out
conditions under which the conjunction fallacy waxes or wanes in strength, explored reasons for
their discordant findings in the past, admitted which findings they had not predicted a priori and
how those findings shifted their understanding of conjunction effects, and identified remaining
empirically testable disagreements.
In another earlier adversarial collaboration, Bateman and colleagues (2005) explored
whether and when people perceive money spent on goods as a loss. Both groups agreed on the
validity of the tests, concluded that money outlays are perceived as losses, identified moderators,
and updated their effect size estimates. Although they did not reach perfect consensus, their
disagreements shrank, and both teams offered new sets of testable explanations for their
remaining disagreements. Adversarial collaborations may rarely produce breakthroughs, but they
do facilitate cumulative exchanges of views, something that is frustratingly difficult for editors
to achieve when they supervise scientific back-and-forths (see, e.g., Alpert et al., 1998a, 1998b;
Ornstein et al., 1998a, 1998b).
Some teams have sustained adversarial collaborations for years. One team in Germany
has focused on studies testing theories of consciousness (Melloni et al., 2021). And for over a
decade, skeptics and proponents of psychic ability worked together on a series of studies to
assess whether people can psychically detect when another person is staring at them (e.g., Schlitz
et al., 2006; Wiseman & Schlitz, 1997, 1999). Over the years, some studies yielded significant
effects, others did not (often depending on which team collected the data), and they came to
agree on the findings within individual studies but did not converge. In this case, a third-party
data collector might have helped. Procedural glitch aside, they did publish several joint papers
and identified plausible explanations for their respective findings over the years (Schlitz et al.,
2006; Wiseman & Schlitz, 1997, 1999).
Cowan and colleagues (2020) have been working on a three-way extended adversarial
collaboration on theories of working memory in young adults and cognitive aging (e.g., Doherty
et al., 2019). In addition to developing their own guidelines for successful adversarial
collaboration (Cowan et al., 2020, p. 1015), they noted numerous advantages to the adversarial
approach: (1) by agreeing on a set of methods, they all trusted the results, unlike traditional
disagreements where proponents of different theories use distinct methods and dismiss
opponents’ methods (and hence their results), (2) by accounting for a growing, common set of
results, the theories gradually became more similar, (3) conclusions in the general discussion
were more nuanced than they would have been if one team controlled the interpretations, and (4)
regardless of any disagreements that remained among the main scholars, their collaboratively
published research provided more balanced information for other scholars who are less
committed to a particular view. It is unreasonable to expect one or two new datasets to
drastically alter theoretical predictions grounded in years of previous research. Instead, new
results will likely lead to small adjustments to one or more adversaries’ prior positions. And the
longer adversaries collaborate, simultaneously incorporating numerous identical sets of results
into their own theoretical models, the more their models will converge.
Most adversarial collaborations thus far have explored low-political-controversy topics.
The scholars involved had clashing expectations, but the expectations did not carry a strong
moral-political charge or have obvious policy significance. For example, adversarial teams have
explored the effects of horizontal saccadic eye movements on retrieval of episodic memories
(Matzke et al., 2015), the consequences of repeated rounds and price feedback in second-price
auctions (Corrigan et al., 2012), the mechanisms underlying approach and avoidance instructions
on implicit associations (Van Dessel et al., 2017), moderators of when the minimal group
paradigm leads to ingroup favoritism (Kerr et al., 2018), influences on the shapes of utility and
probability weighting functions (Alempaki et al., 2019), how costs and benefits affect the
voluntary provision of threshold public goods (Cadsby et al., 2008), and the extent to which
people spontaneously differentiate social groups’ warmth/communion vs. agency/competence
and ideological beliefs (Koch et al., 2020).
However, adversarial collaborations on even politically sensitive topics have been
conducted with success. For instance, Stern and Crawford (2021) examined whether liberals and
conservatives exhibit prejudice against those who hold different views on political and non-
political topics. Both authors predicted that the relationships between political dissimilarity and
prejudice would be symmetrically strong among liberals and conservatives. But for nonpolitical
dissimilarity, one author predicted the relationships would be stronger for conservatives than
liberals, whereas the other predicted symmetry. In contrast to either of their expectations, the
relationship between political dissimilarity and prejudice tended to be stronger among liberals
than conservatives (although the relationship was not always significant). In the non-political
domain, the results were quite ambiguous, with some studies and measures of prejudice showing
no interaction (consistent with the symmetry prediction) and others showing a significant
interaction with the relationship being stronger among conservatives (consistent with the
asymmetry prediction). The authors concluded, together, that the relationships may vary and that
any effect of asymmetry may be small. This is progress.
Despite a veneer of enthusiasm about adversarial collaboration, and little to no public
criticism of the approach, adversarial collaborations have yet to be widely adopted. Figure 1 compares a Google Scholar search
for “adversarial collaboration” starting in 2001 (when the first self-declared “adversarial
collaboration” was published) to a search for “preregistered study” starting in 2013 (when
Gelman and Loken introduced that term). Scholars have been far faster to adopt the latter.
Figure 1. Google Scholar search results for ‘Adversarial Collaboration’ and ‘Preregistered Study’
by year since their first introduction to the literature (as of November, 2021)
We can think of a few reasons scholars have been slower to adopt adversarial
collaborations. Both adversarial collaborations and preregistered studies restrict researcher
degrees of freedom in a way that scholars seeking hypothesis-confirmation might find
inconvenient, but adversarial collaborations are more restrictive. Preregistration can be done in
minutes whereas adversarial collaborations require numerous negotiations over days, weeks, and
months, and whereas like-minded collaborators can expedite hypothesis testing and
confirmation, adversarial collaborators slow things down, with every step of the research process
requiring more thought and care. Preregistration requires giving up a bit of freedom and power in
the least creative parts of the research process: data collection and analyses. Adversarial
collaboration requires giving up freedom and power in the study design phase. Precisely because
study design is so critical to hypothesis confirmation, people are probably reluctant to let
someone else get their hands on the reins, especially someone who does not share their research
agenda. And preregistration has received institutional backing, with journals giving badges and
other benefits to scholars who preregister their studies. Adversarial collaborations have received
little institutional support.
One journal, Thinking & Reasoning, posted an editorial in 2015 requesting adversarial
collaboration submissions (with a submission process similar to registered reports; Rakow et al.,
2015). However, to date, the journal has had no takers, and one collaboration that an editor tried
to organize failed because the adversaries could not agree on the research question. This lack of
uptake is understandable, given the greater effort required and the degree of freedom scholars
must give up for the greater good of science. To balance these costs, institutions that depend on
scientists for accurate information should incentivize adversarial collaborations.
The Future
In early 2021, we launched an initiative at the University of Pennsylvania, the
Adversarial Collaboration Project, which supports adversarial collaborations across a variety of
ongoing scientific disputes. Thus far, we are supporting nine projects, involving nearly four
dozen scholars. Some of the issues are not particularly contentious outside of the laboratory. For
instance, with Jon Haidt, Peter Ditto, Dave Rand, and Gordon Pennycook, we are exploring the
extent to which reasoning is socially motivated. But some do touch on more contentious topics.
With Jay Van Bavel and Jarret Crawford, we are testing for political bias in the psychology
literature; with Luke Conway, Chadly Stern, Jan-Willem van Prooijen, and Madalina Vlasceanu,
we are exploring whether political conservatism is associated with cognitive rigidity; and with a
large team of collaborators, we are testing whether behavioral scientists systematically self-
censor their empirical beliefs.
Assembling teams of open-minded scholars with differing perspectives can lead to
progress on seemingly intractable debates. The behavioral sciences are plagued by zombie ideas
(Barrett, 2019; Krugman, 2013), decades-old controversies, and popular theories that are so
vague as to render them unfalsifiable. The ensuing debates, often stalemated, counterproductive,
confusing, and costly, decorate the pages of peer-reviewed journals and undergraduate textbooks.
Table 2 lists a few dozen contenders that would make appropriate adversarial collaborations if
the warring parties were willing. This list is by no means exhaustive, but it may be a helpful
starting place.
We can only imagine the progress that could have been made on these issues had the
scholars on various sides worked together from the discovery of their disagreements rather than
continued to defend their perspectives for decades on end. The widespread implementation of
adversarial collaborations would transform many hundreds or thousands of debates currently
unfolding in the behavioral sciences.
Incentivizing Scholars to Seek Truth
The best way to promote accuracy goals among behavioral scientists is to align social
goals with epistemic goals. Many scholars have relinquished some of their researcher degrees of
freedom with open science practices because such practices are rewarded with more favorable
evaluations in review, on the job market, and in the eyes of peers (McKiernan et al., 2016), and
because journals penalize nonparticipation by rejecting papers that do not use open science
practices. Scholars would be more willing to pay the price of adversarial collaborations if such
efforts were similarly rewarded and if standard performative research practices were rejected.
Professional organizations and universities could reward participation with more
favorable evaluations in awards, hiring, and promotion decisions. And given the time costs for
all participants in adversarial collaborations, adversarial collaboration publications could be
given weight similar to that of first-authored papers. Indeed, adversarial collaborations inevitably will
be more of a collaborative team effort than traditional approaches in which the lead author
frequently does 75%-95% of the work. Such credit would help remove barriers to participation for
untenured faculty.
When a paper claims to oppose or challenge another hypothesis or theory, editors could
insist on adversarial collaborations. In the long run, this approach would benefit journals because
it would produce higher quality science that will be more helpful to other scholars (and thus
should get cited more often). Top journals could host annual special issues of adversarial
collaborations that are accepted based on registered reports. If established journals are satisfied
with the status quo, newer journals could distinguish themselves by focusing on adversarial
collaborations, incentivizing them by focusing peer review on recommendations for
improvement rather than acceptance versus rejection (similar to journals that solicit proposals,
such as Current Directions journals) or by making them eligible for immediate editorial decision.
Generally, peer review will be less necessary for adversarial collaborations because peer review
is built into the research process, and editors will know that at least one hostile reviewer
has already reviewed the paper (and much more thoroughly than a 500-word critique) and signed
off. Adversarial collaborations might improve the validity and quality of research better than
peer review, and in a way that is more efficient, less biased, less likely to delay progress, and
more rewarding for the reviewers (because they are co-authors).
Certain kinds of organizations may have social motives similar to those of individual scientists
and wish to publicize their involvement in particular research agendas for reputational and
political reasons, and those organizations might prefer to fund performative research over
accuracy-seeking research. But many organizations care about solving societal problems and
designing effective policy, and these organizations likely do care first and foremost about
accuracy. When such funders put out their calls for submissions, they could require that all
submissions be adversarial collaborations (at least where appropriate). Just as participation in
adversarial research might signal which scholars care about truth over advancing their own
careers, funding adversarial research might signal which organizations care about problem-
solving and effective policy over advancing their own political reputations and agendas.
As Mellers et al. (2001, p. 275) noted: “In an ideal world, scholars would feel obliged to
accept an offer of adversarial collaboration. Editors would require adversaries to collaborate
prior to, or instead of, writing independent exchanges. Scientific meetings would allot time for
scholars engaged in adversarial collaboration to present their joint findings. In short, adversarial
collaboration would become the norm, not the exception.”
More generally, if our knowledge-production systems capitalized on researcher disagreements to build more nuanced consensuses
instead of perpetuating and polarizing disagreements, we would be better situated, as a society,
to advance evidence-based approaches to collective problems.
Adversarial collaborations invigorate the spirit of falsificationism that prominent
philosophers of science have long promoted. Many scientists seek to protect their research as
much as they seek truth, and the freedom to design research in ways that avoid risky testing of
theories allows unnecessary debates to continue. Normalizing adversarial collaborations could
promote a scientific climate in which status-truth trade-offs disappear—and updating one’s
empirical beliefs is not viewed as a sign of failure and foolishness but of integrity and progress.
Tetlock and Mitchell (2009b) have commented that adversarial collaboration is most
needed and least feasible in domains in which “the scientific community lacks clear criteria for
falsifying points of view, disagrees on key methodological issues, relies on second- or third-best
substitute methods for testing causality, and is fractured into opposing camps that engage in ad
hominem posturing and have intimate ties to political actors who see any concession as
weakness” (p. 31). Put differently, the more contentious the policy debates, and the more
imprecise the science and contradictory the conclusions in the literature, the greater will be both
the potential yield from adversarial collaborations and the reluctance of scholars to participate.
Nonetheless, if any approach has a chance to move the needle on these difficult debates, it will
be getting scholars to swallow their pride and earnestly engage their intellectual adversaries.
Adversarial collaborations look risky partly because they are unfamiliar, and scholars are
so rarely called upon to acknowledge error in the ways that other professionals routinely must:
engineers, lawyers, athletes, doctors, and detectives. But discovering one’s weaknesses is a
crucial part of learning. The so-called “soft” sciences are hard—and it would be astonishing if
the true error rates were not much higher than the self-acknowledged error rates. Adversarial
collaboration will shrink the gap between those numbers, improving empirical accuracy for
individual scholars and expediting progress for science and society at large.
Author Contributions
Cory Clark wrote the original draft. Thomas Costello, Gregory Mitchell, and Philip Tetlock
provided many helpful comments and changes.
This research was funded in part by the Searle Freedom Trust (PD 10080850). The funding
source had no involvement in the research or preparation of the manuscript.
We would like to thank Jon Haidt, Stephen Ceci, and one anonymous reviewer, as well as our
editor, Qi Wang, for the many useful suggestions for revising our manuscript.
References
Abramowitz, S. I., Gomes, B., & Abramowitz, C. V. (1975). Publish or politic: Referee bias in
manuscript review. Journal of Applied Social Psychology, 5, 187-200.
Alempaki, D., Canic, E., Mullett, T. L., Skylark, W. J., Starmer, C., Stewart, N., & Tufano, F.
(2019). Reexamining how utility and weighting functions get their shapes: A quasi-
adversarial collaboration providing a new interpretation. Management Science, 65(10),
Alicke, M. D., & Govorun, O. (2005). The better-than-average effect. In M. D. Alicke, D. A. Dunning,
& J. I. Krueger (Eds.), Studies in self and identity (pp. 85-106). New
York: Psychology Press.
Alpert, J. L., Brown, L. S., Ceci, S. J., Courtois, C. A., Loftus, E. F., & Ornstein, P. A. (1998a).
Final conclusions of the American Psychological Association working group on
investigation of memories of childhood abuse. Psychology, Public Policy, and Law, 4(4),
Alpert, J. L., Brown, L. S., & Courtois, C. A. (1998b). Comment on Ornstein, Ceci, and Loftus
(1998): Adult recollections of childhood abuse. Psychology, Public Policy, and Law,
4(4), 1052-1067.
Anglin, S. M. (2019). Do beliefs yield to evidence? Examining belief perseverance vs. change in
response to congruent empirical findings. Journal of Experimental Social Psychology, 82,
Atkinson, D. R., Furlong, M. J., & Wampold, B. E. (1982). Statistical significance, reviewer
evaluations, and the scientific process: Is there a (statistically) significant
relationship? Journal of Counseling Psychology, 29(2), 189.
Auspurg, K., & Brüderl, J. (2021). Has the Credibility of the Social Sciences Been Credibly
Destroyed? Reanalyzing the “Many Analysts, One Data Set” Project. Socius, 7,
Bahnk, Š., & Vranka, M. A. (2017). Growth mindset is not associated with scholastic aptitude in
a large sample of university applicants. Personality and Individual Differences, 117, 139-
Barrett, L. F. (2019). Zombie ideas. APS Observer, 32(8).
Bateman, I., Kahneman, D., Munro, A., Starmer, C., & Sugden, R. (2005). Testing competing
models of loss aversion: An adversarial collaboration. Journal of Public Economics,
89(8), 1561-1580.
Baumeister, R. F., Maranges, H. M., & Vohs, K. D. (2018). Human self as information agent:
Functioning in a social environment based on shared meanings. Review of General
Psychology, 22(1), 36-47.
Bedeian, A. G. (2003). The manuscript review process: The proper roles of authors, referees, and
editors. Journal of Management Inquiry, 12(4), 331-338.
Blanton, H., Jaccard, J., Klick, J., Mellers, B., Mitchell, G., & Tetlock, P. E. (2009). Strong
claims and weak evidence: reassessing the predictive validity of the IAT. Journal of
Applied Psychology, 94(3), 567.
Blanton, H., & Mitchell, G. (2011). Reassessing the predictive validity of the IAT II: Reanalysis
of Heider & Skowronski (2007). North American Journal of Psychology, 12(1), 99-106.
Bleske-Rechek, A., Harris, H. D., Denkinger, K., Webb, R. M., Erickson, L., & Nelson, L. A.
(2011). Physical cues of ovulatory status: A failure to replicate enhanced facial
attractiveness and reduced waist-to-hip ratio at high fertility. Evolutionary
Psychology, 9(3), 147470491100900306.
Boudry, M., & Vlerick, M. (2014). Natural selection does care about truth. International Studies
in the Philosophy of Science, 28(1), 65-77.
Bornmann, L., Mutz, R., & Daniel, H. D. (2010). A reliability-generalization study of journal
peer reviews: A multilevel meta-analysis of inter-rater reliability and its
determinants. PLoS ONE, 5(12), e14331.
Bowes, S. M., Ammirati, R. J., Costello, T. H., Basterfield, C., & Lilienfeld, S. O. (2020).
Cognitive biases, heuristics, and logical fallacies in clinical practice: A brief field guide
for practicing clinicians and supervisors. Professional Psychology: Research and
Practice, 51(5), 435-445.
Bowes, S. M., Blanchard, M. C., Costello, T. H., Abramowitz, A. I., & Lilienfeld, S. O. (2020).
Intellectual humility and between-party animus: Implications for affective polarization in
two community samples. Journal of Research in Personality, 88, 103992.
Bowes, S. M., Costello, T. H., Lee, C., McElroy-Heltzel, S., Davis, D. E., & Lilienfeld, S. O.
(2021). Stepping Outside the Echo Chamber: Is Intellectual Humility Associated with
Less Political Myside Bias? Personality and Social Psychology Bulletin.
Bradie, M., & Harm, W. (2020). Evolutionary epistemology. In E.N. Zalta (Ed.), The Stanford
Encyclopedia of Philosophy. Retrieved from
Brown, J. D. (1986). Evaluations of self and others: Self-enhancement biases in social
judgments. Social Cognition, 4(4), 353-376.
Buss, D. M., & von Hippel, W. (2018). Psychological barriers to evolutionary psychology:
Ideological bias and coalitional adaptations. Archives of Scientific Psychology, 6(1),
Cadsby, C. B., Croson, R., Marks, M., & Maynes, E. (2008). Step return versus net reward in the
voluntary provision of a threshold public good: An adversarial collaboration. Public
Choice, 135(3), 277-289.
Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T. H., Huber, J., Johannesson, M., ... & Wu, H.
(2018). Evaluating the replicability of social science experiments in Nature and Science
between 2010 and 2015. Nature Human Behaviour, 2(9), 637-644.
Campbell, T. H., & Kay, A. C. (2014). Solution aversion: On the relation between ideology and
motivated disbelief. Journal of Personality and Social Psychology, 107(5), 809-824.
Card, D., DellaVigna, S., Funk, P., & Iriberri, N. (2021). Gender differences in peer recognition
by economists. Unpublished manuscript.
Ceci, S. J., Peters, D., & Plotkin, J. (1985). Human subjects review, personal values, and the
regulation of social science research. American Psychologist, 40, 994-1002.
Cesario, J., Johnson, D. J., & Terrill, W. (2019). Is there evidence of racial disparity in police use
of deadly force? Analyses of officer-involved fatal shootings in 2015–2016. Social
Psychological and Personality Science, 10(5), 586-595.
Cheryan, S., Ziegler, S. A., Montoya, A. K., & Jiang, L. (2017). Why are some STEM fields
more gender balanced than others? Psychological Bulletin, 143(1), 1-35.
Christenson, D. P., & Kriner, D. L. (2017). Constitutional qualms or politics as usual? The
factors shaping public support for unilateral action. American Journal of Political
Science, 61(2), 335-349.
Clark, C. J., Chen, E. E., & Ditto, P. H. (2015). Moral coherence processes: Constructing
culpability and consequences. Current Opinion in Psychology, 6, 123-128.
Clark, C. J., Honeycutt, N., & Jussim, L. (2021a). Replicability and the psychology of science. In
S. Lilienfeld, A. Masuda, & W. O’Donohue (Eds.), Questionable Research Practices in
Psychology. New York: Springer.
Clark, C. J., Liu, B. S., Winegard, B. M., & Ditto, P. H. (2019). Tribalism is human nature.
Current Directions in Psychological Science, 28(6), 587-592.
Clark, C. J., Fjeldmark, M., Lu, L., Baumeister, R. F., Ceci, S., German, K., Reilly, W., Tice, D.,
von Hippel, W., Williams, W., Winegard, B. M., & Tetlock, P. E. (2021b). Taboos and
self-censorship in the social sciences [Unpublished manuscript]. Department of
Psychology, University of Pennsylvania, Philadelphia, PA.
Clark, C. J., & Tetlock, P. E. (2021). Adversarial collaboration: The next science reform. In C. L.
Frisby, R. E. Redding, W. T. O’Donohue, & S. O. Lilienfeld (Eds.), Political Bias in
Psychology: Nature, Scope, and Solutions. New York: Springer.
Clark, C. J., & Winegard, B. M. (2020). Tribalism in war and peace: The nature and evolution of
ideological epistemology and its significance for modern social science. Psychological
Inquiry, 31(1), 1-22.
Clark, C. J., Winegard, B. M., & Farkas, D. (2020). A cross-cultural analysis of censorship on
campuses. [Unpublished manuscript]. Department of Psychology, Durham University,
Durham, UK.
Claassen, R. L., & Ensley, M. J. (2016). Motivated reasoning and yard-sign-stealing partisans:
Mine is a likable rogue, yours is a degenerate criminal. Political Behavior, 38(2), 317-
Cohen, G. L. (2003). Party over policy: The dominating impact of group influence on political
beliefs. Journal of Personality and Social Psychology, 85(5), 808-822.
Corrigan, J. R., Drichoutis, A. C., Lusk, J. L., Nayga Jr, R. M., & Rousu, M. C. (2012). Repeated
rounds with price feedback in experimental auction valuation: An adversarial
collaboration. American Journal of Agricultural Economics, 94(1), 97-115.
Cosmides, L. (1989). The logic of social exchange: Has natural selection shaped how humans
reason? Studies with the Wason selection task. Cognition, 31(3), 187-276.
Cosmides, L., & Tooby, J. (1992). Cognitive adaptations for social exchange. In J. Barkow,
L. Cosmides, & J. Tooby (Eds.), The adapted mind: Evolutionary psychology and the
generation of culture (pp. 162-228). New York: Oxford University Press.
Costello, T. H., Clark, C. J., & Tetlock, P. E. (in press). Shoring up the shaky psychological
foundations of a micro-Economic model of ideology: Adversarial collaboration solutions.
Commentary on The market for belief systems: A formal model of ideological choice (T.
Gries, V. Müller, J. T. Jost), Psychological Inquiry.
Cowan, N., Belletier, C., Doherty, J. M., Jaroslawska, A. J., Rhodes, S., Forsberg, A., ... &
Logie, R. H. (2020). How do scientific views change? Notes from an extended
adversarial collaboration. Perspectives on Psychological Science, 15(4), 1011-1025.
Crosby, F. J., Iyer, A., & Sincharoen, S. (2006). Understanding affirmative action. Annual
Review of Psychology, 57, 585-611.
Davis, W. E., Giner-Sorolla, R., Lindsay, D. S., Lougheed, J. P., Makel, M. C., Meier, M. E., ...
& Zelenski, J. M. (2018). Peer-review guidelines promoting replicability and
transparency in psychological science. Advances in Methods and Practices in
Psychological Science, 1(4), 556-573.
Dawson, N. V., & Arkes, H. R. (2009). Implicit bias among physicians. Journal of General
Internal Medicine, 24(1), 137-140.
Dawson, E., Gilovich, T., & Regan, D. T. (2002). Motivated reasoning and performance on the
Wason Selection Task. Personality and Social Psychology Bulletin, 28(10), 1379-1387.
De Cruz, H., Boudry, M., De Smedt, J., & Blancke, S. (2011). Evolutionary approaches to
epistemic justification. Dialectica, 65(4), 517-535.
DeMarree, K. G., Clark, C. J., Wheeler, S. C., Briñol, P., & Petty, R. E. (2017). On the pursuit of
desired attitudes: Wanting a different attitude affects information processing and
behavior. Journal of Experimental Social Psychology, 70, 129-142.
Ditto, P. H., Liu, B. S., Clark, C. J., Wojcik, S. P., Chen, E. E., Grady, R. H., ... & Zinger, J. F.
(2019a). At least bias is bipartisan: A meta-analytic comparison of partisan bias in
liberals and conservatives. Perspectives on Psychological Science, 14(2), 273-291.
Ditto, P. H., Clark, C. J., Liu, B. S., Wojcik, S. P., Chen, E. E., Grady, R. H., ... & Zinger, J. F.
(2019b). Partisan bias and its discontents. Perspectives on Psychological Science, 14(2),
Ditto, P. H., & Lopez, D. F. (1992). Motivated skepticism: Use of differential decision criteria
for preferred and nonpreferred conclusions. Journal of Personality and Social
Psychology, 63(4), 568-584.
Ditto, P. H., Pizarro, D. A., & Tannenbaum, D. (2009). Motivated moral reasoning. Psychology
of Learning and Motivation, 50, 307-338.
Doherty, J. M., Belletier, C., Rhodes, S., Jaroslawska, A., Barrouillet, P., Camos, V., ... & Logie,
R. H. (2019). Dual-task costs in working memory: An adversarial collaboration. Journal
of Experimental Psychology: Learning, Memory, and Cognition, 45(9), 1529-1551.
Doyle, M. W., & Stiglitz, J. E. (2014). Eliminating extreme inequality: A sustainable
development goal, 2015–2030. Ethics & International Affairs, 28(1), 5-13.
Duarte, J. L., Crawford, J. T., Stern, C., Haidt, J., Jussim, L., & Tetlock, P. E. (2015). Political
diversity will improve social psychological science. Behavioral and Brain Sciences, 38,
Duckworth, A. L. (2006). Intelligence is not enough: Non-IQ predictors of achievement.
University of Pennsylvania.
Ebersole, C. R., Mathur, M. B., Baranski, E., Bart-Plange, D. J., Buttrick, N. R., Chartier, C. R.,
... & Szecsi, P. (2020). Many Labs 5: Testing pre-data-collection peer review as an
intervention to increase replicability. Advances in Methods and Practices in
Psychological Science, 3(3), 309-331.
Elson, M., Huff, M., & Utz, S. (2020). Metascience on Peer Review: Testing the Effects of a
Study’s Originality and Statistical Significance in a Field Experiment. Advances in
Methods and Practices in Psychological Science, 3(1), 53-65.
Emerson, G. B., Warme, W. J., Wolf, F. M., Heckman, J. D., Brand, R. A., & Leopold, S. S.
(2010). Testing for the presence of positive-outcome bias in peer review: a randomized
controlled trial. Archives of Internal Medicine, 170(21), 1934-1939.
Ernst, E., & Resch, K. L. (1994). Reviewer bias: a blinded experimental study. The Journal of
Laboratory and Clinical Medicine, 124(2), 178-182.
Faust, D. (1984). The limits of scientific reasoning. Minneapolis: University of Minnesota Press.
Fernbach, P. M., & Light, N. (2020). Knowledge is Shared. Psychological Inquiry, 31(1), 26-28.
Feynman, R. P. (1974). Cargo cult science. Engineering and Science, 37(7), 10-13.
Fiske, S. T. (2017). Going in many right directions, all at once. Perspectives on Psychological
Science, 12(4), 652-655.
Flake, J. K., & Fried, E. I. (2020). Measurement schmeasurement: Questionable measurement
practices and how to avoid them. Advances in Methods and Practices in Psychological
Science, 3(4), 456-465.
Forscher, P. S., Cox, W. T. L., Devine, P. G., & Brauer, M. (2019). How many reviewers are
required to obtain reliable evaluations of NIH R01 grant proposals? PsyArXiv. Retrieved
Franco, A., Malhotra, N., & Simonovits, G. (2014). Publication bias in the social sciences:
Unlocking the file drawer. Science, 345(6203), 1502-1505.
Frimer, J. A., Skitka, L. J., & Motyl, M. (2017). Liberals and conservatives are similarly
motivated to avoid exposure to one another's opinions. Journal of Experimental Social
Psychology, 72, 1-12.
Funder, D. C., & Ozer, D. J. (2019). Evaluating effect size in psychological research: Sense and
nonsense. Advances in Methods and Practices in Psychological Science, 2(2), 156-168.
Gampa, A., Wojcik, S. P., Motyl, M., Nosek, B. A., & Ditto, P. H. (2019). (Ideo)logical
reasoning: Ideology impairs sound reasoning. Social Psychological and Personality
Science, 10(8), 1075-1083.
Geller, A., Goff, P. A., Lloyd, T., Haviland, A., Obermark, D., & Glaser, J. (2020). Measuring
racial disparities in police use of force: methods matter. Journal of Quantitative
Criminology, 1-31.
Gigerenzer, G., & Gaissmaier, W. (2011). Heuristic decision making. Annual Review of
Psychology, 62, 451-482.
Gilovich, T., & Medvec, V. H. (1995). The experience of regret: what, when, and
why. Psychological Review, 102(2), 379-395.
Gilovich, T., Medvec, V. H., & Kahneman, D. (1998). Varieties of regret: A debate and partial
resolution. Psychological Review, 105(3), 602-605.
Godlee, F., Gale, C. R., & Martyn, C. N. (1998). Effect on the quality of peer review of blinding
reviewers and asking them to sign their reports: a randomized controlled
trial. JAMA, 280(3), 237-240.
Green, A. R., Carney, D. R., Pallin, D. J., Ngo, L. H., Raymond, K. L., Iezzoni, L. I., & Banaji,
M. R. (2007). Implicit bias among physicians and its prediction of thrombolysis decisions
for black and white patients. Journal of General Internal Medicine, 22(9), 1231-1238.
Hagger, M. S., Chatzisarantis, N. L., Alberts, H., Anggono, C. O., Batailler, C., Birt, A. R., ... &
Zwienenberg, M. (2016). A multilab preregistered replication of the ego-depletion
effect. Perspectives on Psychological Science, 11(4), 546-573.
Hahn, A. C., DeBruine, L. M., Pesce, L. A., Diaz, A., Aberson, C. L., & Jones, B. C. (2020).
Does women’s anxious jealousy track changes in steroid hormone
levels? Psychoneuroendocrinology, 113, 104553.
Haidt, J. (2001). The emotional dog and its rational tail: A social intuitionist approach to moral
judgment. Psychological Review, 108(4), 814-834.
Haidt, J. (2012). The righteous mind: Why good people are divided by politics and religion. New
York, NY: Vintage.
Haidt, J. (2020). Tribalism, forbidden baserates, and the telos of social science. Psychological
Inquiry, 31(1), 53-56.
Haselton, M. G., & Buss, D. M. (2000). Error management theory: a new perspective on biases
in cross-sex mind reading. Journal of Personality and Social Psychology, 78(1), 81-91.
Hawkins, C. B., & Nosek, B. A. (2012). Motivated independence? Implicit party identity
predicts political judgments among self-proclaimed independents. Personality and Social
Psychology Bulletin, 38(11), 1437-1452.
Heyman, T., Moors, P., & Rabagliati, H. (2020). The benefits of adversarial collaboration for
commentaries. Nature Human Behaviour, 4(12), 1217-1217.
Hollis, M. E., & Jennings, W. G. (2018). Racial disparities in police use-of-force: a state-of-the-
art review. Policing: An International Journal, 41, 178-193.
Honeycutt, N., & Freberg, L. (2017). The liberal and conservative experience across academic
disciplines: An extension of Inbar and Lammers. Social Psychological and Personality
Science, 8, 115-123.
Honeycutt, N., & Jussim, L. (2020). A model of political bias in social science
research. Psychological Inquiry, 31(1), 73-85.
Hoorens, V. (1993). Self-enhancement and superiority biases in social comparison. European
Review of Social Psychology, 4(1), 113-139.
Inbar, Y., & Lammers, J. (2012). Political diversity in social and personality psychology.
Perspectives on Psychological Science, 7, 496-503.
Ioannidis, J. P. (2012). Why science is not necessarily self-correcting. Perspectives on
Psychological Science, 7(6), 645-654.
Jonas, K. J., Cesario, J., Alger, M., Bailey, A. H., Bombari, D., Carney, D., ... & Tybur, J. M.
(2017). Power poses – where do we stand? Comprehensive Results in Social Psychology,
2(1), 139-141.
Jost, J. T., Rudman, L. A., Blair, I. V., Carney, D. R., Dasgupta, N., Glaser, J., & Hardin, C. D.
(2009a). The existence of implicit bias is beyond reasonable doubt: A refutation of
ideological and methodological objections and executive summary of ten studies that no
manager should ignore. Research in Organizational Behavior, 29, 39-69.
Jost, J. T., Rudman, L. A., Blair, I. V., Carney, D. R., Dasgupta, N., Glaser, J., & Hardin, C. D.
(2009b). An invitation to Tetlock and Mitchell to conduct empirical research on implicit
bias with friends, “adversaries,” or whomever they please. Research in Organizational
Behavior, 29, 73-75.
Jussim, L., Careem, A., Goldberg, Z., Honeycutt, N., & Stevens, S. (in press). IAT scores, racial
gaps, and scientific gaps. In J. A. Krosnick, T. H. Stark, & A. L. Scott (Eds.), The Future
of Research on Implicit Bias.
Jussim, L., Krosnick, J. A., Stevens, S. T., & Anglin, S. M. (2019). A social psychological model
of scientific practices: Explaining research practices and outlining the potential for
successful reforms. Psychologica Belgica, 59(1), 353-372.
Jylkkä, J., Härkönen, J., & Hyönä, J. (2021). Incidental disgust does not cause moral
condemnation of neutral actions. Cognition and Emotion, 35(1), 96-109.
Kahan, D. M., Hoffman, D. A., Braman, D., Evans, D., & Rachlinski, J. J. (2012). They saw a
protest: Cognitive illiberalism and the speech-conduct distinction. Stanford Law Review, 64, 851.
Kahan, D. M., Peters, E., Dawson, E. C., & Slovic, P. (2017). Motivated numeracy and
enlightened self-government. Behavioural Public Policy, 1(1), 54-86.
Kahneman, D. (1995). Varieties of counterfactual thinking. In Roese, N. J., Olson, J. M. (Eds.),
What might have been: The social psychology of counterfactual thinking (pp. 375-396).
Mahwah, NJ: Erlbaum.
Kahneman, D. (2003). A perspective on judgment and choice: mapping bounded
rationality. American Psychologist, 58(9), 697-720.
Kahneman, D. (2003). Experiences of collaborative research. American Psychologist, 58(9), 723-
Kahneman, D. (2011). Thinking, fast and slow. New York: Farrar, Straus, and Giroux.
Kaufmann, E. (2021). Academic freedom in crisis: Punishment, political discrimination, and
self-censorship. Center for the Study of Partisanship and Ideology, 2, 1-195.
Kahneman, D., & Klein, G. (2009). Conditions for intuitive expertise: a failure to
disagree. American Psychologist, 64(6), 515-526.
Karon, B. P., & Widener, A. (1998). Repressed memories: The real story. Professional
Psychology: Research and Practice, 29(5), 482-487.
Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social
Psychology Review, 2(3), 196-217.
Kerr, N. L., Ao, X., Hogg, M. A., & Zhang, J. (2018). Addressing replicability concerns via
adversarial collaboration: Discovering hidden moderators of the minimal intergroup
discrimination effect. Journal of Experimental Social Psychology, 78, 66-76.
Kinder, D. R., & Sears, D. O. (1981). Prejudice and politics: Symbolic racism versus racial
threats to the good life. Journal of Personality and Social Psychology, 40(3), 414-431.
Klein, R. A., Cook, C. L., Ebersole, C. R., Vitiello, C. A., Nosek, B. A., Chartier, C. R., . . .
Ratliff, K. A. (2019). Many Labs 4: Failure to replicate mortality salience effect with and
without original author involvement. doi:10.31234/
Koch, A., Imhoff, R., Unkelbach, C., Nicolas, G., Fiske, S., Terache, J., ... & Yzerbyt, V. (2020).
Groups' warmth is a personal matter: Understanding consensus on stereotype dimensions
reconciles adversarial models of social evaluation. Journal of Experimental Social
Psychology, 89, 103995.
Koehler, J. J. (1993). The influence of prior beliefs on scientific judgments of evidence quality.
Organizational Behavior and Human Decision Processes, 56, 28-55.
Koole, S. L., & Lakens, D. (2012). Rewarding replications: A sure and simple way to improve
psychological science. Perspectives on Psychological Science, 7(6), 608-614.
Kopko, K. C., Bryner, S. M., Budziak, J., Devine, C. J., & Nawara, S. P. (2011). In the eye of the
beholder? Motivated reasoning in disputed elections. Political Behavior, 33(2), 271-290.
Kovera, M. B., & Evelo, A. J. (2021). Eyewitness identification in its social context. Journal of
Applied Research in Memory and Cognition.
Krugman, P. (2013, February 14). Rubio and the zombies. The New York Times. Retrieved from
Kunda, Z. (1990). The case for motivated reasoning. Psychological Bulletin, 108(3), 480-498.
Lai, C. K. (2020). Ordinary claims require ordinary evidence: a lack of direct support for
equalitarian bias in the social sciences. Psychological Inquiry, 31(1), 42-47.
Lakatos, I. (1970). Falsification and the methodology of scientific research programmes. In I.
Lakatos & A. Musgrave (Eds.), Criticism and the growth of knowledge (pp. 205-259).
New York, NY: Cambridge University Press.
Landy, J. F., & Goodwin, G. P. (2015). Does incidental disgust amplify moral judgment? A
meta-analytic review of experimental evidence. Perspectives on Psychological
Science, 10(4), 518-536.
Latham, G. P., Erez, M., & Locke, E. A. (1988). Resolving scientific disputes by the joint design
of crucial experiments by the antagonists: Application to the Erez–Latham dispute
regarding participation in goal setting. Journal of Applied Psychology, 73(4), 753-772.
Lee, C. J., Sugimoto, C. R., Zhang, G., & Cronin, B. (2013). Bias in peer review. Journal of the
American Society for Information Science and Technology, 64, 2-17.
Lerner, J. S., & Tetlock, P. E. (1999). Accounting for the effects of accountability. Psychological
Bulletin, 125(2), 255-275.
Lilienfeld, S. O. (2010). Can psychology become a science? Personality and Individual
Differences, 49(4), 281-288.
Lilienfeld, S. O., Basterfield, C., Bowes, S. M., & Costello, T. H. (2020). Nobelists gone wild:
Case studies in the domain specificity of critical thinking. In R. J. Sternberg & D. F.
Halpern (Eds.), Critical thinking in psychology (2nd ed.). New York, NY: Cambridge
University Press.
Loftus, E. F. (2005). Planting misinformation in the human mind: A 30-year investigation of the
malleability of memory. Learning & Memory, 12(4), 361-366.
Lord, C. G., Ross, L., & Lepper, M. R. (1979). Biased assimilation and attitude polarization: The
effects of prior theories on subsequently considered evidence. Journal of Personality and
Social Psychology, 37(11), 2098-2109.
Mahoney, M. J. (1976). Scientist as subject: The psychological imperative. Ballinger.
Mahoney, M. J. (1977). Publication prejudices: An experimental study of confirmatory bias in
the peer review system. Cognitive Therapy and Research, 1, 161-175.
Matzke, D., Nieuwenhuis, S., van Rijn, H., Slagter, H. A., van der Molen, M. W., &
Wagenmakers, E. J. (2015). The effect of horizontal eye movements on free recall: A
preregistered adversarial collaboration. Journal of Experimental Psychology: General,
144(1), e1.
Mayo, D. G. (2018). Statistical inference as severe testing. Cambridge: Cambridge University Press.
McConnell, A. R., & Leibold, J. M. (2001). Relations among the Implicit Association Test,
discriminatory behavior, and explicit measures of racial attitudes. Journal of
Experimental Social Psychology, 37(5), 435-442.
McKiernan, E. C., Bourne, P. E., Brown, C. T., Buck, S., Kenall, A., Lin, J., ... & Yarkoni, T.
(2016). Point of view: How open science helps researchers succeed. eLife, 5, e16800.
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow
progress of soft psychology. Journal of Consulting and Clinical Psychology, 46(4), 806-
Mellers, B., Hertwig, R., & Kahneman, D. (2001). Do frequency representations eliminate
conjunction effects? An exercise in adversarial collaboration. Psychological Science,
12(4), 269-275.
Melloni, L., Mudrik, L., Pitts, M., & Koch, C. (2021). Making the hard problem of
consciousness easier. Science, 372(6545), 911-912.
Merton, R. K. (1973). The sociology of science: Theoretical and empirical investigations.
Chicago, IL: University of Chicago Press.
Mitchell, G., & Tetlock, P. E. (2009). Disentangling reasons and rationalizations: Exploring
perceived fairness in hypothetical societies. In J. Jost, A. C. Kay, & H. Thorisdottir (Eds.),
Social and psychological bases of ideology and system justification (pp. 126-157). New
York, NY: Oxford University Press.
Motyl, M., Prims, J. P., & Iyer, R. (2020). How ambient cues facilitate political
segregation. Personality and Social Psychology Bulletin, 46(5), 723-737.
Munro, G. D., Lasane, T. P., & Leary, S. P. (2010a). Political partisan prejudice: Selective
distortion and weighting of evaluative categories in college admissions applications.
Journal of Applied Social Psychology, 40(9), 2434-2462.
Munro, G. D., Weih, C., & Tsai, J. (2010b). Motivated suspicion: Asymmetrical attributions of
the behavior of political ingroup and outgroup members. Basic and Applied Social
Psychology, 32(2), 173-184.
Mynatt, C. R., Doherty, M. E., & Tweney, R. D. (1977). Confirmation bias in a simulated
research environment: An experimental study of scientific inference. Quarterly Journal
of Experimental Psychology, 29(1), 85-95.
Nier, J. A., & Campbell, S. D. (2013). Two outsiders’ view on feminism and evolutionary
psychology: An opportune time for adversarial collaboration. Sex Roles, 69(9-10), 503-
Nosek, B. A., Hardwicke, T. E., Moshontz, H., Allard, A., Corker, K. S., Almenberg, A. D., ... &
Vazire, S. (2021). Replicability, robustness, and reproducibility in psychological science.
Unpublished manuscript.
Okike, K., Hug, K. T., Kocher, M. S., & Leopold, S. S. (2016). Single-blind vs double-blind peer
review in the setting of author prestige. JAMA, 316(12), 1315-1316.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science.
Science, 349(6251).
Ornstein, P. A., Ceci, S. J., & Loftus, E. F. (1998a). Comment on Alpert, Brown, and Courtois
(1998): The science of memory and the practice of psychotherapy. Psychology, Public
Policy, and Law, 4(4), 996-1010.
Ornstein, P. A., Ceci, S. J., & Loftus, E. F. (1998b). More on the repressed memory debate: A
reply to Alpert, Brown, and Courtois (1998). Psychology, Public Policy, and Law, 4(4),
Pashler, H., Coburn, N., & Harris, C. R. (2012). Priming of social distance? Failure to replicate
effects on social and food judgments. PLoS ONE, 7(8), e42510.
Pendergrast, M. (1999). Smearing in the name of scholarship. Professional Psychology:
Research and Practice, 30(6), 623-625.
Peters, U., Honeycutt, N., Block, A. D., & Jussim, L. (2020). Ideological diversity, hostility, and
discrimination in philosophy. Philosophical Psychology, 1-38.
Pew (2021). Beyond red vs. blue: The political typology. Retrieved from:
Pietschnig, J., Voracek, M., & Formann, A. K. (2010). Mozart effect-Shmozart effect: A meta-
analysis. Intelligence, 38(3), 314-323.
Pinker, S. (2018). Enlightenment now: The case for reason, science, humanism, and progress.
New York, NY: Penguin.
Pinker, S. (2021). Rationality: What it is, why it seems scarce, why it matters. New York, NY:
Platt, J. (1998). A history of sociological research methods in America, 1920-1960 (Vol. 40).
New York, NY: Cambridge University Press.
Popper, K. (2002). The logic of scientific discovery (Popper, K., Trans.). New York, NY:
Routledge Classics. (Original work published 1935)
Proctor, R. W., & Capaldi, E. J. (Eds.). (2012). Psychology of science: Implicit and explicit
processes. New York: Oxford University Press.
Purser, H., & Harper, C. A. (2020). Low system justification drives ideological differences in
joke perception: A critical commentary and re-analysis of Baltiansky et al. (2020).
Unpublished manuscript.
Rakow, T., Thompson, V., Ball, L., & Markovits, H. (2015). Rationale and guidelines for
empirical adversarial collaboration: A Thinking & Reasoning initiative. Thinking &
Reasoning, 21, 167-175.
Rauch, J. (2021). The Constitution of Knowledge: A Defense of Truth. Washington D.C.:
Brookings Institution Press.
Redding, R. E. (2001). Sociopolitical diversity in psychology: The case for pluralism. American
Psychologist, 56(3), 205.
Rienzo, C., Rolfe, H., & Wilkinson, D. (2015). Changing mindsets: Evaluation report and
executive summary. Education Endowment Foundation.
Ritchie, S. (2020). Science Fictions: How Fraud, Bias, Negligence, and Hype Undermine the
Search for Truth. New York, NY: Metropolitan Books.
Rothman, S., Lichter, S. R., & Nevitte, N. (2005). Politics and professional advancement among
college faculty. The Forum, 3.
Sætrevik, B., & Sjåstad, H. (2019, May 17). Failed pre-registered replication of mortality
salience effects in traditional and novel measures.
Sanchez, C., Sundermeier, B., Gray, K., & Calin-Jageman, R. J. (2017). Direct replication of
Gervais & Norenzayan (2012): No evidence that analytic thinking decreases religious
belief. PLoS One, 12(2), e0172636.
Sander, R. H. (2004). A systemic analysis of affirmative action in American law
schools. Stanford Law Review, 57, 367-483.
Schlitz, M., Wiseman, R., Watt, C., & Radin, D. (2006). Of two minds: Sceptic‐proponent
collaboration within parapsychology. British Journal of Psychology, 97(3), 313-322.
Schweinsberg, M., Feldman, M., Staub, N., van den Akker, O. R., van Aert, R. C., Van Assen,
M. A., ... & Schulte-Mecklenbeck, M. (2021). Same data, different conclusions: Radical
dispersion in empirical results when independent analysts operationalize and test the
same hypothesis. Organizational Behavior and Human Decision Processes.
Sedikides, C., Gaertner, L., & Toguchi, Y. (2003). Pancultural self-enhancement. Journal of
Personality and Social Psychology, 84(1), 60-79.
Shanks, D. R., Newell, B. R., Lee, E. H., Balakrishnan, D., Ekelund, L., Cenac, Z., ... & Moore,
C. (2013). Priming intelligent behavior: An elusive phenomenon. PLoS ONE, 8(4), e56515.
Shema, H. (2014). The birth of modern peer review. Scientific American, 156-60.
Silberzahn, R., Uhlmann, E. L., Martin, D. P., Anselmi, P., Aust, F., Awtrey, E., ... & Nosek, B.
A. (2018). Many analysts, one data set: Making transparent how variations in analytic
choices affect results. Advances in Methods and Practices in Psychological Science, 1(3),
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed
flexibility in data collection and analysis allows presenting anything as significant.
Psychological Science, 22(11), 1359-1366.
Simmons, J. P., & Simonsohn, U. (2017). Power posing: P-curving the evidence. Psychological
Science, 28, 687-693.
Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve: a key to the file-drawer.
Journal of Experimental Psychology: General, 143(2), 534-547.
Simonsohn, U., Simmons, J. P., & Nelson, L. (2021). Evidence of fraud in an influential field
experiment about dishonesty. Data Colada, 98. Retrieved from:
Singal, J. (2021). The quick fix: Why fad psychology can’t cure our social ills. New York, NY:
Farrar, Straus, and Giroux.
Sisk, V. F., Burgoyne, A. P., Sun, J., Butler, J. L., & Macnamara, B. N. (2018). To what extent
and under which circumstances are growth mind-sets important to academic
achievement? Two meta-analyses. Psychological Science, 29(4), 549-571.
Skitka, L. J. (2020). An optimistic take on avoiding liberal (and other sources of) bias.
Psychological Inquiry, 31, 88-89.
Sniderman, P. M., & Tetlock, P. E. (1986). Symbolic racism: Problems of motive attribution in
political analysis. Journal of Social Issues, 42(2), 129-150.
Spellman, B., Gilbert, E., & Corker, K. S. (2017). Open science: What, why, and how.
Steele, K. M. (2014). Failure to replicate the Mehta and Zhu (2009) color-priming effect on
anagram solution times. Psychonomic Bulletin & Review, 21(3), 771-776.
Stern, C., & Crawford, J. T. (2021). Ideological conflict and prejudice: An adversarial
collaboration examining correlates and ideological (a)symmetries. Social Psychological
and Personality Science, 12(1), 42-53.
Strevens, M. (2020). The knowledge machine: How irrationality created modern science. New
York: Liveright Publishing.
Stoet, G., & Geary, D. C. (2012). Can stereotype threat explain the gender gap in mathematics
performance and achievement? Review of General Psychology, 16(1), 93-102.
Stroud, N. J. (2008). Media use and political predispositions: Revisiting the concept of selective
exposure. Political Behavior, 30(3), 341-366.
Stroud, N. J. (2010). Polarization and partisan selective exposure. Journal of Communication,
60(3), 556-576.
Surowiecki, J. (2004). The wisdom of crowds: Why the many are smarter than the few and how
collective wisdom shapes business, economies, societies, and nations. London: Little,
Taber, C. S., & Lodge, M. (2006). Motivated skepticism in the evaluation of political beliefs.
American Journal of Political Science, 50(3), 755-769.
Tappin, B. M., Pennycook, G., & Rand, D. G. (2020). Bayesian or biased? Analytic thinking and
political belief updating. Cognition, 204, 104375.
Tetlock, P. E. (2003). Thinking the unthinkable: Sacred values and taboo cognitions. Trends in
Cognitive Sciences, 7(7), 320-324.
Tetlock, P.E. (2006). Adversarial collaboration: Least feasible when most needed? Least needed
when most feasible? Presentation to Board of Directors of Russell Sage Foundation, New
York City.
Tetlock, P. E. (2012). Rational versus irrational prejudices: How problematic is the ideological
lopsidedness of social psychology? Perspectives on Psychological Science, 7(5), 519-
Tetlock, P. E. (2020). Gauging the politicization of research programs. Psychological Inquiry,
31(1), 86-87.
Tetlock, P. E., Kristel, O. V., Elson, S. B., Green, M. C., & Lerner, J. S. (2000). The psychology
of the unthinkable: taboo trade-offs, forbidden base rates, and heretical
counterfactuals. Journal of Personality and Social Psychology, 78(5), 853-870.
Tetlock, P. E., & Levi, A. (1982). Attribution bias: On the inconclusiveness of the cognition-
motivation debate. Journal of Experimental Social Psychology, 18(1), 68-88.
Tetlock, P. E., & Manstead, A. S. (1985). Impression management versus intrapsychic
explanations in social psychology: A useful dichotomy? Psychological Review, 92(1),
Tetlock, P. E., & Mitchell, G. (2009a). Implicit bias and accountability systems: What must
organizations do to prevent discrimination? Research in Organizational Behavior, 29, 3-
Tetlock, P. E., & Mitchell, G. (2009b). Adversarial collaboration aborted but our offer still
stands. Research in Organizational Behavior, 29, 77-79.
Tetlock, P. E., & Mitchell, G. (2009c). A renewed appeal for adversarial collaboration. Research
in Organizational Behavior, 29, 71-72.
Thomas, A. G., Armstrong, S. L., Stewart-Williams, S., & Jones, B. C. (2021). Current fertility
status does not predict sociosexual attitudes and desires in normally ovulating
women. Evolutionary Psychology, 19(1), 1474704920976318.
Tomkins, A., Zhang, M., & Heavlin, W. D. (2017). Reviewer bias in single-versus double-blind
peer review. Proceedings of the National Academy of Sciences, 114, 12708-12713.
Tsou, A., Schickore, J., & Sugimoto, C. R. (2014). Unpublishable research: Examining and
organizing the ‘file drawer’. Learned Publishing, 27(4), 253-267.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and
biases. Science, 185(4157), 1124-1131.
Tversky, A., & Kahneman, D. (1981). Evidential impact of base rates. In D. Kahneman, P.
Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp.
153-160). New York, NY: Cambridge University Press.
Ulbig, S. G., & Funk, C. L. (1999). Conflict avoidance and political participation. Political
Behavior, 21(3), 265-282.
Van Bavel, J. J., Reinero, D. A., Harris, E., Robertson, C. E., & Pärnamets, P. (2020). Breaking
groupthink: Why scientific identity and norms mitigate ideological epistemology.
Psychological Inquiry, 31(1), 66-72.
Van Dessel, P., Gawronski, B., Smith, C. T., & De Houwer, J. (2017). Mechanisms underlying
approach-avoidance instruction effects on implicit evaluation: Results of a preregistered
adversarial collaboration. Journal of Experimental Social Psychology, 69, 23-32.
van Gelder, T., Kruger, A., Thomman, S., de Rozario, R., Silver, E., Saletta, M., ... & Burgman,
M. (2020). Improving Analytic Reasoning via Crowdsourcing and Structured Analytic
Techniques. Journal of Cognitive Engineering and Decision Making, 14(3), 195-217.
van Prooijen, J. W., & Krouwel, A. P. (2019). Psychological features of extreme political
ideologies. Current Directions in Psychological Science, 28(2), 159-163.
Vazire, S. (2018). Implications of the credibility revolution for productivity, creativity, and
progress. Perspectives on Psychological Science, 13(4), 411-417.
Vlasceanu, M., Morais, M. J., & Coman, A. (2021). The effect of prediction error on belief
update across the political spectrum. Psychological Science, 0956797621995208.
von Hippel, W., & Buss, D. M. (2017). Do ideologically driven scientific agendas impede the
understanding and acceptance of evolutionary principles in social psychology? In J. T.
Crawford & L. Jussim (Eds.), Frontiers of Social Psychology Series: The Politics of Social
Psychology (pp. 7-25). New York, NY: Routledge.
Ward, A., English, T., & Chin, M. (2021). Physical attractiveness predicts endorsement of
specific evolutionary psychology principles. PLoS ONE, 16(8), e0254725.
Wason, P. C., & Johnson-Laird, P. N. (1972). Psychology of reasoning: Structure and
content (Vol. 86). Cambridge, MA: Harvard University Press.
Weber, M. (1968). Economy and Society. New York: Bedminster.
Wicherts, J. M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of
psychological research data for reanalysis. American Psychologist, 61(7), 726.
Widemo, F., & Sæther, S. A. (1999). Beauty is in the eye of the beholder: Causes and
consequences of variation in mating preferences. Trends in Ecology & Evolution, 14(1),
Williams, W. M., & Ceci, S. J. (2015). National hiring experiments reveal 2: 1 faculty preference
for women on STEM tenure track. Proceedings of the National Academy of
Sciences, 112(17), 5360-5365.
Winegard, B. M., & Clark, C. J. (2020). Without contraries is no progression. Psychological
Inquiry, 31(1), 94-101.
Winegard, B. M., Clark, C. J., Hasty, C. R., & Baumeister, R. F. (2018). Equalitarianism: A
source of liberal bias. Manuscript submitted for publication.
Wiseman, R., & Schlitz, M. (1997). Experimenter effects and the remote detection of staring.
Journal of Parapsychology, 61, 197-207.
Wiseman, R., & Schlitz, M. (1999). Replication of experimenter effect and the remote detection
of staring. Proceedings of the 42nd Annual Convention of the Parapsychological
Association, 471-479.
Wright, J. D., Goldberg, Z., Cheung, I., & Esses, V. M. (2021). Clarifying the meaning of
symbolic racism. Unpublished manuscript.
Zmigrod, L., Rentfrow, P. J., & Robbins, T. W. (2020). The partisan mind: Is extreme political
partisanship related to cognitive inflexibility? Journal of Experimental Psychology:
General, 149(3), 407-418.