Open Science in Applied Linguistics: An Introduction to Metascience
Ali H. Al-Hoorie, Royal Commission for Jubail and Yanbu, Saudi Arabia
Phil Hiver, Florida State University, USA
Cite as
Al-Hoorie, A. H., & Hiver, P. (in press). Open science in applied linguistics: An introduction to
metascience. In L. Plonsky (Ed.), Open science in applied linguistics. John Benjamins.
Open Science in Applied Linguistics: An Introduction to Metascience
Ali H. Al-Hoorie
Phil Hiver
I must make the painful disclosure that in the past few years, in numerous
experiments, I have failed to replicate some of the widely accepted
findings enshrined in psychology textbooks and have also learned of
several other investigators with similar experiences. (Wong, 1981, p. 690)
Science is commonly viewed as a self-correcting enterprise. For instance, in his classic book The
Sociology of Science, Robert Merton proclaimed, “scientific research is under the exacting
scrutiny of fellow experts…. the activities of scientists are subject to rigorous policing, to a
degree perhaps unparalleled in any other field of activity” (Merton, 1973, p. 276). The last
decade has largely revealed a very different picture, however. In sister fields such as psychology,
along with high-profile fraud cases and retractions, some highly unlikely results have found their way into major peer-reviewed journals, including suspect findings related to topics like
precognition (Bem, 2011), power posing (Carney et al., 2010), and positivity as a cure for cancer
(see Coyne & Tennen, 2010; Coyne et al., 2010). Replication failures have also plagued and
shaken fields as diverse as biology, genetics, and medicine (Ioannidis, 2005; Schooler, 2014). All
of this has led commentators to question the ability of science to be self-correcting by default
without formally instituting safeguards to ensure that the self-correction process actually takes
place (Freese & Peterson, 2018; Giner-Sorolla, 2012; Ioannidis, 2012). As some proponents of
open science have argued, “Beyond the published reports, science operates as a ‘trust me’ model
that would be seen as laughably quaint for ensuring responsibility and accountability in state or
corporate governance” (Nosek et al., 2012, p. 625). In applied linguistics, too, observers have
noted the generally slow uptake of such open science practices (Marsden & Plonsky, 2018).
In this chapter, we provide an overview of various topics related to open science, drawing
often (and necessarily) on work outside of applied linguistics. Here, we define open science more
broadly. Rather than limiting open science to the question of whether a study and its data are
open access or behind a paywall, this chapter defines open science more generally as
transparency in all aspects of the research process (see also Gass et al., 2021). From this
perspective, the most relevant discipline dealing with these issues is metascience (or meta-
research). Following the tradition of metascience (e.g., Ioannidis et al., 2015; Munafò et al.,
2017; Sterling & Plonsky, under contract), this chapter provides an overview of openness and
transparency in relation to the following five areas:
1) Methods: performing research openly
2) Reporting: communicating research openly
3) Reproducibility: verifying research openly
4) Evaluation: assessing the rigor of research openly
5) Incentives: rewarding open research
Methods: Performing Research Openly
A major obstacle to open science is the lack of transparency regarding how research is
conducted. Indeed, there is wide agreement that a prime culprit threatening the validity of
research findings is questionable research practices. These practices do not include the “big
three” of fabrication, falsification and plagiarism (these would be outright scientific misconduct).
Instead, these practices can be as innocent-looking as conflating planned and unplanned
analyses. One manifestation of this conflation is p-hacking, such as data peeking and optional
stopping, deciding whether to exclude data after completing the analyses, and performing
multiple analyses on a single dataset, all while disclosing only those results supporting a desired
outcome. A second manifestation is hypothesizing after the results are known (HARKing; Kerr,
1998), in that the author makes up a sensible interpretation of these results, concocts a theory that
is nicely supported by these unanticipated results, and then revamps the literature review to craft
a persuasive narrative under the pretense that these results were what the researcher had
predicted all along, again without disclosing to readers what actually happened behind the
scenes. Third, researchers might conduct several studies but then report only those supporting a
certain finding, preventing other “unruly” findings from reaching future meta-analyses and thus
giving the illusion that a preferred theory has more supporting evidence than it actually does.
These softer forms of fraud, resulting from increased researcher degrees of freedom, inflate the rate of false positives fieldwide and otherwise introduce error and bias into the published record.[1]

[1] In June 2020, the Dutch National Body of Scientific Integrity ruled for the first time that these practices are violations of the code of conduct for scientific integrity, and no longer just “questionable” practices. See https://daniellakens.blogspot.com/2020/09/p-hacking-and-optional-stopping-have.html.
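To see why a practice like optional stopping inflates false positives, consider the following minimal simulation sketch (not drawn from any study cited here; the batch size, stopping rule, and number of simulations are arbitrary illustrative choices). Even though no true effect exists, repeatedly peeking at the data and stopping at the first significant p-value pushes the false-positive rate well above the nominal 5%:

```python
# Minimal simulation of "optional stopping": peeking at the data after every
# batch of participants and stopping as soon as p < .05, even though the null
# hypothesis is true (both groups come from the same population).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2022)
n_sims, batch, max_n, alpha = 5000, 10, 100, .05

false_positives = 0
for _ in range(n_sims):
    a, b = [], []
    while len(a) < max_n:
        a.extend(rng.normal(0, 1, batch))   # group A: no true effect
        b.extend(rng.normal(0, 1, batch))   # group B: no true effect
        p = stats.ttest_ind(a, b).pvalue
        if p < alpha:                       # "peek" and stop on significance
            false_positives += 1
            break

print(f"False-positive rate with peeking: {false_positives / n_sims:.2f} "
      f"(nominal alpha = {alpha})")
# Typically prints a rate well above .05, illustrating the inflation.
```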
For some readers, the above practices may look too shady for anyone to knowingly
engage in them. However, research has documented a surprisingly high number of researchers
acknowledging that they have engaged in them. For example, in a worrying survey involving
over 2,000 psychologists, John et al. (2012) estimated that the percentage of researchers who had
engaged in questionable research practices could reach 100%, with about 1 in 10 respondents
introducing false data. A similar, unfortunate picture appears in applied linguistics. In a survey of
about 350 applied linguists, Isbell et al. (under review) found that almost 95% admitted to
committing questionable research practices, with 1 in 6 readily acknowledging scientific
misconduct. To be clear, hardly anyone is suggesting intentional malfeasance on the part of
researchers. Researchers are not deliberately trying to produce bad science. Instead, most
researchers seem to believe that such practices are acceptable, a tendency exacerbated by motivated reasoning (Kunda, 1987), where self-serving biases make individuals poor judges of their own behavior. One can almost imagine the internal monologue inside researchers’ heads when justifying, for example, the removal of outliers or the omission of non-
significant results. Ultimately, collective engagement in such questionable research practices
“can lead to a ‘race to the bottom,’ with questionable research begetting even more questionable
research” (John et al., 2012, p. 531).
It is clear that transparency in how research is conducted is essential to address the
prevalence of questionable research practices. This message is front and center in our discipline-
specific methodological reform movement (Marsden & Plonsky, 2018). Two solutions have
recently gained wide acceptance in this regard, namely preregistration (see Huensch, this
volume) and multi-lab collaboration (Moranski & Ziegler, 2021). Preregistration involves
declaring the various decisions the researcher is planning to make before they start collecting any
data. Examples of these decisions include the sample size and any stopping rules, variables
collected including any demographics, how outliers and missing data will be handled, and how
the data will be combined and analyzed. This preregistration takes place at an online platform
(e.g., the Open Science Framework, www.osf.io), which then produces a frozen, time-stamped
6
copy of these preregistration protocols distinguishing them from other exploratory and ad hoc
analyses (e.g., Nosek et al., 2019; Nosek et al., 2018). Recent examples of empirical studies
adopting preregistration include Hiver and Al-Hoorie (2020a) and Huensch and Nagle (2021).
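Purely as a hypothetical illustration, the sketch below writes out the kinds of decisions a preregistration typically locks in as a simple data structure; every field name and value here is invented for this example rather than taken from an OSF template or from any study cited in this chapter.

```python
# Hypothetical sketch of decisions a preregistration could lock in before data
# collection; field names and values are illustrative only, not an OSF template.
preregistration = {
    "hypothesis": "L2 motivation predicts self-reported effort (positive association).",
    "sample_size": 120,                     # fixed in advance; no optional stopping
    "stopping_rule": "Stop recruiting once 120 complete responses are collected.",
    "variables": {
        "predictor": "motivation_scale_mean",
        "outcome": "effort_scale_mean",
        "demographics": ["age", "gender", "years_of_study"],
    },
    "exclusions": "Remove careless responders (failed both attention checks).",
    "missing_data": "Listwise deletion if < 5% missing; otherwise multiple imputation.",
    "analysis_plan": "Two-tailed Pearson correlation, alpha = .05.",
}
```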
A subsequent development is registered reports (e.g., Marsden, Morgan-Short,
Trofimovich, et al., 2018). In registered reports, not only do researchers preregister all their
research protocols before any data collection, but they also submit them to a journal that adopts
registered reports (e.g., Language Learning in our field). After Stage 1 review, the journal
confers an accept-in-principle decision, meaning that publication of this article is guaranteed if
the researchers adhere to the preregistered protocols no matter what the results are. There is some
initial evidence showing the improved quality of registered reports in comparison to the standard
publication model (Soderberg et al., 2021). Nevertheless, some have noted that even this system
is not foolproof and can be “cracked”, as researchers may game the system through pre-
registering after the results are known (PARKing; Yamada, 2018). Obviously, this can be seen as
a clear case of fraud. Other commentators (Hollenbeck & Wright, 2017) have suggested
encouraging exploratory research via distinguishing between transparent HARKing
(THARKing) and secret HARKing (SHARKing). Finally, in recognition that preregistered
protocols inevitably involve adhering to an arbitrary set of choices among several other
reasonable options, Steegen et al. (2016) proposed a multiverse analysis procedure, where
researchers analyze their data multiple times, each time following an alternative but reasonable
scenario. The aggregate findings of this procedure would show whether the results are robust or
fragile in supporting a given claim.
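The logic of a multiverse analysis can be illustrated with a short sketch: the same association is tested under every combination of a few reasonable processing choices, and the full set of results is inspected for robustness. The data and the particular choices below are simulated and illustrative, not taken from Steegen et al. (2016).

```python
# Minimal sketch of a multiverse analysis: rerun the same test under every
# reasonable combination of data-processing choices and inspect all results.
# The data and the processing options here are simulated and illustrative.
import itertools
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(0, 1, 150)
y = 0.2 * x + rng.normal(0, 1, 150)          # a modest simulated association

outlier_rules = {"none": np.inf, "z<3": 3.0, "z<2.5": 2.5}
transforms = {"raw": lambda v: v, "rank": lambda v: stats.rankdata(v)}

results = []
for (o_name, z_cut), (t_name, t_fn) in itertools.product(
        outlier_rules.items(), transforms.items()):
    keep = (np.abs(stats.zscore(x)) < z_cut) & (np.abs(stats.zscore(y)) < z_cut)
    r, p = stats.pearsonr(t_fn(x[keep]), t_fn(y[keep]))
    results.append((o_name, t_name, round(r, 3), round(p, 4)))

for row in results:                           # a robust pattern: r stays broadly
    print(row)                                # similar across the multiverse
```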
The second solution enhancing transparency of research methodology is multi-lab
collaborations. According to this approach, researchers no longer collect data and publish their results independently. Instead, data collection is crowdsourced from different labs, which then
combine their data and publish it all in one report. For example, the journal Collabra:
Psychology has recently introduced a section called “Nexus” dedicated to crowdsourced research
(McCarthy & Chartier, 2017). Inclusion in this collection is not determined by substantive topic
as is typically the case in special issues, but by adopting a multi-lab approach to address an issue.
Greater adoption of multi-lab research has several fieldwide advantages. First, when labs
collaborate, this increases statistical power and consequently lends more precision to estimating
effect sizes. Second, this approach helps combat publication bias since such large-scale projects
are unlikely to end up in the file drawer, for example when the results are non-significant. Third,
collective research helps create a meta-analytic mindset (see Norris & Ortega, 2006) as
researchers no longer try to interpret standalone results from individual studies without reference
to the overall, cumulative trend. Finally, this approach lowers the pressure on researchers to
engage in questionable research practices because the chances of successfully publishing multi-
lab projects are higher and because methods and analyses are more open than in the case of a
single lab.
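To make the first of these advantages concrete, a rough back-of-the-envelope sketch is given below: the standard error of a standardized mean difference (Cohen's d) shrinks roughly with the square root of the total sample size, so pooling participants across labs tightens the estimate considerably. The sample sizes and the assumed effect size are illustrative only.

```python
# Illustration of why pooling data across labs sharpens effect-size estimates:
# the standard error of Cohen's d shrinks roughly with the square root of the
# total sample size. The numbers below are illustrative.
import math

def se_of_d(n_per_group, d=0.4):
    """Approximate standard error of Cohen's d for a two-group design."""
    n1 = n2 = n_per_group
    return math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))

single_lab = se_of_d(n_per_group=30)        # one lab, 60 participants
ten_labs = se_of_d(n_per_group=300)         # ten labs pooled, 600 participants

print(f"SE with one lab:  {single_lab:.3f}")
print(f"SE with ten labs: {ten_labs:.3f}")  # markedly smaller, i.e. more precise
```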
Reporting: Communicating Research Openly
Some observers argue that questionable research practices are a natural outcome of questionable
publication practices (Coyne, 2016; see also Wong, 1981). From this perspective, “scientists as
economic actors [are] led to bad practices by a poorly aligned system of incentives” (Freese &
Peterson, 2018, p. 293). The standard publishing model prioritizes manuscripts making broad
generalizability claims with perfect-looking results (Giner-Sorolla, 2012; Simons et al., 2017).
As an illustration, the editor of the prestigious journal Communication Monographs explicitly
states that the journal prefers “great” over merely “good” scholarship: “good research produces
new findings, whereas great research produces newsworthy findings” (Schrodt, 2020, p. 1,
original italics). The downside of this attitude is that it creates an incentive structure that favors
innovation over verification (see also Porte, 2013). Moreover, in order to achieve the
flawlessness or broad appeal top journals require, some researchers may be tempted to embellish
their manuscripts by engaging in questionable research practices, which have been described as
“the steroids of scientific competition, artificially enhancing performance and producing a kind
of arms race in which researchers who strictly play by the rules are at a competitive
disadvantage” (John et al., 2012, p. 524). Worryingly, this effect of publication pressure has even
been observed in medical research (Tijdink et al., 2014).
One of the primary factors behind this level of journal selectivity is the never-ending
pursuit of higher impact factors. Briefly, an impact factor is a numeric value summarizing the number of citations that a journal’s recent papers have received (see Al-Hoorie & Vitta, 2019).
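For reference, the most widely used version of the metric, the two-year journal impact factor for year Y, is computed as follows (a general formulation rather than a quotation from any specific indexing service):

```latex
\mathrm{JIF}_{Y} =
  \frac{\text{citations received in year } Y \text{ to items published in years } Y-1 \text{ and } Y-2}
       {\text{number of citable items published in years } Y-1 \text{ and } Y-2}
```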
Variation among impact factors is typically related to how each is calculated, such as whether
self-citations are excluded and whether the impact factor of the citing journal is additionally
taken into account. The notion of impact factor has taken academic publishing by storm; most
journals are keen on being indexed and on receiving a higher impact factor year after year in
order to announce it proudly on their homepages. Additionally, the fact that some journal editors
use every trick in the book to game the system and manipulate their journal impact factors is
“well documented” (Brumback, 2009, p. 260). Some analysts argue that a single numeric value
should not and cannot represent the scientific rigor of the journal overall or any individual article
in it (see Agrawal, 2005; Falagas & Alexiou, 2008; Ioannidis & Thombs, 2019). The
fundamental logical flaw in the whole notion of journal impact factors is that they are used to judge the quality of the journal, not individual articles, and this judgement is then transferred to
future articles to be published in that journal. This is an illustration of the ecological fallacy (see
Hiver & Al-Hoorie, 2020b) where conclusions made at the aggregate level are then carried to the
individual article level, past and future. Nevertheless, journal impact factors remain an
important criterion to make critical and life-changing decisions related to employment,
promotion, tenure, and funding in many parts of the world. To some observers, the pursuit of
higher impact factors is analogous to “worshiping false idols” which can “threaten to destroy scientific inquiry as we know it” (Brumback, 2008, p. 365).[2]

[2] At the time of writing, Utrecht University in the Netherlands announced that all of its departments will abandon the impact factor in all hiring and promotion decisions in 2022. This decision is partly inspired by the Declaration on Research Assessment (DORA; www.sfdora.org). See https://www.nature.com/articles/d41586-021-01759-5.
Partly in response to the selectivity of elite journals and their various restrictions (e.g.,
word count limits, maximum number of papers accepted, high fees passed on to subscribers and
readers), and aided by the availability of internet access, a growing number of journals have
started to adopt a (gold) open access business model. According to this business model, the
equation is flipped so that the authors, not the readers, pay what is euphemistically called
article processing charges. These charges can range anywhere from a few hundred dollars to
several thousand. Journals adopting this business model suffer from a chronic conflict of interest
in that rejecting (poor) manuscripts leads to loss of revenue (Beall, 2017). In contrast to this gold
access model, a growing number of journals have started to adopt a diamond (or platinum)
access model, where neither authors nor readers pay any charges (for a list of applied linguistics
open access journals, see www.ali-alhoorie.com/applied-linguistics-open-access-journals).
Journals adopting this model are usually sponsored by universities and learned societies. This
approach seems to be the fairest considering that many gold open access journals are for-profit
organizations that pocket the money rather than paying their editors and reviewers. The cost of
setting up a diamond access journal can be substantially reduced by adopting an overlay
approach (Smith, 2000). Here, the journal does not even host a proprietary electronic version of
the article on its website. Instead, the article is first hosted at a preprint repository (e.g.,
arXiv.org) and, after the journal conducts peer review to establish its quality, the journal simply
adds to its website the link to that article. This way, an article could potentially be part of
multiple overlay journals if it is of interest to them. (For further discussion on the role of
journals, journal editors, and learned societies in fostering an ethic of open science, see Silver,
this volume, and Brysbaert, this volume.)
Discussion of open access options cannot be complete without mention of predatory
publishing. Predatory journals and publishers are defined as “entities that prioritize self-interest
at the expense of scholarship and are characterized by false or misleading information, deviation
from best editorial and publication practices, a lack of transparency, and/or the use of aggressive
and indiscriminate solicitation practices” (Grudniewicz et al., 2019, p. 211). Predatory publishers
follow an exploitative formula where authors are misled into buying their way into publication
without undergoing a genuine peer review process or any other quality assurance measure. These
journals offer a publication opportunity to authors who are desperate to publish their work
speedily or who are frustrated by repeated rejection of their work (Beall, 2017; Roberts, 2016).
These publishers typically adopt aggressive Nigerian-Prince-like solicitation campaigns to lure
their victims. According to one longitudinal study (Shen & Björk, 2015), there are about 8,000
active predatory journals whose publication volume increased from 53,000 articles in 2010 to
420,000 in 2014, with an estimated market size of $74 million annually. Clearly, this volume of
funding could have been utilized in better ways by researchers. To combat this waste in research
funding, the National Institutes of Health issued a statement in 2017 warning against publishing funded research in predatory journals.[3]
In 2019, a federal judge also ruled in favor of a Federal
Trade Commission lawsuit against predatory academic publisher OMICS Group, holding the
group liable for over $50 million in damages.[4]

[3] See https://grants.nih.gov/grants/guide/notice-files/not-od-18-011.html.

[4] See https://www.ftc.gov/news-events/press-releases/2019/04/court-rules-ftcs-favor-against-predatory-academic-publisher-omics.
It goes without saying that avoiding predatory publishers requires first identifying them,
but this has turned out to be a tricky business. Predatory journals have jumped on the open access
bandwagon, preying on the sentiments of its advocates. Recent analyses show that there are
hundreds of potentially predatory journals indexed in the widely used Scopus database
(Macháček & Srholec, 2021; Marina & Sterligov, 2021).[5] Apparently, these journals were able to hack their way into this indexing service, whose primary job is to provide assurance of the
quality of its journals. This is an illustration of how predatory publishers are “corrupting” the
gold open access model, in the same way that spam has exploited email communication
(Beall, 2012). Describing these journals as the Salon des Refusés, Beall (2017) stated that “predatory publishers pose the biggest threat to science since the Inquisition” (p. 276).

[5] It is worth noting that, during the preparation of this chapter, the article by Macháček and Srholec (2021) was retracted by Springer Nature at the request of Frontiers (as this paper considered Frontiers a predatory publisher), a decision that the authors have contested. See https://retractionwatch.com/2021/09/07/authors-object-after-springer-nature-journal-cedes-to-publisher-frontiers-demand-for-retraction/.
Reproducibility: Verifying Research Openly
Perhaps the clearest indicator of the need to embrace open science is the status of reproducibility
(or lack thereof) across different disciplines (see, e.g., Sönning & Werner, 2021, specifically for
linguistics). Reproducibility issues can be mitigated through open science practices such as
preregistration, complete and transparent reporting, and sharing of instruments, data and code
such as on the IRIS repository (www.iris-database.org; Marsden et al., 2016). There are different
facets of reproducibility and corresponding terms (e.g., Freese & Peterson, 2017; Nosek &
Errington, 2020). For example, computational reproducibility refers to the ability to repeat an
analysis on the same data (e.g., rerunning the analysis code) to verify the accuracy of the
findings reported. Robustness, on the other hand, refers to whether the same findings are
obtained when different analyses are performed on the same data (e.g., by excluding potential
outliers or using more advanced techniques). Obviously, all of this is hardly possible to perform
without making data available, which in turn requires standards for how data is prepared/cleaned,
managed, cited, annotated, and preserved for long-term access, all of which entail a cultural
shift in the field through education, outreach, and policy and technology development (Berez-
Kroeker et al., 2018; see also incentives below).
Replication can also refer to different aspects of the research process (Marsden, Morgan-
Short, Thompson, et al., 2018). It might refer to the repeatability of a study, in the sense that
sufficient description of the initial study[6] is available to permit its replication. It can also refer to
the findings themselves, and whether they are consistent with those from an initial study. When
it refers specifically to whether the results are consistent with previous results, replication has
also generated a confusing amount of terminology that further varies from one discipline to
another (Clemens, 2017; Plesser, 2018). Examples include identical replication, quasi-
replication, constructive replication, operational replication, statistical replication, theoretical
replication, and many more (for a summary, see Clemens, 2017, Table 2). In applied linguistics,
researchers have generally settled on a small set of replication types: direct, partial/close,
approximate, and conceptual (see Marsden, Morgan-Short, Thompson, et al., 2018; Porte, 2012;
Porte & McManus, 2019). Direct replication requires adhering to the design of the initial study as closely as possible. Partial/close replication involves an intentional alteration of one aspect
(e.g., participant age, intervention duration) to examine whether the results still hold, while
approximate replication introduces two changes. In a conceptual replication, researchers use a different method to test the underlying theory or a hypothesized mechanism, or to examine
whether the initial results are a methodological artifact. (See Al-Hoorie et al. [under review] for a
critique of this conventional view of replication from a broader epistemological perspective.)

[6] Following recommendations by Marsden et al. (2018), we use the term “initial study” to describe the to-be-replicated study, rather than “original study”.
An important question when it comes to performing replication research is how to judge
the “success” of the replication. Intuitively, replication success is achieved when the replication
results are consistent with the initial results, but it turns out that this is not as straightforward as it
might seem. Criteria that have been used to evaluate replication success include p-value
significance, overlap in confidence intervals, direction of effect sizes, meta-analytic aggregation,
and human subjective judgment (Nelson et al., 2018; Open Science Collaboration, 2015;
Valentine et al., 2011). Ideally, assessment of replication success should draw from multiple
measures of these, in addition to differentiating adequately-powered from under-powered studies
(see Simonsohn, 2015, for one approach). Ideally, also, a finding should not be judged with a
single replication attempt. This is because of what Spence and Stanley (2016) call the replication
interval. Simulation research by Spence and Stanley showed that, depending on factors such as
sample size and reliability, the replication interval can be surprisingly wide for individual
replication studies, making them difficult to interpret. For example, a replication of a correlation
of .46 can, mathematically, be .03 without falsifying the initial .46 result. This raises important
questions about the feasibility of individual replication attempts and their interpretability. To
quote Stanley and Spence (2014),
What if Jane had observed a correlation of r = .46 in her first study attempt? A colleague
from the same university, Richard, might attempt to replicate Jane’s finding under an
ideal replication scenario (i.e., using the same participants). On the basis of the above
results, Richard could conceivably observe a correlation of r = .03. The low correlation
obtained by Richard might lead him to wonder if Jane engaged in unethical research
practices or made a mistake, especially given that he and Jane used the same
participants yet obtained a substantially different result. However, in reality, the only
reason Richard’s findings differed from Jane’s was the random measurement error
associated with the criterion. (p. 310)
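The width of such replication intervals can be illustrated with a small simulation in the spirit of Spence and Stanley (2016), though simplified: it considers sampling error only (their analysis additionally incorporates measurement reliability), and the true correlation and sample size below are illustrative choices.

```python
# Minimal simulation in the spirit of Spence and Stanley (2016): even with a
# fixed "true" correlation, replication estimates scatter widely from sampling
# error alone. The true r and sample size below are illustrative choices.
import numpy as np

rng = np.random.default_rng(46)
true_r, n, n_replications = 0.46, 50, 10000

cov = [[1.0, true_r], [true_r, 1.0]]
observed = []
for _ in range(n_replications):
    sample = rng.multivariate_normal([0, 0], cov, size=n)
    observed.append(np.corrcoef(sample[:, 0], sample[:, 1])[0, 1])

lo, hi = np.percentile(observed, [2.5, 97.5])
print(f"95% of replication estimates fall between r = {lo:.2f} and r = {hi:.2f}")
# With n = 50 the interval is strikingly wide, echoing the point that a single
# replication estimate far from the initial value need not be alarming.
```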
Although replication is intended to serve noble scientific aims, there are also social factors complicating the process. Attempting to replicate other people’s research has turned out to be fraught with controversy (Hiver & Al-Hoorie, 2020a), making it a sensitive undertaking. Replicators have been labeled as bullies
(Bohannon, 2014) charged with engaging in methodological intimidation (Fiske, 2016)[7] since the
interpretation of a replication attempt can range from “a legitimate disagreement over the best
methods (science), to signaling incompetence and fraud (pseudoscience)” (Clemens, 2017, p.
326). Along these lines, research by Fetterman and Sassenberg (2015) has shown that researchers
overestimate the reputational damage caused by a failed replication. Another social factor
complicating the mainstream acceptance of replication research is the long-held editorial
(Neuliep & Crandall, 1990) and reviewer (Neuliep & Crandall, 1993) bias against replications, particularly those with null results. As Marsden, Morgan-Short, Thompson, et al. (2018)
pointed out, for a replication attempt to achieve its intended effect, it must be self-labeled as such
in order to facilitate cross-study comparison by editors, reviewers, and future readers. However,
“replication” seems to have become a loaded label devaluing the research it is used to describe.
One proposal to address this problem has come to be known as the Pottery Barn Rule
(Srivastava, 2012), according to which a journal that publishes a paper has an obligation to
publish a replication of it regardless of the results.[8]

[7] Originally, in a draft of this presidential column, Fiske described some types of replication as “methodological terrorism”. After a backlash, however, she softened her tone and described it as “methodological intimidation” instead. See http://datacolada.org/wp-content/uploads/2016/09/Fiske-presidential-guest-column_APS-Observer_copy-edited.pdf.

[8] One initiative for researchers interested in open and reproducible research is ReproducibiliTea journal clubs (www.ReproducibiliTea.org), a grassroots initiative established in 2018 at the University of Oxford. This initiative helps researchers create local open science journal clubs at their institutions (in any field). At the time of writing, there are clubs at 133 institutions from 25 different countries.
Evaluation: Assessing the Rigor of Research Openly
Peer review has long been an essential tool to judge the quality of scholarly research and its
suitability for publication. At face value, the process looks straightforward. The journal editor
selects two or more reviewers who give feedback on a manuscript. In reality, however, the peer
review is anything but open and transparent. Authors submit their manuscripts and wait for
months on end for a (final) decision with little procedural recourse or ability to appeal. The
standard review process is fraught with ad hoc decisions that can threaten the validity of the
review outcome. Consider this description of what happens after a manuscript is submitted:
reviewers are selected based on ad hoc decision making by the editor: relevance to
content area, known to be reliable or high-quality reviewers, or people she has not asked
recently. Potential reviewers accept or decline invitations based on their interest in the
article, indebtedness to the editor, and availability. Reviews are typically anonymous, are
almost always completed without compensation, and can be as short as a few sentences
or longer than the report itself. The norm is a few paragraphs summarizing the main
issues and a few follow-up comments regarding minor questions or concerns. (Nosek &
Bar-Anan, 2012, p. 218)
It is clear that this approach is anything but systematic. In one large-scale multilevel meta-
analysis, Bornmann et al. (2010) found that the mean inter-rater reliability across reviewers was
a meagre Cohen’s Kappa of .17. In Brian Nosek’s words, “If researchers wrote up the peer
review process as a measure and submitted it for a journal for publication, it would get rejected
because of its unreliability.” It comes as no surprise, then, that reviewers sometimes complain
that authors tend to refuse to take up feedback and instead resubmit rejected manuscripts to other journals with minimal change. Nor should it be surprising that this peer review process takes a toll on
authors, stimulating a range of negative emotions. According to a text mining study by Jiang
(2021), rejection provokes disagreement, sadness and anger, while low-quality reviews similarly
lead to anxiety, sadness, and anger.
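To give a concrete sense of what an inter-rater reliability of around .17 means, the sketch below computes Cohen's kappa for two hypothetical reviewers' accept/reject recommendations; the recommendation lists are invented purely for illustration and are not from Bornmann et al. (2010).

```python
# A small sketch of Cohen's kappa for two hypothetical reviewers' accept/reject
# recommendations. The recommendation lists below are invented for illustration.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters judging the same set of items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(rater_a) | set(rater_b)) / n**2
    return (observed - expected) / (1 - expected)

reviewer_1 = ["accept", "accept", "accept", "reject", "reject",
              "reject", "reject", "reject", "reject", "reject"]
reviewer_2 = ["accept", "accept", "reject", "accept", "accept",
              "accept", "reject", "reject", "reject", "reject"]

print(f"kappa = {cohens_kappa(reviewer_1, reviewer_2):.2f}")
# A value this low means agreement only slightly better than chance, which is
# the territory of the mean kappa reported by Bornmann et al. (2010).
```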
One solution to reforming peer review is to embrace openness and transparency. Again,
however, there are complexities in opening up peer review. Several models to open peer review
have been proposed. One such approach is open peer review identity, where peer reviewers are
not anonymous (e.g., Meta-Psychology). Naturally, deanonymizing peer reviewers could lead to retaliation, and so reviewers, who are usually unpaid, might be less willing to take part in this
process (van Rooyen et al., 2010). Some journals (e.g., Frontiers) publish the names of reviewers
on the first page of each accepted manuscript. For some, however, this practice is tantamount to
a life-long commitment on the part of the reviewer to the quality of the manuscript, something
many reviewers might be reluctant to accept, and certainly not free of charge on behalf of a for-
profit organization. A perhaps more palatable alternative is opening up the actual review but not
the reviewer’s identity. In this model, the whole review history, including author replies and
editor recommendations, is posted along with the article. This applies to accepted articles, though eLife (which requires all manuscripts to be available on a preprint repository before undergoing peer review) has also announced that it would adopt open review even for rejected
articles (Eisen et al., 2020). Another open review model is post-publication peer review. PubPeer (www.pubpeer.com), one popular platform for post-publication review, allows readers to
post anonymous comments on published articles. The anonymity of submitted comments has
proved controversial because of the possibility of rogue actors settling scores and defamation by adversaries. However, PubPeer’s founders argued that they have strict guidelines to mitigate this
possibility, such as basing reviews on verifiable facts and maintaining academic decorum
(Barbour & Stell, 2020). Another innovative approach to open peer review is for the journal to
completely outsource the review to Peer Community In Registered Reports (PCI RR). PCI
(www.PeerCommunityIn.org) is a platform that allows researchers to form communities to
review and recommend preprints free of charge. Dubbed Registered Reports 2.0 (Eder & Frings, 2021), this initiative makes such preprints eligible for publication at a growing number of journals that have recently agreed to publish them upon receiving favorable PCI RR reviews, without any further review by the journal itself (O’Grady, 2021). Other noteworthy attempts to
reform peer review and make it more transparent include setting empirical standards for
reviewers to base their reviews on (Ralph et al., 2021), introducing a bill of rights for manuscript
submitters (Clair, 2015), and adopting a transparent process for handling complaints against the
journal editors, staff, and publisher (COPE: Complaints and Appeals Focus, 2018).
Occasionally, scrutiny of published work, including post-publication review, may require setting the record straight by correcting a publication or even retracting it. Despite
increasing calls to normalize it, retraction is still generally considered a dark spot that few
researchers wish to have on their CV. This might have contributed to the fact that misconduct
investigations and retraction decisions leave much to be desired when it comes to transparency
and speed. As Heathers (2019) put it, “In reality, mechanisms to correct bad science are slow,
unreliably enforced, capricious, run with only the barest nod towards formal policy, confer no
reward and sometimes punitive elements for a complainant who might use them.” At the journal
level, a study by Trikalinos et al. (2008) found that retraction decisions due to falsification at
high impact journals take a long time (median = 28 months) and even longer when a senior
researcher is implicated (79 months). At the institutional level, investigation of possible
misconduct suffers from inherent conflict of interest, lack of standardization, and little quality
control, all of which lower the credibility of the investigation outcome (Gunsalus et al., 2018).
Ironically, whistleblowers, who are typically the instigators of misconduct investigations and
subsequent retractions and who are in greater need of anonymity for fear of a possible backlash,
may find it hard to remain anonymous (Hartgerink & Wicherts, 2016). Perpetrators are usually
closely associated with whistleblowers, and so the whistleblowing dilemma is whether to go
public and possibly have one’s own papers (previously coauthored with the perpetrator) retracted,
or just remain silent. Considering all of this, Vazire and Holcombe (2020) argued that
transparency, on its own, is not enough for science to be self-correcting. The research
community must also proactively encourage and incentivize error detection and
correction, or else science will simply be openly rather than covertly flawed.
Incentives: Rewarding Open Research
Unlike most of the strategies discussed in the previous sections, changing academic incentives to
align them with the principles of open science is not something an individual (be it an author, a
reviewer, or an editor) can effectively do on their own (Berez-Kroeker et al., 2018). This is a
system-level issue. Restructuring academic incentives requires a collective effort from multiple
stakeholders to re-envision the whole scientific culture. The primary incentive in the current
research climate can be summarized as follows: Successfully publish as many papers as possible
in order to get promotion and tenure, or perish. Thus, whether this volume of published research
presents highly valid and replicable findings is typically relegated to a secondary position. A
second incentive involves citation metrics. Parallel to the publish-or-perish predicament, metrics create an “impact or perish” climate (Biagioli & Lippman, 2020, p. 10) where citation indices become an end in themselves for journals and institutions. Chairs
and deans no longer have to read the content of research by their faculty to assess its quality, but
rather use its metrics (i.e., journal impact factors and citation count) and then tally them for
annual reports submitted to higher management and potential donors. As predicted by
Campbell’s and Goodhart’s laws, once a quantitative indicator is introduced as a measure for
decision-making, it will become subject to gaming, hacking, and corruption, eventually
rendering it no longer a valid indicator.
One clear consequence of the perverted race toward metrics and impact factors is journal
selectivity and intolerance of anything less than perfect-looking results (Giner-Sorolla, 2012).
Most relationships of interest in the social sciences are too complex and/or require measurements
that are too ‘noisy’ to yield clear-cut findings in support of a given hypothesis. At the same time,
authors are aware that editors and reviewers at top-tier journals are primarily looking for the slightest reason to reject a manuscript, given the precious few annual slots the
journal has. This awareness can pressure the researcher to become more like a salesperson
promoting a product than a scientist presenting reality as messy as it really is. One collective
initiative aimed at curbing the tendency to favor innovation over verification is the Transparency
and Openness Promotion (TOP) Guidelines (Nosek et al., 2015). The TOP Guidelines offer
journals a set of standards advocating principles of reproducibility, transparency, and openness.
At the time of writing, over 5,000 signatories representing various journals and organizations
have signed up in support of these guidelines. TOP Guidelines provide eight transparency
standards, each with three levels of increasing stringency that journals choose to adopt.
Examples include citation standards for data and materials, their availability in a public
repository, preregistration and registered reports, and badges to reward these practices (for the
latest version of these guidelines, see www.cos.io/initiatives/top-guidelines).[9]
Another collective,
though more aggressive, initiative is the Peer Reviewers’ Openness Initiative (Morey et al.,
2016), where signatories refuse to review any manuscript that does not follow the standards of
open science (www.opennessinitiative.org).
[9] It is interesting to note that, in applied linguistics, the journal with by far the highest score indicating adherence to TOP Guidelines at the time of writing is Language Learning. The journal with the highest score across all disciplines is Meta-Psychology. See www.topfactor.org.
Other initiatives seek to find ways to tell which findings are more likely to replicate, thus maximizing the chance of identifying real
effects. One innovative approach relies on replication markets (www.replicationmarkets.com),
where a large group of researchers bet on the replicability of published findings, on the assumption that aggregating bets can cancel out individual errors, a process that has shown a curiously high level of
accuracy (Liu et al., 2020; Uhlmann et al., 2019). A second innovative approach is to draw on artificial intelligence to obtain an estimate of the replicability of certain findings (Gordon et al.,
2020; Yang et al., 2020). A program that is currently generating a lot of interest, developed by
the US Defense Advanced Research Projects Agency (DARPA), is called Systematizing
Confidence in Open Research and Evidence (SCORE; see
https://www.darpa.mil/program/systematizing-confidence-in-open-research-and-evidence).
SCORE aims to be able to analyze the content of empirical studies in the social and behavioral
sciences and then generate confidence scores representing a quantitative assessment of how likely a certain claim is to replicate successfully.
Finally, there is wide agreement that part of the responsibility for promoting open science
falls on funders (see Brysbaert, this volume). In recognition of this, a coalition of European research funders has recently introduced Plan S (www.coalition-s.org), which requires authors who obtain public
grant funding to publish the outcome of their research in an open access journal or platform. This
is one of the few institutional initiatives that are explicitly aimed at open science, as most others
(like the ones described above) are grassroots initiatives that are locally organized (for a list of
open scholarship grassroots community networks, see Nosek et al., 2020). Nevertheless,
Lilienfeld (2017) argued that the incessant push to obtain grants is responsible for creating a
“corporate culture of academia” (p. 663) even for scholars whose research does not genuinely
require funding. In fact, in many contexts, faculty are evaluated, promoted, and tenured based on
the grant dollars they procure, not the rigor of their research. And, despite the existence of many counterexamples, the greater the amount of external funding received, the higher the impact that research is assumed to have and the greater the value it is perceived to have. This is how
psychologist Scott Lilienfeld described his experience:
About a decade ago, I was a regular attendee at Grand Rounds presentations in a
prestigious psychiatry department. Before introducing speakers, the chairperson routinely
kicked off sessions by announcing the names of professors who had received large
federal grants along with their precise dollar amounts. I was struck that he never
announced faculty members’ important publications or scientific discoveries. I have since
come to realize that this reinforcement pattern is common in psychology departments,
too: Faculty members routinely receive plaudits for receiving grants but frequently find
that their scholarly accomplishments go largely unnoticed. (Lilienfeld, 2017, p. 661)
Lilienfeld (2017) argued that this grant culture has several downsides including exclusive
focus on programmatic research, intellectual hyperspecialization, stifling risk-taking,
overpromising significant real-world applications, and taking up time that could have been
devoted to deeper thinking, none of which could be addressed by open science practices per se,
like preregistration. A proposal by Ioannidis (2014) argues that grants and awards, reconstrued as
opportunities, should be viewed negatively until one delivers research output that matches these
opportunities, ideally in relation to replicability and translation to practice.[10]

[10] The landscape of open science is rapidly changing. At the time of writing, news emerged that the UK Reproducibility Network, a consortium of 18 universities, has received significant funding (£8.5M) to promote the uptake of open research practices. This is a promising move that is expected to have a major impact on open science practices. See https://www.ukrn.org/2021/09/15/major-funding-boost-for-uks-open-research-agenda.
Conclusion
A recurring message throughout this chapter is that open science is more than just a journal
article being open access. Besides open communication of the final research product, a range of
additional topics fall under the rubric of open science, including open methods (open data,
materials, and code), openness to replication and preregistration, open evaluation of research
rigor, and incentives to engage in all of these practices. Some of these topics require initiatives
involving collective effort for them to bear fruit, but simple strategies (like requiring authors to deposit postprints, i.e., the final accepted version of their articles before publisher typesetting, in an online repository) can go a long way in making our scholarship more open, more accessible and,
ultimately, more valuable for our society as a whole.
The tone we have taken in this chapter may appear rather pessimistic. However,
according to some optimistic observers, such soul-searching can be valuable to an academic field
in the long run. For example, some psychologists, whose field has been rocked by a miscellany of scandals ranging from fraud and retractions to prevalent questionable research practices, view these developments more positively, as psychology’s renaissance (Nelson et al., 2018) and as “among psychological science’s finest hours” (Lilienfeld, 2017, p. 660). If and when such a reckoning lands in our own field, applied linguists should view it the same way.
References
Agrawal, A. A. (2005). Corruption of journal Impact Factors. Trends in Ecology & Evolution,
20(4), 157. https://doi.org/10.1016/j.tree.2005.02.002
Al-Hoorie, A. H., Hiver, P., Larsen-Freeman, D., & Lowie, W. (under review). From replication
to substantiation: A complexity theory perspective.
Al-Hoorie, A. H., & Vitta, J. P. (2019). The seven sins of L2 research: A review of 30 journals’
statistical quality and their CiteScore, SJR, SNIP, JCR Impact Factors. Language
Teaching Research, 23(6), 727–744. https://doi.org/10.1177/1362168818767191
Barbour, B., & Stell, B. M. (2020). PubPeer: Scientific assessment without metrics. In M.
Biagioli & A. Lippman (Eds.), Gaming the metrics: Misconduct and manipulation in
academic research (pp. 149–155). MIT Press.
Beall, J. (2012). Predatory publishers are corrupting open access. Nature News, 489(7415), 179.
https://doi.org/10.1038/489179a
Beall, J. (2017). What I learned from predatory publishers. Biochemia Medica, 27(2), 273–278.
https://doi.org/10.11613/BM.2017.029
Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive
influences on cognition and affect. Journal of Personality and Social Psychology, 100(3),
407–425. https://doi.org/10.1037/a0021524
Berez-Kroeker, A. L., Gawne, L., Kung, S. S., Kelly, B. F., Heston, T., Holton, G., Pulsifer, P.,
Beaver, D. I., Chelliah, S., Dubinsky, S., Meier, R. P., Thieberger, N., Rice, K., &
Woodbury, A. C. (2018). Reproducible research in linguistics: A position statement on
data citation and attribution in our field. Linguistics, 56(1), 1–18.
https://doi.org/10.1515/ling-2017-0032
Biagioli, M., & Lippman, A. (2020). Introduction: Metrics and the new ecologies of academic
misconduct. In M. Biagioli & A. Lippman (Eds.), Gaming the metrics: Misconduct and
manipulation in academic research (pp. 1–23). MIT Press.
Bohannon, J. (2014). Replication effort provokes praise—and ‘bullying’ charges. Science,
344(6186), 788–789. https://doi.org/10.1126/science.344.6186.788
Bornmann, L., Mutz, R., & Daniel, H.-D. (2010). A reliability-generalization study of journal
peer reviews: A multilevel meta-analysis of inter-rater reliability and its determinants.
PLoS One, 5(12), e14331. https://doi.org/10.1371/journal.pone.0014331
Brumback, R. A. (2008). Worshiping false idols: The impact factor dilemma. Journal of Child
Neurology, 23(4), 365–367. https://doi.org/10.1177/0883073808315170
Brumback, R. A. (2009). Impact factor wars: Episode V–The empire strikes back. Journal of
Child Neurology, 24(3), 260–262. https://doi.org/10.1177/0883073808331366
Carney, D. R., Cuddy, A. J. C., & Yap, A. J. (2010). Power posing: Brief nonverbal displays
affect neuroendocrine levels and risk tolerance. Psychological Science, 21(10), 1363–1368.
https://doi.org/10.1177/0956797610383437
Clair, J. A. (2015). Toward a bill of rights for manuscript submitters. Academy of Management
Learning & Education, 14(1), 111–131. https://doi.org/10.5465/amle.2013.0371
Clemens, M. A. (2017). The meaning of failed replications: A review and proposal. Journal of
Economic Surveys, 31(1), 326–342. https://doi.org/10.1111/joes.12139
COPE: Complaints and Appeals Focus. (2018). Committee on Publication Ethics (COPE).
https://publicationethics.org/news/cope-education-subcommittee-focus-complaints-and-
appeals
Coyne, J. C. (2016). Replication initiatives will not salvage the trustworthiness of psychology.
BMC Psychology, 4(Article 28). https://doi.org/10.1186/s40359-016-0134-3
Coyne, J. C., & Tennen, H. (2010). Positive psychology in cancer care: Bad science, exaggerated
claims, and unproven medicine. Annals of Behavioral Medicine, 39(1), 16–26.
https://doi.org/10.1007/s12160-009-9154-z
Coyne, J. C., Tennen, H., & Ranchor, A. V. (2010). Positive psychology in cancer care: A story
line resistant to evidence. Annals of Behavioral Medicine, 39(1), 35–42.
https://doi.org/10.1007/s12160-010-9157-9
Eder, A. B., & Frings, C. (2021). Registered Report 2.0: The PCI RR Initiative. Experimental
Psychology, 68(1), 1–3. https://doi.org/10.1027/1618-3169/a000512
Eisen, M. B., Akhmanova, A., Behrens, T. E., Harper, D. M., Weigel, D., & Zaidi, M. (2020).
Peer review: Implementing a “publish, then review” model of publishing. eLife, 9,
e64910. https://doi.org/10.7554/eLife.64910
Falagas, M. E., & Alexiou, V. G. (2008). The top-ten in journal impact factor manipulation.
Archivum Immunologiae et Therapiae Experimentalis, 56(4), 223–226.
https://doi.org/10.1007/s00005-008-0024-5
Fetterman, A. K., & Sassenberg, K. (2015). The reputational consequences of failed replications
and wrongness admission among scientists. PLoS One, 10(12), e0143723.
https://doi.org/10.1371/journal.pone.0143723
Fiske, S. T. (2016). A call to change science’s culture of shaming.
https://www.psychologicalscience.org/observer/a-call-to-change-sciences-culture-of-
shaming
Freese, J., & Peterson, D. (2017). Replication in social science. Annual Review of Sociology,
43(1), 147–165. https://doi.org/10.1146/annurev-soc-060116-053450
Freese, J., & Peterson, D. (2018). The emergence of statistical objectivity: Changing ideas of
epistemic vice and virtue in science. Sociological Theory, 36(3), 289–313.
https://doi.org/10.1177/0735275118794987
Gass, S., Loewen, S., & Plonsky, L. (2021). Coming of age: The past, present, and future of
quantitative SLA research. Language Teaching, 54(2), 245–258.
https://doi.org/10.1017/S0261444819000430
Giner-Sorolla, R. (2012). Science or art? How aesthetic standards grease the way through the
publication bottleneck but undermine science. Perspectives on Psychological Science,
7(6), 562–571. https://doi.org/10.1177/1745691612457576
Gordon, M., Viganola, D., Bishop, M., Chen, Y., Dreber, A., Goldfedder, B., Holzmeister, F.,
Johannesson, M., Liu, Y., Twardy, C., Wang, J., & Pfeiffer, T. (2020). Are replication
rates the same across academic fields? Community forecasts from the DARPA SCORE
programme. Royal Society Open Science, 7(7), 200566.
https://doi.org/10.1098/rsos.200566
Grudniewicz, A., Moher, D., Cobey, K. D., Bryson, G. L., Cukier, S., Allen, K., Ardern, C.,
Balcom, L., Barros, T., Berger, M., Buitrago Ciro, J., Cugusi, L., Donaldson, M. R.,
Egger, M., Graham, I. D., Hodgkinson, M., Khan, K. M., Mabizela, M., Manca, A., . . .
Lalu, M. M. (2019). Predatory journals: No definition, no defence. Nature, 576, 210–212.
https://doi.org/10.1038/d41586-019-03759-y
Gunsalus, C. K., Marcus, A. R., & Oransky, I. (2018). Institutional research misconduct reports
need more credibility. JAMA, 319(13), 1315–1316.
https://doi.org/10.1001/jama.2018.0358
Hartgerink, C. H. J., & Wicherts, J. M. (2016). Research practices and assessment of research
misconduct. ScienceOpen Research. https://doi.org/10.14293/S2199-1006.1.SOR-
SOCSCI.ARYSBI.v1
Heathers, J., [@jamesheathers]. (2019, February 28). In reality, mechanisms to correct bad
science are slow, unreliably enforced, capricious, run with only the barest nod towards
formal policy, confer no reward and sometimes punitive elements for a complainant who
might use them [tweet]. Twitter.
https://twitter.com/jamesheathers/status/1101161838308401157
Hiver, P., & Al-Hoorie, A. H. (2020a). Reexamining the role of vision in second language
motivation: A preregistered conceptual replication of You, Dörnyei, and Csizér (2016).
Language Learning, 70(1), 48–102. https://doi.org/10.1111/lang.12371
Hiver, P., & Al-Hoorie, A. H. (2020b). Research methods for complexity theory in applied
linguistics. Multilingual Matters.
Hollenbeck, J. R., & Wright, P. M. (2017). Harking, Sharking, and Tharking: Making the case
for post hoc analysis of scientific data. Journal of Management, 43(1), 5–18.
https://doi.org/10.1177/0149206316679487
Huensch, A., & Nagle, C. (2021). The effect of speaker proficiency on intelligibility,
comprehensibility, and accentedness in L2 Spanish: A conceptual replication and
extension of Munro and Derwing (1995a). Language Learning, 71(3), 626–668.
https://doi.org/10.1111/lang.12451
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLOS Medicine, 2(8),
e124. https://doi.org/10.1371/journal.pmed.0020124
Ioannidis, J. P. A. (2012). Why science is not necessarily self-correcting. Perspectives on
Psychological Science, 7(6), 645–654. https://doi.org/10.1177/1745691612464056
Ioannidis, J. P. A. (2014). How to make more published research true. PLOS Medicine, 11(10),
e1001747. https://doi.org/10.1371/journal.pmed.1001747
Ioannidis, J. P. A., Fanelli, D., Dunne, D. D., & Goodman, S. N. (2015). Meta-research:
Evaluation and improvement of research methods and practices. PLOS Biology, 13(10),
e1002264. https://doi.org/10.1371/journal.pbio.1002264
Ioannidis, J. P. A., & Thombs, B. D. (2019). A user’s guide to inflated and manipulated impact
factors. European Journal of Clinical Investigation, 49(9), e13151.
https://doi.org/10.1111/eci.13151
Isbell, D., Brown, D., Chan, M., Derrick, D., Ghanem, R., Gutiérrez Arvizu, M. N., Schnur, E.,
Zhang, M., & Plonsky, L. (under review). Misconduct and questionable research
practices: The ethics of quantitative data handling and reporting in applied linguistics.
Jiang, S. (2021). Understanding authors’ psychological reactions to peer reviews: a text mining
approach. Scientometrics, 126, 6085–6103. https://doi.org/10.1007/s11192-021-04032-8
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable
research practices with incentives for truth telling. Psychological Science, 23(5), 524–532.
https://doi.org/10.1177/0956797611430953
Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social
Psychology Review, 2(3), 196–217. https://doi.org/10.1207/s15327957pspr0203_4
Kunda, Z. (1987). Motivated inference: Self-serving generation and evaluation of causal
theories. Journal of Personality and Social Psychology, 53(4), 636–647.
https://doi.org/10.1037/0022-3514.53.4.636
Lilienfeld, S. O. (2017). Psychology’s replication crisis and the grant culture: Righting the ship.
Perspectives on Psychological Science, 12(4), 660–664.
https://doi.org/10.1177/1745691616687745
Liu, Y., Gordon, M., Wang, J., Bishop, M., Chen, Y., Pfeiffer, T., Twardy, C., & Viganola, D.
(2020). Replication markets: Results, lessons, challenges and opportunities in AI
replication. arXiv. https://arxiv.org/abs/2005.04543
Macháček, V., & Srholec, M. (2021). Predatory publishing in Scopus: Evidence on cross-country
differences. Scientometrics, 126(3), 1897–1921. https://doi.org/10.1007/s11192-020-
03852-4
Marina, T., & Sterligov, I. (2021). Prevalence of potentially predatory publishing in Scopus on
the country level. Scientometrics, 126(6), 5019–5077. https://doi.org/10.1007/s11192-
021-03899-x
Marsden, E., Mackey, A., & Plonsky, L. (2016). The IRIS Repository: Advancing research
practice and methodology. In A. Mackey & E. Marsden (Eds.), Advancing methodology
and practice: The IRIS repository of instruments for research into second languages (pp.
1–21). Routledge.
Marsden, E., Morgan-Short, K., Thompson, S., & Abugaber, D. (2018). Replication in second
language research: Narrative and systematic reviews and recommendations for the field.
Language Learning, 68(2), 321–391. https://doi.org/10.1111/lang.12286
Marsden, E., Morgan-Short, K., Trofimovich, P., & Ellis, N. C. (2018). Introducing Registered
Reports at Language Learning: Promoting transparency, replication, and a synthetic ethic
in the language sciences. Language Learning, 68(2), 309–320.
https://doi.org/10.1111/lang.12284
Marsden, E., & Plonsky, L. (2018). Data, open science, and methodological reform in second
language acquisition research. In A. Gudmestad & A. Edmonds (Eds.), Critical
reflections on data in second language acquisition (pp. 219–228). John Benjamins.
McCarthy, R. J., & Chartier, C. R. (2017). Collections2: Using “crowdsourcing” within
psychological research. Collabra: Psychology, 3(1), 26.
https://doi.org/10.1525/collabra.107
Merton, R. K. (1973). The sociology of science: Theoretical and empirical investigations.
University of Chicago Press.
Moranski, K., & Ziegler, N. (2021). A case for multisite second language acquisition research:
Challenges, risks, and rewards. Language Learning, 71(1), 204–242.
https://doi.org/10.1111/lang.12434
Morey, R. D., Chambers, C. D., Etchells, P. J., Harris, C. R., Hoekstra, R., Lakens, D.,
Lewandowsky, S., Morey, C. C., Newman, D. P., Schönbrodt, F. D., Vanpaemel, W.,
Wagenmakers, E.-J., & Zwaan, R. A. (2016). The Peer Reviewers’ Openness Initiative:
Incentivizing open research practices through peer review. Royal Society Open Science,
3(1), 150547. https://doi.org/10.1098/rsos.150547
Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., Percie du Sert,
N., Simonsohn, U., Wagenmakers, E.-J., Ware, J. J., & Ioannidis, J. P. A. (2017). A
manifesto for reproducible science. Nature Human Behaviour, 1, 0021.
https://doi.org/10.1038/s41562-016-0021
Nelson, L. D., Simmons, J. P., & Simonsohn, U. (2018). Psychology’s renaissance. Annual
Review of Psychology, 69(1), 511–534. https://doi.org/10.1146/annurev-psych-122216-
011836
Neuliep, J. W., & Crandall, R. (1990). Editorial bias against replication research. Journal of
Social Behavior & Personality, 5(4), 85–90.
Neuliep, J. W., & Crandall, R. (1993). Reviewer bias against replication research. Journal of
Social Behavior & Personality, 8(6), 21–29.
Norris, J. M., & Ortega, L. (2006). The value and practice of research synthesis for language
learning and teaching. In J. M. Norris & L. Ortega (Eds.), Synthesizing research on
language learning and teaching (pp. 3–50). John Benjamins.
Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., Buck, S.,
Chambers, C. D., Chin, G., Christensen, G., Contestabile, M., Dafoe, A., Eich, E., Freese,
J., Glennerster, R., Goroff, D., Green, D. P., Hesse, B., Humphreys, M., . . . Yarkoni, T.
(2015). Promoting an open research culture. Science, 348(6242), 1422–1425.
https://doi.org/10.1126/science.aab2374
Nosek, B. A., & Bar-Anan, Y. (2012). Scientific utopia: I. Opening scientific communication.
Psychological Inquiry, 23(3), 217–243. https://doi.org/10.1080/1047840X.2012.692215
Nosek, B. A., Beck, E. D., Campbell, L., Flake, J. K., Hardwicke, T. E., Mellor, D. T., van ’t
Veer, A. E., & Vazire, S. (2019). Preregistration is hard, and worthwhile. Trends in
Cognitive Sciences, 23(10), 815–818. https://doi.org/10.1016/j.tics.2019.07.009
Nosek, B. A., Corker, K. S., Krall, T., Grasty, F. L., Brooks, R. E., III, Mellor, D. T., Van Tuyl,
S., Gurdal, G., Mboa, T., Ahinon, J. S., Moustafa, K., Entwood, J., Fraser, H., A., A.,
Fidler, F., Barbour, V., Ling, M., Miguel, E., Geltner, G., . . . Susi, T. (2020). NSF 19-
501 AccelNet proposal: Community of Open Scholarship Grassroots Networks (COSGN).
MetaArXiv. https://doi.org/10.31222/osf.io/d7mwk
Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration
revolution. Proceedings of the National Academy of Sciences, 115(11), 2600–2606.
https://doi.org/10.1073/pnas.1708274114
Nosek, B. A., & Errington, T. M. (2020). What is replication? PLOS Biology, 18(3), e3000691.
https://doi.org/10.1371/journal.pbio.3000691
Nosek, B. A., Spies, J. R., & Motyl, M. (2012). Scientific utopia: II. Restructuring incentives and
practices to promote truth over publishability. Perspectives on Psychological Science,
7(6), 615–631. https://doi.org/10.1177/1745691612459058
O’Grady, C. (2021). Fifteen journals to outsource peer-review decisions. Science.
https://doi.org/10.1126/science.abj0447
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science.
Science, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716
Plesser, H. E. (2018). Reproducibility vs. replicability: A brief history of a confused
terminology. Frontiers in Neuroinformatics, 11, Article 76.
https://doi.org/10.3389/fninf.2017.00076
Porte, G. K. (2012). Replication research in applied linguistics. Cambridge University Press.
Porte, G. K. (2013). Who needs replication? CALICO Journal, 30(1), 10–15.
https://doi.org/10.11139/cj.30.1.10-15
Porte, G. K., & McManus, K. (2019). Doing replication research in applied linguistics.
Routledge.
Ralph, P., bin Ali, N., Baltes, S., Bianculli, D., Diaz, J., Dittrich, Y., Ernst, N., Felderer, M.,
Feldt, R., Filieri, A., de França, B. B. N., Furia, C. A., Gay, G., Gold, N., Graziotin, D., He,
P., Hoda, R., Juristo, N., . . . Vegas, S. (2021). Empirical standards for software
engineering research. arXiv. https://arxiv.org/abs/2010.03525
Roberts, J. L. (2016). Predatory journals: Think before you submit. Headache: The Journal of
Head and Face Pain, 56(4), 618–621. https://doi.org/10.1111/head.12818
Schooler, J. W. (2014). Metascience could rescue the ‘replication crisis’. Nature, 515(7525), 9.
https://doi.org/10.1038/515009a
Schrodt, P. (2020). What is the bar? Differentiating good from great communication scholarship.
Communication Monographs, 87(1), 1–3.
https://doi.org/10.1080/03637751.2020.1709696
Shen, C., & Björk, B.-C. (2015). ‘Predatory’ open access: A longitudinal study of article
volumes and market characteristics. BMC Medicine, 13(1), 230.
https://doi.org/10.1186/s12916-015-0469-2
Simons, D. J., Shoda, Y., & Lindsay, D. S. (2017). Constraints on Generality (COG): A proposed
addition to all empirical papers. Perspectives on Psychological Science, 12(6), 1123–1128.
https://doi.org/10.1177/1745691617708630
Simonsohn, U. (2015). Small telescopes: Detectability and the evaluation of replication results.
Psychological Science, 26(5), 559–569. https://doi.org/10.1177/0956797614567341
Smith, A. P. (2000). The journal as an overlay on preprint databases. Learned Publishing, 13(1),
43–48. https://doi.org/10.1087/09531510050145542
Soderberg, C. K., Errington, T. M., Schiavone, S. R., Bottesini, J., Thorn, F. S., Vazire, S.,
Esterling, K. M., & Nosek, B. A. (2021). Initial evidence of research quality of registered
reports compared with the standard publishing model. Nature Human Behaviour, 5(8),
990–997. https://doi.org/10.1038/s41562-021-01142-4
Sönning, L., & Werner, V. (2021). The replication crisis, scientific revolutions, and linguistics.
Linguistics, 59(5), 1179–1206. https://doi.org/10.1515/ling-2019-0045
Spence, J. R., & Stanley, D. J. (2016). Prediction interval: What to expect when you’re expecting
… a replication. PLOS ONE, 11(9), e0162874.
https://doi.org/10.1371/journal.pone.0162874
Srivastava, S. (2012). A Pottery Barn rule for scientific journals. The Hardest Science.
https://www.thehardestscience.com/2012/09/27/a-pottery-barn-rule-for-scientific-
journals/
Stanley, D. J., & Spence, J. R. (2014). Expectations for replications: Are yours realistic?
Perspectives on Psychological Science, 9(3), 305–318.
https://doi.org/10.1177/1745691614528518
Steegen, S., Tuerlinckx, F., Gelman, A., & Vanpaemel, W. (2016). Increasing transparency
through a multiverse analysis. Perspectives on Psychological Science, 11(5), 702–712.
https://doi.org/10.1177/1745691616658637
Sterling, S., & Plonsky, L. (under contract). Meta-research in applied linguistics. Routledge.
Tijdink, J. K., Verbeke, R., & Smulders, Y. M. (2014). Publication pressure and scientific
misconduct in medical scientists. Journal of Empirical Research on Human Research
Ethics, 9(5), 64–71. https://doi.org/10.1177/1556264614552421
Trikalinos, N. A., Evangelou, E., & Ioannidis, J. P. A. (2008). Falsified papers in high-impact
journals were slow to retract and indistinguishable from nonfraudulent papers. Journal of
Clinical Epidemiology, 61(5), 464–470. https://doi.org/10.1016/j.jclinepi.2007.11.019
Uhlmann, E. L., Ebersole, C. R., Chartier, C. R., Errington, T. M., Kidwell, M. C., Lai, C. K.,
McCarthy, R. J., Riegelman, A., Silberzahn, R., & Nosek, B. A. (2019). Scientific utopia
III: Crowdsourcing science. Perspectives on Psychological Science, 14(5), 711–733.
https://doi.org/10.1177/1745691619850561
Valentine, J. C., Biglan, A., Boruch, R. F., Castro, F. G., Collins, L. M., Flay, B. R., Kellam, S.,
Mościcki, E. K., & Schinke, S. P. (2011). Replication in prevention science. Prevention
Science, 12(2), 103–117. https://doi.org/10.1007/s11121-011-0217-6
van Rooyen, S., Delamothe, T., & Evans, S. J. W. (2010). Effect on peer review of telling
reviewers that their signed reviews might be posted on the web: Randomised controlled
trial. BMJ, 341, c5729. https://doi.org/10.1136/bmj.c5729
Vazire, S., & Holcombe, A. O. (2020). Where are the self-correcting mechanisms in science?
PsyArXiv. https://doi.org/10.31234/osf.io/kgqzt
Wong, P. T. (1981). Implicit editorial policies and the integrity of psychology as an empirical
science. American Psychologist, 36(6), 690–691. https://doi.org/10.1037/0003-
066X.36.6.690
Yamada, Y. (2018). How to crack pre-registration: Toward transparent and open science.
Frontiers in Psychology, 9, 1831. https://doi.org/10.3389/fpsyg.2018.01831
Yang, Y., Youyou, W., & Uzzi, B. (2020). Estimating the deep replicability of scientific findings
using human and artificial intelligence. Proceedings of the National Academy of
Sciences, 117(20), 10762. https://doi.org/10.1073/pnas.1909046117