EDUCATIONAL ASSESSMENT & EVALUATION | RESEARCH ARTICLE
Do we focus on process over outcome? Review of published studies in two prominent Saudi journals
Ruwayshid Alruwaili
Language and Translation Dept., Arts and Education College, Northern Border University, Arar, Saudi Arabia
ABSTRACT
It is well known that published science is judged by the judicious and careful use of the methods employed in acquiring knowledge and understanding. However, since the beginning of the millennium, concerns over the validity and reliability of scientific research have grown substantially. These concerns relate mainly to statistical methods and the interpretation of statistical inferences. This study investigates the statistical practices of published studies in two prominent Saudi journals. The purpose is to examine whether the problem extends into a different geographical area and to look at the established editorial procedures used in the educational and linguistic sciences in Saudi Arabia. A total of 114 articles published between 2017 and 2020 were analyzed using a 13-item checklist. The findings revealed an over-reliance on the p-value: almost 67% of the published studies interpreted a significant p-value as a proven population effect, and researchers and editors relied heavily on a significant p-value as a criterion for publication. Moreover, other statistical requirements, such as confidence intervals (CIs) and effect sizes, were reported at low rates. The findings highlight the importance of a "verifying: how you found it" approach over the prevailing "what you found" approach, in which a significant p-value is treated as publishable.
ARTICLE HISTORY
Received 13 December 2023
Revised 11 December 2024
Accepted 12 December 2024
KEYWORDS
p-value; statistical practices;
replication crisis; Saudi
journals; journal editorial
requirements
SUBJECTS
Psychological Methods &
Statistics; Publishing
Industry; Publishing
Introduction
It is well known that the value of published studies resides in the reliable knowledge they can produce. The question then arises: how reliable are these results? One way to answer it is to look at the methods these studies employ, how rigorous those methods are, and what kinds of interpretations and inferences are drawn from them. However, this is not always a straightforward matter. Published studies differ in nature, questions, and methods, so it is hard to assess them critically with one unique scientific method (Devezer et al., 2021; Gigerenzer, 2004; Ioannidis, 2018; Meehl, 1967).
Research assessment hinges on two fundamental principles: reliability and validity. Validity refers to the degree to which a measure captures what it is intended to measure, while reliability indicates the degree of measurement consistency. These two principles are crucial for guaranteeing the trustworthiness of scientific findings. However, while validity and reliability are the bedrock of research quality, recent scholarly focus has shifted toward reproducibility and reliability, overshadowing other crucial aspects of research assessment. Indeed, determining the reliability of published studies has attracted remarkable interest in recent years in the media and academia. With news of failures to replicate the results of prominent studies in the social sciences, doubts emerged that these studies are not as reliable as many scientists assume. This has generated what is now called a "reproducibility crisis" (Amrhein et al., 2019; Nuzzo, 2014). A recent study in Nature Human Behaviour attempted to replicate the findings of 21 social-science studies published in the prestigious journals Science and Nature (Camerer et al., 2018); the replication attempts failed to yield the same effect in more than one-third of the studies. The finding is
highly significant: if scientists were unable to reproduce results from prestigious journals, what about lower-ranked or non-peer-reviewed journals? This shows how serious the problem is and how far-reaching its implications are for the larger scientific and academic communities (Lyu et al., 2020).
Crucially, the problem is frequently linked to the misinterpretation and misuse of statistical techniques, namely p-value significance testing (Gigerenzer, 2018; Greenwald et al., 1996; Halsey, 2019; Lakens, 2021). This technique is typically used to draw inferences from data. Studies old and new, conducted in many settings, have shown that psychologists and social scientists have poor knowledge of how to interpret the p-value correctly (Badenes-Ribera et al., 2015; Haller & Krauss, 2002; Hoekstra et al., 2014; Lecoutre et al., 2003). Nevertheless, journal editors and scientists rely on statistical significance, treating a result that crosses the conventional threshold of .05 as a "real" effect. Importantly, this has produced and inflated what is known as "publication bias", whereby significant results find their way into peer-reviewed journals while non-significant results go unpublished. The technique has been criticized heavily in the literature, and calls are accumulating for the entire concept of statistical significance to be abandoned or banned (Gigerenzer, 2018; Lakens, 2021; Trafimow & Marks, 2015). Therefore, if we believe that misinterpretations of statistical techniques might affect the quality of scientific research, then these techniques need more scrutiny and examination instead of a focus merely on the outcome.
This study investigates the statistical practices of published studies in two prominent Saudi journals. The purpose is to examine whether the problem extends into a different geographical area. In addition, it looks at the established editorial procedures used in the educational and linguistic sciences in Saudi Arabia.
Generally, the role of journal editors and reviewers is to filter out poor submissions and ultimately to publish the good ones (Edwards, 2022). The current study is therefore necessary to see how Saudi journals fit within the context of reproducibility and methodological reforms. By contrasting existing practices with the recommended best practices, the study will reveal whether statistical reporting is adequately sophisticated. It will also demonstrate how journal editors and reviewers act as "academic custodians" and whether they have been able to implement and transmit adequate and rigorous reporting practices to authors.
Ultimately, the evidence generated by the findings will likely serve as a foundation for greater accountability, diligence, and academic rigor. At the global level, it will be central to building knowledge about the effectiveness of methodological reforms outside Western scholarly publishing by illuminating what exists and what ought to be.
Why published studies?
The value of scientific expertise depends on making sure that knowledge published in journals is useful, reliable, and relevant. Scientific expertise thrives on debate and discussion, and on people verifying and contesting views using available evidence. Given this significance, it is important to understand the value published studies can bring to society. Published studies generate a collective output of practices and empirical generalizations that may have small or wide-ranging implications for society. In addition, the research enterprise grows very fast; improving the efficiency of scientific investigation can therefore translate published results into valuable contributions and developments for society.
Moreover, decision-making usually depends on scientific expertise. Scientific production is no longer confined to research organizations; policymakers now have more direct access to what is published, and it supports policymaking at all stages of the policy cycle. Crucially, policymakers tend to place more confidence in published research than in anecdotal evidence. Published knowledge is perceived as "true" or as a potent source of "finality" (Nosek et al., 2012). However, placing such faith in it requires a more "verifying" approach to what is published. Examining the reliability of the methods used in published studies can reveal the nature of the accumulated knowledge. More importantly, effective decision-making depends on reliable evidence, sound methods, and good science.
Moreover, academic promotion is linked to the number of publications in the Saudi academic context and elsewhere in the Middle East. Academics and scientists are rewarded and promoted based on what they publish; in other words, publishing is essential for their career advancement (Shehata & Elgllab, 2018). Indeed, "publishing is essential to success is just a fact of the trade" (Nosek et al., 2012). Yet it is crucial to see whether publishing more studies is rewarded at the expense of rigorous statistical standards. An academic journal gains its reputation for excellence by publishing high-quality articles (Mansour, 2016). Moreover, journal editors are the "gatekeepers" of scholarly published science, and the review system is meant to guarantee the quality, integrity, and reproducibility of any scholarly article. Given that academics in Saudi Arabia need to thrive professionally, what requirements do journal editors impose to assure the quality of the research? The exact requirements may vary from journal to journal, but the main goal remains to improve the scholarly article. On these premises, the study investigates Saudi editorial requirements for deciding on and scrutinizing the accepted papers, including their methods and statistics. In particular, do journal editors rely only on significance testing for their approval, or do they augment their decisions by requiring additional measures of uncertainty in the analysis of the obtained inferences?
Literature review
Since the beginning of the millennium, concerns over the validity and reliability of scientific research have grown substantially (Finch et al., 2004; Hoekstra et al., 2006). These concerns relate to statistical methods and the interpretation of statistical inferences, as well as to questionable research practices (QRPs). It is therefore important to conduct reviews and meta-analyses in order to produce generalizable findings. For example, Finch et al. (2001) reviewed 150 articles published in the Journal of Applied Psychology (JAP) from 1940 to 1999. Their focus was the misconceptions about significance testing evident in these 150 studies. They also investigated the reporting of p-values against the recommendations of the American Psychological Association's (APA) Task Force on Statistical Inference (TFSI). Unfortunately, they observed little progress: the TFSI recommendations had little influence on statistical reporting practices in JAP. In other words, although editorial requirements were instantiated, they did not result in improved statistical practices; practices such as reporting power estimates or confidence intervals were rarely used.
Moreover, Gigerenzer (2004) examined the impact of the editorial requirements instantiated by Geoffrey Loftus when he was appointed editor of Memory & Cognition. Loftus asked authors to submit their papers with descriptive statistics and to augment their analyses with confidence interval figures instead of relying only on p-values. In other words, he wanted to free scholars from dichotomous yes/no decisions and pull them away from reliance on p-values. Gigerenzer (2004) reported a 32% decrease in over-reliance on p-values during Loftus's editorship. Even so, Loftus observed reluctance and stubbornness among authors to abandon p-values and embrace estimates that convey uncertainty, such as confidence intervals (Gigerenzer, 2004, 2018).
Similarly, Hoekstra et al. (2006) reviewed articles submitted before and after the publication of the fifth edition of the APA Publication Manual, which endorsed the TFSI recommendations to use interval estimates and effect sizes in the analysis. Hoekstra et al. (2006) reviewed the influence of these changes on statistical practices, asking whether they were reported as prescribed by the APA manual. They analyzed 259 scholarly reports published in Psychonomic Bulletin & Review, all of which used significance testing to analyze the data. A 13-item checklist was used to evaluate statistical practices and determine whether they were reported as prescribed by the manual. Unfortunately, the overall finding was that little change was observed in researchers' behavior. In other words, editorial standards were not consistently followed, and over-reliance on the "null ritual" was observed on a large scale.
Crucially, these studies show that reforms are available yet slow. There have been calls to push these reforms further and abandon statistical significance testing altogether (Amrhein et al., 2019; Nuzzo, 2014), and to advocate better statistical practices (Lakens, 2021). However, researchers continue to demonstrate a consistent preference for statistical practices built on the "null ritual" (Gigerenzer, 2004).
This is especially worrying in light of the "reproducibility" concerns (Gigerenzer, 2018; Hoekstra et al., 2006; Nosek et al., 2012). To what extent the problem exists in the Saudi or Arab context is unknown.
Ibrahim (2021) provided an overview survey of the statistical methods used in Library and Information Science journals in the Arabic literature. He tallied the types of statistical methods, descriptive or inferential, used in the published studies, but did not show how these tests were interpreted or reported. It is therefore important to see how statistical practices are reported in Arabic journals in general and in the Saudi context in particular.
Simply put, there is a scarcity of studies targeting other geographical areas in terms of methodological reforms in scholarly publication or gatekeeping processes. Most of what is currently discussed relates to Western academia. Examining other contexts is highly valuable for illuminating the scale of the crisis beyond it. As seen in the literature, awareness of methodological reform and reproducibility issues is uneven across fields and contexts at the global level (Amrhein et al., 2019; Nuzzo, 2014). Ongoing discussions about improving research practices, data-sharing policies, and the adoption of best practices can bridge this awareness gap.
Moreover, in Saudi Arabia, journal editors and reviewers are appointed by university boards to regulate research and scholarly publication. The study will describe (later in the discussion) the major issues and concerns this trust and peer-review process poses for transmitting best practices and the merits for publication to authors. In other words, the study will expand the interpretive power of the current crisis to examine other cultural factors. This is of great significance for initiating culture-conscious practices in relation to reproducibility, methodological advances, and open science in developing and non-Western countries. A further contribution is the promotion of methodological rigor in statistical reporting. By identifying common errors and flaws in statistical reporting practices, such as failure to report effect sizes and confidence intervals, the findings provide guidance for researchers as well as editors and reviewers to improve the quality of their statistical reporting. Inadequate methods and incomplete reporting lead to avoidable research waste, which can be reduced by minor methodological adjustments and adherence to best reporting practices. These measures are easy to implement and cost-effective for journal editors and reviewers.
The findings will also show the importance of addressing publication bias and the over-reliance on p-value results as a criterion for publication. This will provide a more accurate representation of the available evidence and can mitigate the impact of publication bias on the overall body of literature. Ultimately, the study will promote the credibility and reliability of research findings in the social sciences in general and in the context of Saudi Arabia in particular.
The objectives of the study
The key objectives of the study are:
1. To identify the editorial requirements in relation to statistical practices in two prominent Saudi journals. Do they require alternatives beyond the p-value?
2. To measure the reporting of statistical practices in published scholarly articles in two prominent Saudi journals.
3. To determine whether the editorial requirements or reporting practices vary across years or journals.
Widespread concerns about the reliability and rigor of published research have centered around the
misuse and misinterpretation of statistical practices and reporting (Gigerenzer, 2004; Trafimow & Marks,
2015). To attain scientific rigor and research soundness, statistical practices have to receive formal scru-
tiny. Recently, Lakens (2021, p. 645) stated “If we really consider the misinterpretation of p values to be
one of the more serious problems affecting the quality of scientific research, we need to seriously reflect on
whether we have done enough to prevent misunderstandings”.
The emphasis and novelty of the current study lie in examining the extension of this formal line of inquiry into a new geographical area. Most published studies and discussions about "reproducibility" and "statistical practices" are conducted in Western academia (Finch et al., 2004; Gigerenzer, 2018; Nosek et al., 2012). Moreover, rigorous statistical practices and procedures can yield the solid conclusions that scientific progress requires and on which high-quality evidence-based policies in Saudi Arabia depend.
Research methodology
The study examined the statistical practices and procedures used in two prominent Saudi journals. The two journals publish educational and linguistic research, and they are not limited to Saudi Arabia but also publish studies from other Gulf and Arab countries, in both Arabic and English. There are several reasons for selecting these two journals, which are important to Saudi academia. First, universities in Saudi Arabia fall into two main categories: old and newly established. The old universities have bigger budgets than the newly established ones, and some of this budget is allocated to research and scholarly publication, including university journals. They also have an adequate number of experienced faculty to take part in the review process, unlike the newly established universities, which are too under-staffed to operate effectively. Moreover, these two journals have an extensive history of scientific publication in Saudi Arabia and thus a history of processing and publishing intellectual knowledge. Studies published in them are accepted in academic promotions and considered "rigorous" in the Saudi context. Furthermore, selecting these journals on the basis of legacy and importance to Saudi academia reflects a commitment to understanding the local academic context. Although the two journals are not internationally recognized or indexed in databases such as Scopus or WoS, they can reveal valuable insights into the unique academic environment in Saudi Arabia, thereby providing a more comprehensive view of academic output in the country. While the lack of indexing in internationally recognized databases is a valid concern, it should not obscure the importance of these journals in shaping academic practices and emerging trends within Saudi Arabia. Recognizing their value contributes to a more nuanced understanding of the country's academic environment and supports research that is both impactful and meaningful.
To show how these two journals assure the academic community and the general public that the knowledge they produce is intellectually scrutinized, the current study zooms in on their editorial requirements for achieving reliability and validity of published findings. It evaluates how statistical practices are reported and utilized. The creation of knowledge rests on the use of statistics and its elaboration in future knowledge, and statistical reporting and its interpretation need careful deliberation. Evaluating the statistical practices will reveal the approach taken to analyzing the data and drawing results, and will expose prevailing practices such as excessive use or misuse of certain tools (if present). Crucially, the lack of rigor in scientific research is often blamed on statistical practices such as the p-value (Gigerenzer, 2018; Halsey, 2019; Hoekstra et al., 2014). Thus, the analysis will provide a straightforward picture of how "academic custodians" carry out the "gatekeeping process" for statistical practices in Saudi Arabia, regardless of other logistical factors such as a lack of reviewers or editors.
Importantly, the peer-review process in both journals differs in some ways from the Western model. The editors and reviewers are selected by the university boards, based on their experience and academic rank, to regulate scholarly publication in these journals. The editorial board and the reviewers are not necessarily from the same university; they can be from other universities or other countries, but they are nominated and approved by the university boards. The editorial boards are then charged with accepting, filtering, and ultimately publishing research studies. Apart from catching obvious methodological errors, they are also trusted to justify the choice of topics, to transmit the best methodological practices to authors, and to ensure that published knowledge is adequately discussed. Alongside the primary goals of scholarly publication, they also have the responsibility of setting the degree of quality in an underrepresented context and promoting the "verifying" approach to data analysis instead of the "what you found" approach.
The first step in the peer-review process is that authors submit their articles via the journal's website or email. Once a submission is received, the editors assign reviewers according to their areas of expertise. After careful examination, the reviewers hand in their decision along with critical comments, and the editors send the decision back to the authors. Critically, the process is guided by the editorial boards and the referees' reports in either accepting or rejecting the submission.
The journals
The two journals are (1) the journal of "educational sciences" published by King Saud University (KSU) and (2) the journal of "educational sciences" published by Imam Mohammed Ibn Saud Islamic University (IMSIU). These two universities are among the oldest in Saudi Arabia: KSU was founded in 1957 and IMSIU in 1974 (Smith & Abouammoh, 2013). KSU publishes a number of scientific journals ranging from physics to education and linguistics. Due to its nature, IMSIU mainly publishes journals related to Islamic culture, the Arabic language, and education and linguistics. The choice reflects the significance of these two journals in promoting science in the Saudi context. Moreover, Saudi faculty favor publishing in these selective local journals over international journals (see Hanafi, 2011 for more discussion).
KSU journal
The journal is issued by KSU. According to its website (https://jes.ksu.edu.sa/ar), it aims to publish scholarly articles that are original and pioneering in education and related fields. The journal has undergone a number of name changes. The first issue was published in 1977 under the name "Studies"; the title was changed to "Educational Studies" in 1984 and to "Educational Sciences and Islamic Studies" in 1992, before the journal was divided into two, "Educational Sciences" and "Islamic Studies", in 2012. Since 2013, it has been issued under the name "Journal of Educational Sciences". Its latest version is indexed in international databases such as EBSCO and ResearchBib. It follows the APA (6th edition) style in publication and referencing.
IMSIU journal
The journal is issued by IMSIU.¹ In its first issue, its vision statement says that it aims to produce, publish, and apply knowledge. The journal branched out from the older "humanity and social sciences journal", and its first issue was published in 2015. It also follows the APA style in publication and referencing.
Inclusion criteria
The focus is to include and evaluate the latest issues, from 2017 to 2020. This allows a more general overview of the quality of published studies in both journals and an empirical evaluation of reporting standards and statistical interpretations. This span was decided upon for several reasons. The first is to include only the latest issues after the period of name changes, so that there is a period of stability in editorial procedures. Moreover, the latest issues better reflect the level of "reproducibility" awareness in editorial procedures. Publications over a recent period should be sufficient for an exploratory study such as this one to detect any systematic changes in the reporting of statistical procedures over time. The goal is also to observe any changes aligned with recent calls for methodological reform in Western academia; including older studies might overshadow this purpose, so the conclusions drawn can be both current and robust without introducing significant bias. Therefore, to ascertain that the implications and repercussions of the "reproducibility" crisis and the discussions about better "statistical practices" propagated through social media and scientific news have been heard, only the latest issues were selected. In addition, choosing issues from a specific period ensures a degree of unity of purpose in the goals and topics of these journals. Thus, the decision is to include and evaluate only the latest issues from the two journals and to see how careful editorial procedures might be reflected in the careful use of statistical techniques.
The checklist tool
A 13-item checklist was adopted from Hoekstra et al. (2006) to record how statistical significance is reported. The list was created to assess the reporting requirements and methodological quality of research articles in the field of psychology. Each item serves as a guideline for evaluating a specific aspect of an article, ensuring that essential elements are sufficiently addressed and reported. Using it, reviewers and editors can critically evaluate published studies and identify areas for improvement in reporting standards and methodology. By systematically applying these items, they can assess the methodological rigor of published articles, hence promoting validity and reliability. Reliability is promoted when procedures and statistical analyses are clearly reported and can therefore be replicated by others; validity is strengthened when statistical analyses are appropriately applied and correctly interpreted. Indeed, the future strength of published research depends on making sure that the produced knowledge is useful and reliable. The checklist consists of five categories, detailed below, and each item was coded "1" if an occurrence was found and "0" otherwise.
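To make the coding concrete, the sketch below (in Python, with hypothetical category names and entries; the original coding was not necessarily computerized) shows how binary codes per article lend themselves to the per-category sums and proportions reported later in the Findings.

```python
# A minimal sketch (hypothetical names and values) of the binary coding scheme:
# each article receives 1 or 0 per checklist category.
categories = [
    "significance_as_certainty",
    "nonsignificant_as_no_effect",
    "exact_p_reported",
    "ci_reported",
    "effect_size_reported",
]

# One coded article (hypothetical): 1 = occurrence found, 0 = otherwise
article = {
    "journal": "SaudJ",
    "year": 2019,
    "significance_as_certainty": 1,
    "nonsignificant_as_no_effect": 0,
    "exact_p_reported": 1,
    "ci_reported": 0,
    "effect_size_reported": 0,
}

def category_rate(corpus, category):
    """Proportion of coded articles in which the category occurs."""
    return sum(a[category] for a in corpus) / len(corpus)

print(category_rate([article], "ci_reported"))  # 0.0 for this toy corpus
```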
Reporting statistical significance as certainty
Inferring that a statistically significant result means with certainty that the effect is present in the population is statistically incorrect. Therefore, Arabic statements or words in the conclusion or findings sections such as "أثبتنا، أثبتت النتائج، أكدت النتائج" ("we proved, the findings proved, the findings confirmed") are taken as errors of this type. However, statements or words such as "بينت النتائج أن، أظهرت النتائج أن" ("the findings showed that, the findings demonstrated that") are not considered errors of this type. In addition, ambiguous statements such as "حققت المجموعة الضابطة مستوى أعلى من المجموعة التجريبية" ("The control group scored higher than the experimental group") are not considered errors of this type.
Reporting there is no effect or a negligibly small effect
Interpreting a statistically non-significant result as the absence of the effect in the population is considered an error of this type. Statements and phrases such as "لا يوجد دليل، لا يوجد أثر" ("no evidence, no effect") are taken as evidence of this error. Similarly, interpreting a result as a negligibly small effect without augmenting it with a CI or an effect size is also considered an error of this type.
Reporting exact or relative p-value
This category concerns whether the p-value is reported as an exact value or a relative one in the published studies. According to the APA style manual, the p-value should be reported as an exact value, and any violation of this requirement is considered an error of this type regardless of the outcome (significant/non-significant).
Reporting CI
This category deals with the reporting of CIs. According to APA guidelines, a CI should be reported alongside the p-value to provide a measure of uncertainty for the result; otherwise, it is considered a violation of the manual's guidelines. Thus, the absence of a CI, reported numerically or visually, is considered an error of this type.
Reporting effect size
p-value doesn’t provide alone information about the magnitude of an effect in the population.
Therefore, an effect size should be reported to draw conclusions from the sample about the size of the
effect. Thus, absence of reporting an effect size is considered an error of this type.
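For illustration, the sketch below (Python with NumPy/SciPy, using hypothetical data; the tooling is an assumption, not the journals' own) shows the kind of reporting the last three categories ask for: an exact p-value accompanied by a 95% confidence interval and a standardized effect size.

```python
# A minimal sketch (hypothetical data) of reporting an exact p-value, a 95% CI
# for the mean difference, and Cohen's d, rather than a bare "p < .05".
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
control = rng.normal(70, 10, 30)        # hypothetical control-group scores
experimental = rng.normal(75, 10, 30)   # hypothetical experimental-group scores

t, p = stats.ttest_ind(experimental, control)   # independent-samples t-test

n1, n2 = len(experimental), len(control)
df = n1 + n2 - 2
pooled_sd = np.sqrt(((n1 - 1) * experimental.var(ddof=1) +
                     (n2 - 1) * control.var(ddof=1)) / df)

diff = experimental.mean() - control.mean()
se = pooled_sd * np.sqrt(1 / n1 + 1 / n2)
ci = diff + np.array([-1, 1]) * stats.t.ppf(0.975, df) * se   # 95% CI for the difference

d = diff / pooled_sd                                          # Cohen's d

print(f"t({df}) = {t:.2f}, p = {p:.3f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}], d = {d:.2f}")
```

Reporting all three quantities lets a reader judge both the uncertainty and the practical magnitude of the effect, not merely whether a threshold was crossed.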
Procedures and coding
The sampling frame for this study was all research articles published between 2017 and 2020 in the two journals. A comprehensive review led to the identification of a total of 165 articles, excluding book reviews. These articles were downloaded and further classified by their use of significance testing: (a) articles that used the p-value in the analysis and (b) articles without the p-value. The articles with p-values were coded across the five categories described above. All articles were coded by the author, and to ensure consistent application of the checklist, 80% were cross-coded by a second researcher. The intercoder agreement rate was 92%.
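As a sketch of how such an agreement rate can be checked (the codes below are hypothetical; the study reports only the 92% figure), simple percent agreement can be complemented by Cohen's kappa, which corrects for chance agreement:

```python
# A minimal sketch (hypothetical codes) of checking intercoder agreement on the
# binary checklist items: percent agreement plus Cohen's kappa.
import numpy as np

coder_a = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])  # author's codes (hypothetical)
coder_b = np.array([1, 0, 1, 0, 0, 1, 0, 0, 1, 1])  # second coder's codes (hypothetical)

po = (coder_a == coder_b).mean()                 # observed percent agreement

# Expected agreement by chance, from each coder's marginal rate of coding "1"
pa, pb = coder_a.mean(), coder_b.mean()
pe = pa * pb + (1 - pa) * (1 - pb)

kappa = (po - pe) / (1 - pe)                     # Cohen's kappa
print(f"Percent agreement = {po:.0%}, Cohen's kappa = {kappa:.2f}")
```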
Findings
What are the editorial requirements in the two journals? Do they go beyond p-value?
The main concern of the article is to examine the statistical requirements in the two journals as a microscopic lens on the review process and on whether the published knowledge is intellectually scrutinized. The use of statistical methods has enormous value in scholarly publication: papers are judged by the careful use of investigation methods and their interpretations. Thus, this study provides a description of the existing and widespread statistical practices in two Saudi journals. As shown in Table 1, more than half of the papers published in these two journals relied on and reported null hypothesis tests (namely the p-value) in their analyses and inferences. The numbers show that the two journals are almost equal in their reliance on the p-value in their statistical reporting practices (69% and 68%, respectively). Testing for variation in practice between the two journals found no significant difference by university journal, t(22) = .665, p = .51, with a 95% confidence interval of −0.97 to 1.9.
These numbers indicate that the p-value is the commonly used method of reporting findings. The questions that arise, then, are: is it used appropriately, as recommended? And what kinds of interpretations are attached to it? As shown in Table 2, there are occurrences of p-value misinterpretation. A significant p-value is most often interpreted as implying with "certainty" that the effect is present. A Z-test for the difference of two independent proportions was conducted to determine whether the two journals differ in this statistical practice. The findings reveal no difference in interpreting the p-value with a "certainty" reading, z = −0.13, p = .89, φ = .012, with a 95% confidence interval of −0.18 to .16. Similarly, there are occurrences of non-significant results being interpreted as the absence of the effect. Again, the Z-test indicates no significant difference in practice between the two journals, z = −0.68, p = .49, φ = .063, with a 95% confidence interval of −0.24 to .116.
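A minimal sketch of this two-proportion z-test, computed by hand in Python from the counts in Table 2 (37/55 "certainty" readings in ImamJ vs. 39/59 in SaudJ); NumPy/SciPy are tooling assumptions, not the study's own software:

```python
# Two-proportion z-test on the "certainty" interpretation counts from Table 2.
import numpy as np
from scipy import stats

x1, n1 = 37, 55   # ImamJ: "certainty" readings among p-value studies
x2, n2 = 39, 59   # SaudJ

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                       # pooled proportion under H0
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p = 2 * stats.norm.sf(abs(z))                        # two-sided p-value

# 95% CI for the difference in proportions (unpooled SE)
se_diff = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci = (p1 - p2) + np.array([-1, 1]) * 1.96 * se_diff

print(f"z = {z:.2f}, p = {p:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
```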
Two main conclusions can be drawn from the data in Table 2 regarding the interpretation of the p-value. First, it is commonly interpreted as a dichotomous decision and erroneously equated with certainty. Second, this interpretation is uniform: it does not vary across the journals. Note that this binary interpretation claims certainty on the basis of a significant result in 67% of cases, which suggests that editorial procedures might endorse this binary thinking.
These p-value interpretations indicate that how the p-value is reported needs further investigation. APA guidelines require the reporting of exact p-values. Table 3 presents how the p-value is reported for significant and non-significant results.

Table 3 shows that a large proportion of cases report exact p-values for both significant and non-significant findings; the medians show that reporting the p-value is the prevailing trend. However, there are some cases of relative or absent p-values, which suggests that editorial procedures are not consistently followed with regard to exact p-value reporting. Although previous studies (Hoekstra et al., 2006) reported clear differences in p-value reporting practices between significant and non-significant outcomes, the density plots in Figure 1 demonstrate no clear differences in the way significant and non-significant results were reported here. An exact McNemar's test shows no statistically significant difference in the proportion of significant/non-significant p-value reporting, χ²(1) = .862, p = .353. The p-value is likely to be reported regardless of whether the outcome is significant or non-significant.
Table 1. Number of articles reporting with p-value, tabulated by university journal, in the selected years.

                         Total no. of articles      Articles with p-value
                         ImamJ      SaudJ           ImamJ      SaudJ
Valid                    11         13              11         13
Sum                      80         85              55         59
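The exact McNemar's test above operates on paired outcomes per article. A minimal sketch (the 2×2 counts are hypothetical, since the article reports only the summary statistic) is the binomial test on the discordant cells:

```python
# Exact McNemar's test on paired reporting outcomes: whether the same article
# reports an exact p-value for its significant (rows) and nonsignificant
# (columns) results. The counts are hypothetical.
import numpy as np
from scipy import stats

table = np.array([[60, 9],
                  [14, 31]])

b, c = table[0, 1], table[1, 0]            # only discordant pairs drive the test
result = stats.binomtest(b, b + c, 0.5)    # exact McNemar = binomial test on b vs c
print(f"discordant pairs = {b + c}, exact p = {result.pvalue:.3f}")
```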
The next question is whether statistical data and conclusions are reported on the basis of statistical significance only or are corroborated and augmented by other statistical tools. Drawing conclusions should not hinge on whether a result crosses a statistical threshold (Chow, 1998). The descriptive statistics in Table 4 show whether other statistical practices were utilized in the reporting. As can be noted, there is a total lack of CI reporting in the published studies; failure to report or present a CI alongside the p-value contravenes APA guidelines. Similarly, visual representations are almost non-existent. These measures of uncertainty add meaning to the findings and give the conclusions another complexion.
In relation to effect size, reporting an effect size is essential rather than relying only on raw mean differences. Table 4 shows the unstandardized and standardized effect size reporting practices. Unstandardized effect sizes, such as reporting the means, were almost universally provided. However, standardized effect sizes were reported relatively rarely (24% and 28%) and inconsistently. This indicates that while effect size in its broader meaning is fully reported, the reporting of standardized effect sizes is limited. Failure to report an effect size alongside the p-value has also been observed in the literature (Hoekstra et al., 2006; Wei et al., 2019).
Table 2. Interpretations of p-value tabulated with university journal.

                              Uni journal   N    Mean     Median   Sum   SD      Z-test   p-value   Phi coefficient
Significance as certainty     ImamJ         55   0.6727   1        37    0.474   −.13     .89       .012
                              SaudJ         59   0.6610   1        39    0.477
Insignificant as no effect    ImamJ         55   0.4182   0        23    0.498   −.68     .49       .063
                              SaudJ         59   0.3559   0        21    0.483
No or negligible effect       ImamJ         55   0.0000   0        0     0.000
                              SaudJ         59   0.0169   0        1     0.130
Table 3. p-value reporting practice in significant and nonsignificant results.

                                        Uni journal   N    Median   Sum   SD
Significance with exact p-value         ImamJ         55   1        32    0.498
                                        SaudJ         59   1        37    0.488
Significance with relative p-value      ImamJ         55   0        11    0.404
                                        SaudJ         59   0        10    0.378
Significance with no p-value            ImamJ         55   0        6     0.315
                                        SaudJ         59   0        3     0.222
Insignificance with exact p-value       ImamJ         55   1        32    0.498
                                        SaudJ         59   1        42    0.457
Insignificance with relative p-value    ImamJ         55   0        4     0.262
                                        SaudJ         59   0        0     0.000
Insignificance with no p-value          ImamJ         55   0        6     0.315
                                        SaudJ         59   0        1     0.130
Figure 1. Density distribution for p-value reporting practice in significant and nonsignificant results.
Do the editorial requirements vary across the years?
It is important to see whether the journal editors and reviewers have been able to transmit the best practices to authors. The editors and reviewers of the journals are trusted to promote the reliability of their investigation methods. As mentioned previously, part of the current methodological reforms is to exercise additional scrutiny of the methods employed. With growing attention to and news about the "reproducibility crisis" (Nuzzo, 2014), it is important to see how editors and reviewers respond to these calls in an underrepresented context. To see whether reporting practices vary across the years, a chi-square test was conducted for an association between interpreting the p-value as "certainty" and the year of publication. The findings reveal no significant association between the reporting rate and the year of publication, χ²(6) = 3.17, p = .787, and Cramer's V indicates a small level of association between these two variables, Cramer's V = 0.167. The descriptive statistics in Figure 2 show that reporting rates were not significantly higher than in the previous year in either journal.
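A minimal sketch of this chi-square test of association together with Cramer's V (the contingency table below is hypothetical, as the article reports only the summary statistics):

```python
# Chi-square test of association plus Cramer's V. Hypothetical 2x4 table:
# rows = p-value interpreted as "certainty" (yes/no), columns = year (2017-2020).
import numpy as np
from scipy import stats

table = np.array([[18, 20, 19, 19],
                  [10,  9,  9, 10]])

chi2, p, dof, expected = stats.chi2_contingency(table)

n = table.sum()
k = min(table.shape) - 1
cramers_v = np.sqrt(chi2 / (n * k))        # effect size for the association

print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3f}, Cramer's V = {cramers_v:.3f}")
```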
Similarly, to examine whether the exact reporting rate for the p-value differs across the years, a chi-square test was conducted, χ²(6) = 10.5, p = .107, with Cramer's V revealing a small-to-medium level of association between these two variables, Cramer's V = 0.303. These findings indicate that the association between reporting practices and year of publication is weak. In other words, reporting practices do not vary across the years, and editorial requirements and procedures do not change.
Importantly, the descriptive numbers in Figure 3 reveal no trend in reporting the standardized effect size across the years, only inconsistent individual choices.

Crucially, these findings are indicative of a troubling situation. Editorial procedures and reporting practices seem to rely heavily on reporting the p-value without incorporating other requirements. The low reporting rates reflect inconsistency and individual choice rather than editorial requirements. In other words, studies seem to be judged and evaluated on the basis of research outputs rather than on "how did you arrive at this conclusion?"
Table 4. Other non-p-value statistical practices reporting.

                                  Uni journal   N    Median   Sum   SD
Standardized effect size          ImamJ         55   0        13    0.429
                                  SaudJ         59   0        17    0.457
Measure of effect size (mean)     ImamJ         55   1        54    0.135
                                  SaudJ         59   1        59    0.000
Error bars                        ImamJ         55   0        2     0.189
                                  SaudJ         59   0        2     0.183
Confidence intervals              ImamJ         55   1        0     0.000
                                  SaudJ         59   1        0     0.000
Figure 2. Reporting rate for p-value as a certainty across the years.
Discussion
The study aimed to evaluate the statistical editorial requirements in two Saudi educational and linguistic journals. There is a scarcity of studies that scan and review statistical reporting practices in social science fields such as education and linguistics (see Hanafi & Arvanitis, 2015; Shehata & Elgllab, 2018). Therefore, to understand how editors come to accept and publish studies in these journals, it is fruitful to explore some basics of their statistical requirements. Most importantly, good science is defined by the judicious use of statistical techniques more than by which particular statistical tests are used. This requires a "verifying" approach rather than over-reliance on the mistaken belief that only a significant p-value is publishable (Nosek et al., 2012). In other words, the focus should be on the "process" that leads to the findings rather than on "what are the findings?". With the increasing demands for methodological reform in research outputs comes the need to ensure that published knowledge and knowledge creation are intellectually deliberated. A clear evaluation study is thus the starting point for any effective discussion or impact.

To bridge this gap and address the recent calls for more robust statistical reporting practices, the study was restricted to two journals perceived as prestigious and published by two long-established institutions in Saudi Arabia (Hanafi & Arvanitis, 2015; Mansour, 2016). The evaluation of 165 published articles revealed that statistical significance testing is overwhelmingly used, with almost 70% of these articles using the p-value in their analysis and interpretation.
Although this study is exploratory in nature, it is innovative in targeting statistical requirements as a window on academic rigor and professional practice in local journals (Hanafi, 2011). The first observation is that almost 67% of the published studies interpreted the p-value as a proven population effect. This indicates a binary decision on the basis of a significant result; in other words, authors and editors (including reviewers) rely heavily on a significant p-value as a criterion for publication. Similarly, taking together the percentages for interpreting the p-value as a proven effect and for dismissing the non-significant p-value implies that errors are made in interpreting the outcome of a significant or non-significant p-value. Misinterpretations of the p-value are erroneously believed to be publishable and correct (Haller & Krauss, 2002; Hoekstra et al., 2006). In addition, both journals uniformly prefer to use the p-value as a dichotomous decision, indicating a predominant behavior of equating the p-value with certainty and a real effect in local journals. This conclusion adds to previous studies in the literature in which misinterpretations of NHST are commonly observed (Badenes-Ribera et al., 2015; Gigerenzer, 2004, 2018; Haller & Krauss, 2002; Lecoutre et al., 2003; Lyu et al., 2020). Crucially, the practice of publishing only significant findings is a well-documented phenomenon known as "publication bias" (Nosek et al., 2012). Journals tend to favor studies with statistically significant results, leading to skewed and misleading descriptions of the evidence in the literature. This bias distorts the scientific record by over-representing positive findings
and under-representing negative findings. Thus, the tendency to equate non-significant results with "no effect" poses significant and worrying challenges. It is more likely that authors adjust their submissions to align with perceived editorial preferences in these journals. Therefore, reporting is likely to be based on perceived journal preferences rather than on scientific merit.

Figure 3. Reporting practice of standardized effect size across the years.
Moreover, investigating the reporting of the p-value in the published studies revealed a number of inconsistent practices. Although APA guidelines prescribe that p-values be reported as exact values, the review found a few cases of relative p-value reporting and a small number with no p-value at all (see Table 3). These cases indicate that the APA requirements were not consistently followed. However, unlike previous studies (Hoekstra et al., 2006), no directional pattern in p-value reporting could be deduced: editors and researchers were likely to report the exact p-value regardless of whether the outcome was significant or non-significant.
Given that the p-value fails to provide the type of information the editors want to obtain, what alternative statistical practices do journal editors request alongside it? Recommendations usually include reporting effect sizes, CIs, and graphical displays that communicate the distribution of the data (Finch et al., 2004; Gigerenzer, 2004; Hoekstra et al., 2014). The evaluation found that such alternative practices are almost absent from statistical reporting in these journals (see Table 4). Although these alternatives are highly recommended by the APA manual, there seems to be no continuous vigilance on the part of journal editors; the small amount of effect size reporting suggests individual vigilance on the part of researchers rather than an established editorial requirement. Crucially, these alternatives are not immune to misinterpretation either (Greenwald et al., 1996; Hoekstra et al., 2014), but what is highlighted here is the need for more transparency when reporting results, as well as acknowledgment of the limitations of the p-value. The argument is not to abandon the p-value completely. Using the p-value is a good thing (Lakens, 2021), but even good things can be pushed beyond their proper limits. There must be room for other statistical practices, besides the p-value, to better understand the results.
Generally speaking, the findings pose critical challenges to the way knowledge is published in the two journals. Apart from ensuring the smooth running of the review process, journal editors have the responsibility to raise and endorse the quality of knowledge creation. The findings paint an unflattering picture of how knowledge is intellectually processed: the implication is that papers are minimally scrutinized in terms of statistical tools. The widespread misuse of some tools and the absence of others provide a glimpse into the dynamics of the review process in these two journals. It seems that research outputs are assessed simply on the basis of "what did you find?" rather than "how did you find it?". This casts doubt on the authority of these journals as centers of knowledge production in Saudi Arabia.
If journal editors don’t request or force these recommendations, no reaction is going to occur and
researchers will continue submitting scholarly articles without implementing good reporting practices.
Journal editors are the “gatekeepers”to initiate and cause a reaction and change. Therefore, they have
an important role to play in implementing and encouraging researchers to use and utilize these alterna-
tives in their reporting practices (Finch et al., 2004; Gigerenzer, 2004). They are supposed to transmit the
recommended best practices to authors. This can be directly achieved through requesting alternatives or
indirectly through reviewers’reports. The case study of editor Geoffrey Loftus with his editorial guide-
lines indicates that the change was noticeable in shifting the reliance from p-value to other alternative
practice (Gigerenzer, 2004).
The burden is largely on journal editors and reviewers in Saudi Arabia to initiate and advocate methodological reform (Gigerenzer, 2004). Papers should not be accepted solely on the basis of a significant p-value as a criterion for publication. There is a large international body of discussion and experience that can be put to good use in mitigating, if not solving, these problems (Camerer et al., 2018; Ioannidis, 2018). However, these recommendations and guidelines have to be translated into concrete actions locally and regionally. Saudi journal editors should thus shift their focus toward increased statistical sophistication and promote a set of fruitful beliefs across a range of areas and disciplines. There is no immunity from error in scientific research, but editors can make sure that the system favors and implements good practices to improve how research is performed, evaluated, communicated, and rewarded.
Editors can ask authors who submit p-value-only articles to routinely observe the following: (1) include figures, error bars, and CIs to evaluate the findings; (2) report effect sizes for both statistically significant and non-significant results; (3) follow the APA guidelines on exact p-value reporting consistently; (4) incorporate p-value and effect-size interpretations in the reporting, including how these numbers should be interpreted within the research design and questions; (5) do not interpret the p-value as a dichotomous decision or as quantifying the improbability of the null hypothesis; (6) consider that a significant p-value does not necessarily imply a true effect; (7) treat reporting the p-value not as an end in itself but as a question of what information to convey; and (8) reflect uncertainty in the academic writing, avoiding words such as "prove" and "show" in favor of more accurate statements. The editors and reviewers of these two journals are strongly recommended to review recent and older calls for methodological reform and to consider the use of sophisticated practices as a driving force for change and quality enhancement (Finch et al., 2004; Gigerenzer, 2004, 2018; Greenwald et al., 1996; Lakens, 2021; Meehl, 1967).
Importantly, these findings are about promoting rigor, transparency, and practical considerations in published studies in two Saudi journals. Publishing in these two journals is important for academic survival. This evaluation study therefore aimed to uncover the statistical practices and editorial requirements, focusing on statistical practices because they are a relevant and telling reflection of methodological practice. We believe that the eight points above provide guidelines for adoption and establish common ground for research quality in general. In particular, reviewers and editors could benefit from these findings by fostering more positive outcomes such as increased quality, sophisticated statistical practice, transparency, and openness. Furthermore, instead of relying only on the p-value as a criterion for publication, these findings might prompt more careful consideration when accepting a study for publication. The findings also suggest that these eight points represent a better way of analyzing and interpreting results in social science journals in KSA.
There are several suggestions and possible future paths for journal editors and researchers to enhance statistical reporting procedures in order to support methodological reform in Saudi academic publishing. First, it is important for journal editors and reviewers to set explicit rules for statistical reporting, including the need for thorough explanations of statistical procedures. Encouraging researchers to preregister their experiments and use open science techniques, which can help address problems such as selective reporting, is another possible improvement. More crucially, reviewers can be asked to assess the clarity and completeness of the statistical information in submitted manuscripts, which in turn provides authors with feedback on areas for improvement.
In terms of generalizability, the results might not be representative of journal editorial requirements across Saudi Arabia as a whole, as the study focuses only on the social sciences; the situation might differ in natural science journals. Future research should investigate whether these editorial practices are ubiquitous across all fields. However, the results are likely to be representative within the social sciences, because the study focuses on established journals edited by experienced scholars of full professor rank; it therefore seems highly unlikely that the picture would be different for newly published journals from newly established institutions. Moreover, the present study has provided evidence of dominant editorial practices regardless of the university journal. Poking holes in current practices is often how progress gets made, but editors need to effect the desired change, otherwise these practices will persist. Reichenbach once stated, "If error is corrected whenever it is recognized, the path of error is the path of truth." Future research should focus on the impact of enhancing transparency and integrating complementary statistical metrics across various fields. It is also potentially promising to explore the need to train reviewers adequately or to require a certain level of statistical literacy from those reviewing manuscripts. By doing so, the Saudi scientific community can ensure that p-values and other statistical tools are used and reported in ways that promote better understanding and improve professional practice.
Conclusion
The broad purpose of this review is to characterize current statistical editorial requirements in two Saudi journals. The theoretical significance of the paper stems from exploring how academic knowledge is shaped and endorsed by editorial practices. We analyzed editorial practices in relation to statistical requirements and the extent to which editorial requirements are inclusive of and accommodate rigorous methodological practices. The results indicated that editors seem to be unaware of recent and ongoing methodological reforms in the social sciences (Gigerenzer, 2004; Ioannidis, 2018). This unawareness was reflected in what was required to submit and publish a scholarly article in local social science journals (Amrhein et al., 2019; Nuzzo, 2014). The focus seems to be on "What did you find?" rather than "How did you find it?". Accordingly, the practical significance derives from the likely extent to which some scholarly articles are effectively publishable on the basis of a statistical result, such as over-reliance on the p-value as a criterion for acceptability. A statistically significant p-value is often treated as a benchmark for determining whether a study's findings are worthy of publication. Thus, the article provides evidence of the inclusivity or exclusivity of knowledge production. The overemphasis on p-values as a gatekeeping tool can create exclusivity in knowledge production and methods. This trend can have far-reaching implications for the quality and rigor of published research in Saudi Arabia.
Unfortunately, our evaluation did not reveal any meaningful improvement over the period surveyed. Moreover, we found that articles appear to be publishable largely on the basis of a p-value alone, with an overwhelming reliance on p-value statistics, the "null ritual" (Gigerenzer, 2004). This reliance also seemed to encourage a binary interpretation of p-values. The results of the present study agree with the findings of previous studies (Haller & Krauss, 2002; Hoekstra et al., 2006); in other words, they mirror those of Western research, which suggests that the problem is universal and requires collective effort. They also show how the study adds to the field: by corroborating well-documented findings and illuminating the scale of the crisis in a parallel context. APA reporting guidelines were likewise not followed regularly or consistently at any large scale.
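The instability behind this binary reading can be shown with a short simulation; the sketch below is purely illustrative and is not drawn from any of the reviewed journals (the effect size, sample size, and seed are hypothetical). Repeated replications of the same genuine effect yield p-values spanning several orders of magnitude, which is why a significant/non-significant dichotomy is a fragile publication criterion.

```python
# Illustrative simulation: replications of one true effect produce widely
# varying p-values, so a binary significant/non-significant reading
# discards most of the information a study carries.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=7)
true_effect, n, reps = 0.4, 30, 1000  # standardized effect, per-group n

p_values = []
for _ in range(reps):
    a = rng.normal(true_effect, 1.0, n)  # "treatment" replication sample
    b = rng.normal(0.0, 1.0, n)          # "control" replication sample
    p_values.append(stats.ttest_ind(a, b, equal_var=False).pvalue)

p_values = np.array(p_values)
print(f"p-value range across replications: "
      f"{p_values.min():.4f} to {p_values.max():.4f}")
print(f"proportion 'significant' at .05: {(p_values < 0.05).mean():.2f}")
```

Under these assumed parameters, only a minority of replications cross the .05 threshold even though the underlying effect is real in every run.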
There is no single solution to this worrying situation, but Saudi editors need the courage and nerve to advocate further methodological reforms and instill a culture of change and vigilance. Enhancing the quality of journal publications requires targeted strategies across key aspects of scholarly work, including statistical rigor and overall research integrity. As actionable strategies, Saudi journal editors can introduce pre-submission peer review for manuscript improvement, require that statistical assumptions be addressed explicitly, and make methodologies and data accessible (see Lakens, 2021). In addition, editors and reviewers should actively encourage the submission of studies with null or inconclusive findings to reduce publication bias and enhance the diversity of the research landscape. More importantly, research quality should be judged on criteria such as methodological rigor and theoretical contribution, beyond the p-value alone (Nosek et al., 2012). Editors also need to educate researchers on best practices and encourage ongoing reform and training in statistical methods, including the use of modern tools for data analysis, to improve the quality of reported results.
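To make the recommended reporting concrete, the sketch below shows one way a p-value can be reported alongside the complementary statistics this review found missing, an effect size and a confidence interval. It is a minimal illustration in Python with NumPy and SciPy, not an analysis of any reviewed article; the group names, values, and seed are hypothetical.

```python
# Illustrative sketch: reporting a p-value together with an effect size
# (Cohen's d) and a 95% confidence interval, rather than the p-value alone.
# The data below are simulated; group names and values are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
group_a = rng.normal(loc=52.0, scale=10.0, size=40)  # e.g., treatment scores
group_b = rng.normal(loc=47.0, scale=10.0, size=40)  # e.g., control scores

# Welch's t-test yields the familiar p-value.
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

# Cohen's d (pooled SD) quantifies the magnitude of the difference.
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
cohens_d = (group_a.mean() - group_b.mean()) / pooled_sd

# A 95% CI for the mean difference conveys estimation uncertainty.
diff = group_a.mean() - group_b.mean()
var_a, var_b = group_a.var(ddof=1), group_b.var(ddof=1)
n_a, n_b = len(group_a), len(group_b)
se_diff = np.sqrt(var_a / n_a + var_b / n_b)
# Welch-Satterthwaite degrees of freedom for unequal variances.
df = se_diff**4 / ((var_a / n_a)**2 / (n_a - 1) + (var_b / n_b)**2 / (n_b - 1))
ci_low, ci_high = stats.t.interval(0.95, df, loc=diff, scale=se_diff)

# Report all three quantities, not the p-value in isolation.
print(f"t({df:.1f}) = {t_stat:.2f}, p = {p_value:.3f}, "
      f"d = {cohens_d:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

A results sentence built from this output, for example "t(77.8) = 2.31, p = .024, d = 0.52, 95% CI [0.7, 9.2]", lets readers judge magnitude and precision, not merely whether a threshold was crossed.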
Finally, editors are the cornerstone of favorable change; without their action, poor editorial requirements will persist. To improve academic publishing and increase public trust in research results, it is essential to demand and apply sound and appropriately sophisticated statistical techniques in scholarly submission and publishing. Saudi journal editors need to agree on a set of principles and commitments for reforming the way research findings are evaluated and published.
Note
1. At this website: https://units.imamu.edu.sa/deanships/SR/Units/Vice/Magazines/Pages/%D9%85%D8%AC%D9%84%D8%A9-%D8%A7%D9%84%D8%B9%D9%84%D9%88%D9%85-%D8%A7%D9%84%D8%AA%D8%B1%D8%A8%D9%88%D9%8A%D8%A9-.aspx
Disclosure statement
No potential conflict of interest was reported by the author(s).
About the author
Ruwayshid Alruwaili is an Assistant Professor and a former head of Applied Linguistics at Northern Border University, and a UK Associate Fellow of the Higher Education Academy. His research interests include the acquisition of morphosyntactic features/applied linguistics in KSA, and research methods in SLA. He focuses on methodological reforms that enhance rigor, transparency, and reliability across disciplines, with a keen interest in fostering better practices in academic and professional research. He is an entrepreneur and a consultant on quality issues at higher education institutions.
References
Amrhein, V., Greenland, S., & McShane, B. (2019). Retire statistical significance. Nature, 567(7748), 305–307. https://doi.org/10.1038/d41586-019-00857-9
Badenes-Ribera, L., Frías-Navarro, D., Monterde-I-Bort, H., & Pascual-Soler, M. (2015). Interpretation of the p value: A national survey study in academic psychologists from Spain. Psicothema, 27(3), 290–295. https://doi.org/10.7334/psicothema2014.283
Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M., Nave, G., Nosek, B. A., Pfeiffer, T., Altmejd, A., Buttrick, N., Chan, T., Chen, Y., Forsell, E., Gampa, A., Heikensten, E., Hummer, L., Imai, T., … Wu, H. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2(9), 637–644. https://doi.org/10.1038/s41562-018-0399-z
Chow, S. L. (1998). Précis of Statistical significance: Rationale, validity, and utility. The Behavioral and Brain Sciences, 21(2), 169–194. https://doi.org/10.1017/s0140525x98001162
Devezer, B., Navarro, D. J., Vandekerckhove, J., & Buzbas, E. O. (2021). The case for formal methodology in scientific reform. Royal Society Open Science, 8(3), 200805. https://doi.org/10.1098/rsos.200805
Edwards, J. (2022). The journal editor as academic custodian. In P. Habibie & A. K. Hultgren (Eds.), The inner world of gatekeeping in scholarly publication (pp. 227–244). Springer International Publishing. https://doi.org/10.1007/978-3-031-06519-4_13
Finch, S., Cumming, G., & Thomason, N. (2001). Editors' note on the "Colloquium on Effect Sizes: The roles of editors, textbook authors, and the publication manual". Educational and Psychological Measurement, 61(2), 181–210. https://doi.org/10.1177/00131640121971176
Finch, S., Cumming, G., Williams, J., Palmer, L., Griffith, E., Alders, C., Anderson, J., & Goodman, O. (2004). Reform of statistical inference in psychology: The case of Memory & Cognition. Behavior Research Methods, Instruments, & Computers, 36(2), 312–324. https://doi.org/10.3758/BF03195577
Gigerenzer, G. (2004). Mindless statistics. The Journal of Socio-Economics, 33(5), 587–606. https://doi.org/10.1016/j.socec.2004.09.033
Gigerenzer, G. (2018). Statistical rituals: The replication delusion and how we got there. Advances in Methods and Practices in Psychological Science, 1(2), 198–218. https://doi.org/10.1177/2515245918771329
Greenwald, A. G., Gonzalez, R., Harris, R. J., & Guthrie, D. (1996). Effect sizes and p values: What should be reported and what should be replicated? Psychophysiology, 33(2), 175–183. https://doi.org/10.1111/j.1469-8986.1996.tb02121.x
Haller, H., & Krauss, S. (2002). Misinterpretations of significance: A problem students share with their teachers? Methods of Psychological Research Online, 7(1), 1–20.
Halsey, L. G. (2019). The reign of the p-value is over: What alternative analyses could we employ to fill the power vacuum? Biology Letters, 15(5), 20190174. https://doi.org/10.1098/rsbl.2019.0174
Hanafi, S. (2011). University systems in the Arab East: Publish globally and perish locally vs. publish locally and perish globally. Current Sociology, 59(3), 291–309. https://doi.org/10.1177/0011392111400782
Hanafi, S., & Arvanitis, R. (2015). Knowledge production in the Arab world: The impossible promise. Routledge/Taylor & Francis. https://doi.org/10.4324/9781315669434
Hoekstra, R., Finch, S., Kiers, H. A. L., & Johnson, A. (2006). Probability as certainty: Dichotomous thinking and the misuse of p values. Psychonomic Bulletin & Review, 13(6), 1033–1037. https://doi.org/10.3758/BF03213921
Hoekstra, R., Morey, R. D., Rouder, J. N., & Wagenmakers, E. J. (2014). Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review, 21(5), 1157–1164. https://doi.org/10.3758/s13423-013-0572-3
Ibrahim, B. (2021). Statistical methods used in Arabic journals of library and information science. Scientometrics, 126(5). https://doi.org/10.1007/s11192-021-03913-2
Ioannidis, J. P. A. (2018). Meta-research: Why research on research matters. PLoS Biology, 16(3), e2005468. https://doi.org/10.1371/journal.pbio.2005468
Lakens, D. (2021). The practical alternative to the p value is the correctly used p value. Perspectives on Psychological Science, 16(3), 639–648. https://doi.org/10.1177/1745691620958012
Lecoutre, M. P., Poitevineau, J., & Lecoutre, B. (2003). Even statisticians are not immune to misinterpretations of null hypothesis significance tests. International Journal of Psychology, 38(1), 37–45. https://doi.org/10.1080/00207590244000250
Lyu, X. K., Xu, Y., Zhao, X. F., Zuo, X. N., & Hu, C. P. (2020). Beyond psychology: Prevalence of p value and confidence interval misinterpretation across different fields. Journal of Pacific Rim Psychology, 14, e6. https://doi.org/10.1017/prp.2019.28
Mansour, E. (2016). Arab authors' perceptions about the scholarly publishing and refereeing system used in Emerald's library and information science journals. New Library World, 117(7/8), 414–439. https://doi.org/10.1108/NLW-01-2016-0007
Meehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34(2), 103–115. https://doi.org/10.1086/288135
Nosek, B. A., Spies, J. R., & Motyl, M. (2012). Scientific Utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science, 7(6), 615–631. https://doi.org/10.1177/1745691612459058
Nuzzo, R. (2014). Statistical errors: P values, the 'gold standard' of statistical validity, are not as reliable as many scientists assume. Nature, 506(7487), 150–152. https://doi.org/10.1038/506150a
Shehata, A. M. K., & Elgllab, M. F. M. (2018). Where Arab social science and humanities scholars choose to publish: Falling in the predatory journals trap. Learned Publishing, 31(3), 222–229. https://doi.org/10.1002/leap.1167
Smith, L., & Abouammoh, A. (2013). Higher education in Saudi Arabia: Achievements, challenges and opportunities. Springer.
Trafimow, D., & Marks, M. (2015). Editorial. Basic and Applied Social Psychology, 37(1), 1–2. https://doi.org/10.1080/01973533.2015.1012991
Wei, R., Hu, Y., & Xiong, J. (2019). Effect size reporting practices in applied linguistics research: A study of one major journal. SAGE Open, 9(2). https://doi.org/10.1177/2158244019850035