EDUCATIONAL ASSESSMENT & EVALUATION | RESEARCH ARTICLE
Do we focus on process over outcome? Review of published studies
in two prominent Saudi journals
Ruwayshid Alruwaili
Language and Translation Dept., Arts and Education College, Northern Border University, Arar, Saudi Arabia
ABSTRACT
It is well-known that published science is judged by the judicious and careful use of
methods employed when acquiring knowledge and understanding. However, since
the beginning of the millennium, concerns over the validity and reliability of scientific
research have grown substantially. These concerns were mainly related to statistical
methods and interpretation of statistical inferences. This study attempts to investigate
the statistical practices of published studies in two prominent Saudi journals. The pur-
pose is to analyze the extension of the problem into a different geographical area. In
addition, it attempts to look at the established editorial procedures utilized in the
educational and linguistics sciences in Saudi Arabia. 114 published articles between
2017 and 2020 were analyzed by a 13-item checklist. The findings revealed an over-
reliance on p-value and almost 67% of the published studies interpreted p-value as a
proven population effect. In addition, researchers and editors relied heavily on a sig-
nificant p-value as a merit for publication. Moreover, there was a low reporting rate in
incorporating other statistical requirements such as CI and effect size. The implication
of the findings highlights the importance of a more "verifying: how you found it"
approach rather than an over-reliance on the belief that a significant p-value is publishable, a "what you
found" approach.
ARTICLE HISTORY
Received 13 December 2023
Revised 11 December 2024
Accepted 12 December 2024
KEYWORDS
p-value; statistical practices;
replication crisis; Saudi
journals; journal editorial
requirements
SUBJECTS
Psychological Methods &
Statistics; Publishing
Industry; Publishing
Introduction
It is well known that the value of published studies resides in the reliable knowledge that they can pro-
duce. The question then arises, how reliable are these results? One way to answer the question is to
look at the methods they employed, how rigorous they are, and what kinds of interpretations and
inferences are drawn from them. However, this is not always a straightforward issue. Published studies are
different in nature, questions, and methods. It is therefore really hard to critically assess them with one
unique scientific method (Devezer et al., 2021; Gigerenzer, 2004; Ioannidis, 2018; Meehl, 1967).
Research assessment hinges on two fundamental principles: reliability and validity. Validity refers to
the degree to which a measure captures what it is intended to measure, while reliability indicates the degree of measurement consistency.
These two principles are crucial for guaranteeing the trustworthiness of scientific findings. However,
while validity and reliability are the bedrock of research quality, the recent scholarly focus has shifted
toward the importance of reproducibility and reliability overshadowing other crucial aspects of research
assessments. In fact, determining the reliability of published studies has gained a remarkable interest in
recent years in the media and academia. With news of failure to replicate the results of prominent stud-
ies in social sciences, doubts and questions started to emerge that they are not as reliable as many sci-
entists assume. This has generated what is now called a "reproducibility crisis" (Amrhein et al., 2019;
Nuzzo, 2014). A recent study in Nature Human Behaviour attempted to replicate the findings of 21 social
science studies published in the prestigious journals Science and Nature (Camerer et al., 2018). The findings revealed
that they were unable to yield the same effect in more than one-third of the studies. The finding is
highly significant because if scientists were unable to reproduce the same results from prestigious jour-
nals, what about lower-ranked or non-peer-reviewed journals? This seems to show
how serious the problem is and its far-reaching implications for the larger scientific and academic com-
munities (Lyu et al., 2020).
Crucially, the problem is frequently linked to the misinterpretation and misuse of statistical techniques
and methods, namely p-value significance testing (Gigerenzer, 2018; Greenwald et al., 1996; Halsey, 2019;
Lakens, 2021). This statistical technique is usually utilized to draw inferences from the provided data.
Recent and old studies conducted in many places revealed that psychologists and social scientists have
poor knowledge about interpreting p-value correctly (Badenes-Ribera et al., 2015; Haller & Krauss, 2002;
Hoekstra et al., 2014; Lecoutre et al., 2003). However, journal editors and scientists rely on statistical
significance, treating a result that crosses the conventional threshold of .05 as evidence of a "real" effect.
Importantly, this has produced and inflated what is known as "publication bias", where significant
results find their way into peer-reviewed journals while non-significant results go unpublished.
Although the technique has been criticized heavily in the literature, recent calls are accumulating for the
entire concept of statistical significance to be abandoned or banned (Gigerenzer, 2018; Lakens, 2021;
Trafimow & Marks, 2015). Therefore, if we believe that misinterpretations of statistical techniques might
affect the quality of scientific research, then we need more scrutiny and examination of these techniques
instead of focusing merely on the outcome.
This study attempts to investigate the statistical practices of published studies in two prominent
Saudi journals. The purpose is to analyze the extension of the problem into a different geographical
area. In addition, it attempts to look at the established editorial procedures utilized in the educational
and linguistics sciences in Saudi Arabia.
Generally, the role of the journal editors and reviewers is to filter out poor submissions and to publish
ultimately the good ones (Edwards, 2022). The current study is, therefore, necessary to see how Saudi
journals fit within the context of reproducible and methodological reforms. By contrasting the existing
practices with the recommended best practices, the study will reveal whether there is sophistication in
adequate statistical reporting. It will also demonstrate how journal editors and reviewers act as
"academic custodians" and whether they were able to implement and transmit these adequate and
rigorous reporting practices to authors.
Ultimately, the evidence generated by the findings is probably going to serve as a foundation for
greater accountability, diligence, and academic rigor. At the global level, it will be central to building
knowledge about the effectiveness of methodological reforms outside Western scholarly publishing
by illuminating what exists and what ought to be.
Why published studies?
The value of scientific expertise depends on making sure that knowledge published in journal studies is
useful, reliable, and relevant. Scientific expertise thrives on debate and discussion. It also thrives on
people verifying and debating views using available evidence. Given this significance, it is important to
understand the value published studies can bring to society. Published studies generate a collective out-
put of practices and empirical generalizations which may have small or wide-ranging implications
to society. In addition, the research enterprise grows very fast; therefore, improving the efficiency of sci-
entific investigation can translate published results into valuable contributions and developments to
society.
Moreover, the decision-making process usually depends on scientific expertise. Scientific production is
no longer limited to a small number of research organizations; policymakers now have more direct access
to what is published, and it supports policymaking at all stages of the policy cycle. Crucially, policy-
makers tend to place more confidence in published research than in anecdotal evidence. Published
knowledge is perceived as "true" or a potent "source of finality" (Nosek et al., 2012). However, placing
such faith in it requires a more "verifying" approach to what is published.
Examining the reliability of the methods used in these published studies can reveal the nature of accu-
mulated knowledge. More importantly, effective decision-making depends on reliable evidence, sound
methods, and good science.
Moreover, academic promotion is linked to the number of publications in the Saudi academic context
and elsewhere in the Middle East. Academics and scientists are rewarded and promoted based on what
they publish. In other words, publishing is essential for their career advancement (Shehata & Elgllab,
2018). However, the fact that publishing is essential to success is just "a fact of the trade" (Nosek et al.,
2012). Yet, it is very crucial to see whether publishing more studies is rewarded at the expense of rigor-
ous statistical standards. An academic journal gains its reputation for excellence in publishing high-
quality articles (Mansour, 2016). Moreover, journal editors are the "gatekeepers" of scholarly published
science. The review system guarantees the quality, integrity, and reproducibility of any scholarly article.
Given the fact that academics in Saudi Arabia need to thrive professionally, what are the requirements
journal editors seem to request to assure the quality of the research? The exact requirements may vary
from one journal to another, but the main goal remains to improve the scholarly article. Based on these
premises, the study aims to investigate Saudi editorial requirements in deciding and scrutinizing more
aspects of the accepted papers including methods and statistics. Particularly, do journal editors rely only
on the use of significance testing for their approval, or do they augment their decision by requiring
additional measures of uncertainty in the analysis of the obtained inferences?
Literature review
Since the beginning of the millennium, concerns over the validity and reliability of scientific research have
grown substantially (Finch et al., 2004; Hoekstra et al., 2006). These concerns were related to statistical
methods and interpretation of statistical inferences or even inspecting questionable research practices
(QRP). It is therefore significant to conduct reviews and meta-analyses to be in a better position to pro-
duce generalizable findings. For example, Finch et al. (2001) reviewed 150 published articles in the
Journal of Applied Psychology (JAP) from 1940 to 1999. Their focus was to review the misconceptions
about significance testing practices utilized in these 150 studies. Moreover, they investigated the report-
ing of p-values according to the recommendations of the American Psychological Association's (APA)
Task Force on Statistical Inference (TFSI). Unfortunately, they observed that little progress or influence
of the TFSI recommendations had occurred in statistical reporting practices in JAP. In other words,
although editorial requirements were instantiated, they did not result in improved statistical practices.
Practices such as reporting power estimates or confidence intervals were rarely utilized.
Moreover, Gigerenzer (2004) sought to examine the impact of editorial requirements instantiated by
Geoffrey Loftus when he was appointed as the editor of the journal Memory & Cognition. Loftus
requested that authors submit their papers with descriptive statistics and augment their analysis with
confidence interval figures instead of relying only on p-values. In other words, he wanted to free
scholars from dichotomous yes/no decisions and wean them off reliance on p-values. What
Gigerenzer (2004) reported was a 32% decrease in over-reliance on p-values during Loftus's editorship. In
addition, Loftus observed a reluctance and stubbornness from authors to abandon p-values and
embrace measures of estimate uncertainty such as confidence intervals (Gigerenzer, 2004, 2018).
Similarly, Hoekstra et al. (2006) reviewed articles submitted before and after the publication of the
fifth edition of the APA Publication Manual. In fact, the fifth manual endorsed the recommendations of
TFSI to use interval estimates and effect sizes in the analysis. Hoekstra et al. (2006) attempted to review
the influence of these changes on statistical practices whether they are reported as prescribed by the
APA manual. They analyzed 259 scholarly reports published in Psychonomic Bulletin & Review. All these
reports utilized significance testing as the method to analyze the data. A 13-item checklist was used to
evaluate statistical practices and determine whether they are reported as prescribed by the manual.
Unfortunately, the overall finding was that little change was observed in researchers' behavior. In other
words, editorial standards were not consistently followed and over-reliance on the "null ritual" was
observed on a large scale.
Crucially, these studies show that reforms are available yet slow. There have been calls to push these
reforms further and abandon the use of statistical significance testing practices (Amrhein et al., 2019;
Nuzzo, 2014) and advocate better statistical practices (Lakens, 2021). However, researchers continue to
demonstrate consistent favoritism of statistical practices that utilize the "null ritual" (Gigerenzer, 2004).
This is especially worrying in light of the "reproducibility" concerns (Gigerenzer, 2018; Hoekstra et al.,
2006; Nosek et al., 2012). To what extent the problem exists in the Saudi or Arab context is unknown.
Ibrahim (2021) provided an overview survey of statistical methods used in Library and Information
Science journals in Arabic literature. He calculated the types of statistical methods, such as descriptive or
inferential, used in the published studies. However, he did not show how these tests were interpreted or
reported. It is therefore significant to see how statistical practices are reported in Arabic journals in gen-
eral and in the Saudi context in particular.
Simply put, there is a scarcity of studies targeting other geographical areas in terms of methodo-
logical reforms in scholarly publications or gatekeeping processes. Most of what is currently being dis-
cussed relates to Western academia. The need to examine other contexts is highly valuable to
illuminate the scale of the crisis in other parallel contexts. As seen in the literature, there is an uneven
level of awareness of issues related to methodological reforms and reproducibility across fields and con-
texts at the global level (Amrhein et al., 2019; Nuzzo, 2014). Thus, ongoing discussions about improving
research practices, data-sharing policies, and the adoption of best practices can bridge this gap of
awareness.
Moreover, in Saudi Arabia, journal editors and reviewers are set up by the university boards to regu-
late research and scholarly publication. The study will describe (later in the discussion) the major issues
and concerns posed by this trust and peer-review process in transmitting the best practices and the
merits for publication to authors. In other words, the study will expand the interpretive power of the
current crisis to examine other cultural factors. This is of great significance when initiating
culture-conscious practices in relation to reproducibility, methodological advances, and open science in
developing and non-Western countries. In addition, another contribution is in promoting methodological
rigor in statistical reporting. By identifying common errors and flaws in statistical reporting practices,
such as failure to report effect sizes and confidence intervals, the findings provide guidance for research-
ers as well as editors and reviewers to improve the quality of their statistical reporting. Inadequate
methods and incomplete reporting can lead to avoidable research waste, which can be reduced by
minor methodological adjustments and adherence to best reporting practices. These measures are easy
to implement and cost-effective for journal editors and reviewers to utilize.
The findings will also show the importance of addressing publication bias and over-reliance on p-
value results for merit of publication. This will provide a more accurate representation of the available
evidence and can mitigate the impact of publication bias on the overall body of literature. Ultimately,
the study will promote the credibility and reliability of research findings in social science fields in general
and in the context of Saudi Arabia in particular.
The objectives of the study
The key objectives of the study are:
1. To identify the editorial requirements in relation to statistical practices in two prominent Saudi jour-
nals. Do they require alternatives beyond p-value?
2. To measure reporting statistical practices in published scholarly articles in two prominent Saudi
journals.
3. To determine whether the editorial requirements or reporting practices vary across the years or
journal.
Widespread concerns about the reliability and rigor of published research have centered around the
misuse and misinterpretation of statistical practices and reporting (Gigerenzer, 2004; Trafimow & Marks,
2015). To attain scientific rigor and research soundness, statistical practices have to receive formal scru-
tiny. Recently, Lakens (2021, p. 645) stated, "If we really consider the misinterpretation of p values to be
one of the more serious problems affecting the quality of scientific research, we need to seriously reflect on
whether we have done enough to prevent misunderstandings."
The emphasis and novelty of the current study is to examine the extension of this formal line of
inquiry into a new geographical area. Most of the published studies and discussions about
"reproducibility" and "statistical practices" are conducted in Western academia (Finch et al., 2004;
Gigerenzer, 2018; Nosek et al., 2012). Moreover, rigorous statistical practices and procedures might produce
the solid conclusions that scientific progress requires and on which high-quality evidence-based policies
in Saudi Arabia can be based.
Research methodology
The study examined statistical practices and procedures used in two prominent Saudi journals. The two
journals publish studies in relation to educational and linguistic research. In addition, they are not lim-
ited to Saudi Arabia, but also publish studies from other Gulf and Arab countries in both Arabic and
English. There are a number of reasons behind selecting these two journals that are impor-
tant to Saudi academia. First of all, universities in Saudi Arabia are classified mainly into two categories:
old and newly established universities. The old universities have a bigger budget compared to newly
established universities. Some of this budget is allocated to research and scholarly publication including
university journals. In addition, they have an adequate number of experienced faculty who are part of
the review process, unlike the newly established ones, which are too under-staffed to operate effect-
ively. Moreover, these two journals have an extensive history of scientific publication in Saudi Arabia. So,
they have a history of processing and publishing intellectual knowledge. Therefore, studies published in
these two journals are accepted in academic promotions and considered "rigorous" in the Saudi context.
Furthermore, the selection of these journals based on legacy and importance to Saudi academia reflects
a commitment to understanding the local academic context. Although these two journals are not inter-
nationally recognized or indexed in databases such as Scopus or WoS, this might reveal valuable insights
into the unique academic environment in Saudi Arabia, thereby providing a more comprehensive view
of academic output in the country. While the lack of indexing in internationally recognized databases is
a valid concern, it should not negate the importance of these journals in shaping the academic practi-
ces and emerging trends within Saudi Arabia. Recognizing the value of these journals contributes to a
more nuanced understanding of the country's academic environment and supports research that is
both impactful and meaningful.
To show how these two journals assure the academic community and the general public that pro-
duced knowledge is being intellectually scrutinized, the current study is going to zoom into their editor-
ial requirements to achieve reliability and validity of their published findings. It will evaluate how
statistical practices are reported and utilized. The creation of knowledge rests on the use of statis-
tics and on how it informs future knowledge. Statistical reporting and its interpretation need careful
deliberation. Evaluating the statistical practices will reveal the approach to the analysis of the data and
how results are drawn. The evaluation will then expose the prevailing practices, such as the excessive
use or misuse of some tools (if present). Crucially, the lack of rigor in scientific research is often blamed on
statistical practices such as p-value (Gigerenzer, 2018; Halsey, 2019; Hoekstra et al., 2014). Thus, the ana-
lysis will provide a straightforward picture of how "academic custodians" carry out the "gatekeeping
process" of statistical practices in Saudi Arabia regardless of other logistic factors such as lack of
reviewers or editors.
Importantly, the peer-review process in both journals is in some ways different from the Western type.
The editors and reviewers are selected by the university boards based on their experience and academic
rank to regulate the scholarly publication in these journals. The editorial board and the reviewers are
not necessarily from the same university; they can be from other universities or other countries.
However, they are nominated and approved by the university boards. The editorial boards are then
charged with accepting, filtering, and ultimately publishing research studies. Apart from obvious meth-
odological errors, they are also trusted to justify the choice of topics, to transmit the best methodo-
logical practices to authors, and to ensure that published knowledge is adequately discussed. Alongside the
primary goals of scholarly publication, they also have the responsibility to set the standard of quality in
an underrepresented context and to promote the "verifying" approach to the analysis of data instead of
the "what you found" approach.
The first step in the peer-review process is that authors submit their articles via the website or the
email of the journal. Once received by the journal editors, reviewers are assigned to scrutinize the sub-
mission based on their area of expertise. After a careful examination, they hand in their decision with
the critical comments. The editors send back the decision to the authors. Critically, the process is guided
by the editorial boards and the referees' reports in either accepting or rejecting the submission.
The journals
The two journals are (1) the "Journal of Educational Sciences" published by King Saud University (KSU) and
(2) the "Journal of Educational Sciences" published by Imam Mohammed Ibn Saud Islamic University (IMSIU).
These two universities are among the oldest universities in Saudi Arabia. KSU was founded in 1957 and
IMSIU was established in 1974 (Smith & Abouammoh, 2013). KSU publishes a number of scientific jour-
nals ranging from physics to education and linguistics. Due to its nature, IMSIU mainly publishes journals
related to Islamic culture, Arabic language, and Education and Linguistics. The choice reflects the signifi-
cance of these two journals in promoting science in the Saudi context. Moreover, Saudi faculty favors
publishing in these selective local journals more than in international journals (see Hanafi, 2011 for more
discussion).
KSU journal
The journal is issued by KSU. On its website (https://jes.ksu.edu.sa/ar), it aims at publishing scholarly
articles that are original and pioneering in education and other related fields. The journal has undergone
a number of name changes. The first issue was published in 1977 under the name "Studies"; then
the title was changed to "Educational Studies" in 1984. In 1992, it was named "Educational Sciences and
Islamic Studies", and it was then divided into two journals, "Educational Sciences" and "Islamic Studies", in
2012. Since 2013, it has been issued under the name "Journal of Educational Sciences". Its latest version
is indexed in international databases such as EBSCO and ResearchBib. It also follows the APA 6th edition
style in publication and referencing.
IMSIU journal
The journal is issued by IMSIU (see Note 1). In its first issue, its vision states that it aims to produce, publish,
and apply knowledge. The journal branched out from the older "Humanities and Social Sciences Journal".
The first issue was published in 2015. It follows the APA style in publication and referencing as well.
Inclusion criteria
The focus is to include and evaluate the latest issues from 2017 to 2020. This would allow us to obtain
a more general overview of published studies' quality in both journals, and empirically evaluate the
reporting standards and statistical interpretations. A span of 3 years was decided upon for several rea-
sons. The first reason is to include only the latest issues after the period of name changes, so there is a
period of stability in editorial procedures. Moreover, choosing the latest issues is a better reflection of
"reproducibility" awareness levels in editorial procedures. Publications over a recent period of time
should be sufficient for studies of an exploratory nature such as this study to detect any systematic
changes in statistical procedures reporting across the time. The goal is also to observe any changes
aligned with recent calls for methodological reforms in Western academia. Including older studies might
overshadow this purpose, so conclusions drawn can be both current and robust without introducing sig-
nificant bias. Therefore, to ascertain whether the implications and repercussions of the "reproducibility"
crisis and discussions about "better statistical practices" have been heard and propagated in social media
and scientific news, only the latest issues were selected. In addition, choosing issues from a specific
period reflects, to some extent, a unity of purpose in the goals and topics of these journals. Thus, the decision
is to include and evaluate only the latest issues from the two journals and see how careful editorial pro-
cedures might reflect the careful use of statistical techniques.
The checklist tool
A 13-item checklist was adopted from Hoekstra et al. (2006) to record the ways in which statistical signifi-
cance is reported. The list was created to assess the reporting requirements and methodological quality
of research articles in the field of psychology. Each item in the list serves as a guideline for evaluating
specific aspects of the article, ensuring that essential elements are sufficiently addressed and reported.
In fact, reviewers and editors can critically evaluate the published studies and identify areas of improve-
ment in reporting standards and methodology. By systematically adopting these items, they can assess
the methodological rigor of the published articles, hence promoting validity and reliability. Reliability is
promoted when procedures and statistical analyses are clearly reported and hence can be replicated by
others. Validity is also strengthened when statistical analyses are appropriately applied and correctly
interpreted. In fact, the future strength of the published research depends on making sure that the pro-
duced knowledge is useful and reliable. The checklist consists of 5 categories, detailed below, and each
item was assigned "1" if an occurrence was found or "0" otherwise.
Reporting statistical significance as certainty
Inferring that a statistically significant result means with certainty that the effect is present in the popula-
tion is statistically incorrect. Therefore, Arabic statements or words in the conclusion or findings sec-
tions equivalent to "the findings confirmed", "the findings validated", or "confirmed" are taken as errors
of this type. However, statements or words such as "the findings showed" or "the findings demonstrated"
are not considered errors of this type. In addition, ambiguous statements such as "The control group
scored higher than the experimental group" are not considered errors of this type.
Reporting there is no effect or a negligibly small effect
Interpreting statistically non-significant results as the absence of the effect in the population is consid-
ered an error of this type. Arabic statements and phrases equivalent to "no evidence" or "no effect" are
taken as evidence of this error. Similarly, interpreting an effect as negligibly small without augmenting
the result with a CI or an effect size is also considered an error of this type.
Reporting exact or relative p-value
This category concerns whether the p-value is reported as an exact value or a relative one in the pub-
lished studies. According to the APA style manual, p-value should be reported as an exact value, and
any violation of this requirement is considered as an error of this type regardless of the outcome (signifi-
cant/non-significant).
Reporting CI
This category deals with the reporting of CIs. According to APA guidelines, a CI should be reported alongside
the p-value to provide a measure of uncertainty to the result, otherwise, it is considered a violation of
the manual guidelines. Thus, the absence of reporting CI numerically or visually is considered an error of
this type.
Reporting effect size
The p-value alone does not provide information about the magnitude of an effect in the population.
Therefore, an effect size should be reported to draw conclusions from the sample about the size of the
effect. Thus, the absence of reporting an effect size is considered an error of this type.
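To make the last two categories concrete, below is a minimal sketch, not drawn from the reviewed articles, of how a two-group comparison could be reported with an exact p-value, a standardized effect size (Cohen's d with a pooled standard deviation, one common convention), and a 95% CI for the mean difference rather than a p-value alone. The group data are simulated for illustration.

```python
# Hedged illustration: exact p-value, Cohen's d, and a 95% CI for a mean
# difference. The two groups are simulated placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(70, 10, 40)        # hypothetical control-group scores
experimental = rng.normal(75, 10, 40)   # hypothetical experimental-group scores

t, p = stats.ttest_ind(experimental, control)

# Standardized effect size: Cohen's d with a pooled standard deviation
n1, n2 = len(experimental), len(control)
pooled_sd = np.sqrt(((n1 - 1) * experimental.var(ddof=1) +
                     (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
d = (experimental.mean() - control.mean()) / pooled_sd

# 95% CI for the raw mean difference
diff = experimental.mean() - control.mean()
se = pooled_sd * np.sqrt(1 / n1 + 1 / n2)
crit = stats.t.ppf(0.975, n1 + n2 - 2)
ci_low, ci_high = diff - crit * se, diff + crit * se

print(f"t({n1 + n2 - 2}) = {t:.2f}, p = {p:.3f}, d = {d:.2f}, "
      f"95% CI of the difference [{ci_low:.2f}, {ci_high:.2f}]")
```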
Procedures and coding
The sampling frame for this study was based on all research articles published between 2017 and 2020
from the two journals. A comprehensive review of the published research articles led to the identifica-
tion of a total of 165 articles, excluding book reviews. These articles were downloaded and further classified
based on their use of significance testing: (a) articles that utilized p-value in the analysis, (b) articles
without p-value. The articles with p-value were coded across the five previously mentioned categories.
All articles were coded by the author, and to ensure consistent application of the checklist, 80% were
cross-coded by a second researcher. The intercoder agreement rate was 92%.
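As an illustration of the coding and agreement check described above, the sketch below computes simple percent agreement between two coders over binary checklist codes; the article IDs, item names, and codes are hypothetical, not the study's data.

```python
# Hedged sketch: percent agreement between two coders on binary checklist
# codes (1 = occurrence found, 0 = otherwise). All values are invented.
coder_a = {"article_01": {"certainty": 1, "no_effect": 0, "exact_p": 1, "ci": 0, "effect_size": 0},
           "article_02": {"certainty": 0, "no_effect": 1, "exact_p": 1, "ci": 0, "effect_size": 1}}
coder_b = {"article_01": {"certainty": 1, "no_effect": 0, "exact_p": 1, "ci": 0, "effect_size": 0},
           "article_02": {"certainty": 0, "no_effect": 0, "exact_p": 1, "ci": 0, "effect_size": 1}}

matches = total = 0
for article, codes in coder_a.items():
    for item, value in codes.items():
        total += 1
        matches += int(value == coder_b[article][item])

print(f"Intercoder agreement: {matches / total:.0%}")
```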
Findings
What are the editorial requirements in the two journals? Do they go beyond p-value?
The main concern of the article is to look at the statistical requirements in the two journals as a micro-
scopic lens for the review process and whether the published knowledge is intellectually scrutinized.
The use of statistical methods has an enormous value in scholarly publication. Papers are judged by the
careful utilization of investigation methods and their interpretations. Thus, this study attempted to pro-
vide a description of the existing and widespread use of statistical practices in two Saudi journals. As
shown in Table 1, the findings show that more than half of the papers published in these two journals
relied on and reported null hypothesis tests (namely p-value) in their analyses and inferences. The num-
bers show that both journals are almost equal in their reliance on p-values in their statistical reporting
practices (69% and 68%, respectively). To test whether there is a variation in practice between the two jour-
nals, a t-test showed no significant difference by university journal, t(22) = .665, p = .51, with a 95% confi-
dence interval of 0.97 to 1.9.
These numbers seem to indicate that the p-value is the most commonly used method to report findings.
The questions then arise: is it used appropriately as recommended, and what kinds of interpretations are
attached to it? As shown in Table 2, the findings show that there are occurrences of p-value misinterpre-
tations. A significant p-value is more likely to be interpreted as implying "with certainty" that the effect is pre-
sent. A z-test for the difference of two independent proportions was conducted to determine if there is a
significant difference in statistical practice between the two journals. The findings reveal that there
was no difference in interpreting the p-value with a "certainty" reading, z = 0.13, p = .89, φ = .012, with a 95%
confidence interval of -0.18 to .16. Similarly, there are occurrences of non-significant results interpreted
as the absence of the effect. Again, the z-test scores indicate that there was no significant difference in
practice between the two journals, z = 0.68, p = .49, φ = .063, with a 95% confidence interval of 0.24
to .116.
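For readers unfamiliar with the test, the following is a minimal sketch of a z-test for the difference of two independent proportions, using the "certainty" counts from Table 2 (37 of 55 and 39 of 59). It is an illustrative reconstruction, not the study's analysis code, though it approximately reproduces the reported z = 0.13 and p = .89.

```python
# Hedged sketch: two-sided z-test for the difference of two independent
# proportions, pooling the proportions under the null hypothesis.
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)   # pooled proportion
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# "Certainty" interpretations per journal, taken from Table 2 (Sum column)
z, p = two_proportion_z(37, 55, 39, 59)
print(f"z = {z:.2f}, p = {p:.2f}")
```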
Two main conclusions can be drawn from the above data in Table 2 as far as the interpretation of p-
value is concerned. First, it is commonly interpreted as a dichotomous decision and erroneously equated
with certainty. Second, this statistical interpretation does not vary across journal publications. Note
that this binary interpretation claims certainty on the basis of a significant result (67%), which indicates
that editorial procedures might endorse this binary thinking.
The observed p-value misinterpretations show that how the p-value is reported needs
further investigation. APA guidelines require the reporting of exact p-values. Table 3 presents how the p-value
is reported in significant and nonsignificant results.
Table 3 shows that a large proportion of the cases are reported with exact p-values in significant and
nonsignificant findings. The medians show that the trend is to report the p-value. However, there
are some cases of relative or no p-value reporting, which suggest that the editorial procedures
are not consistently followed with regard to exact p-value reporting. Although previous studies
(Hoekstra et al., 2006) reported clear differences in p-value reporting practices when the outcome
is significant or nonsignificant, the density plots in Figure 1 demonstrate that no clear differences were
found in the way significant and nonsignificant results were reported. An exact McNemar's test shows
that there was no statistically significant difference in the proportion of significant/nonsignificant p-value
Table 1. Number of articles reporting with p-value, tabulated with university journal in selected years.
                  Total no. of articles       Articles with p-value
                  ImamJ        SaudJ          ImamJ        SaudJ
Valid             11           13             11           13
Sum               80.000       85.000         55.000       59.000
reporting, χ²(1) = .862, p = .353. The p-value is likely to be reported regardless of whether the outcome is
significant or nonsignificant.
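For illustration, an exact McNemar test on paired binary codes can be run as a binomial test on the discordant pairs. The sketch below uses hypothetical counts, not the study's data (the article reports the test with a χ² statistic, which corresponds to the asymptotic version of the test).

```python
# Hedged sketch: exact McNemar test as a two-sided binomial test on the
# discordant pairs. The counts are invented placeholders.
from scipy.stats import binomtest

# b: articles with an exact p-value only for significant results;
# c: articles with an exact p-value only for nonsignificant results.
b, c = 7, 12

result = binomtest(b, n=b + c, p=0.5, alternative="two-sided")
print(f"Exact McNemar p = {result.pvalue:.3f}")
```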
It is also important to see whether statistical data and conclusions are reported on the basis of statistical
significance alone, or whether they are corroborated and augmented by other statistical tools. In fact, drawing
conclusions should not depend on whether a result crosses a statistical threshold (Chow, 1998). The descrip-
tive statistics in Table 4 show which other statistical practices were utilized in the reporting.
As can be noted, there is a total lack of CI reporting in the published studies. Failure to report or pre-
sent a CI alongside the p-value contradicts APA guidelines. Similarly, visual representations are almost
non-existent. These measures of uncertainty add more meaning to the findings and give another com-
plexion to the conclusions.
In relation to effect size, it is noteworthy that reporting an effect size is essential rather than
relying only on raw mean differences. Table 4 shows the unstandardized and standardized effect
size reporting practices. As can be observed, an unstandardized effect size such as the means
was almost unanimously reported. However, the reporting of standardized effect sizes was relatively rare
(24%, 28%) and inconsistent. This indicates that while effect size in its broader mean-
ing is universally reported, reporting of standardized effect sizes was relatively limited. Failure to report effect
sizes along with p-values has also been observed in the literature (Hoekstra et al., 2006; Wei et al., 2019).
Table 2. Interpretations of p-value tabulated with university journal.
                                 Uni journal   N    Mean     Median   Sum   SD      Z-test   p-value   Phi coefficient
Significance as certainty        ImamJ         55   0.6727   1        37    0.474   .13      .89       .012
                                 SaudJ         59   0.6610   1        39    0.477
Insignificant as no effect       ImamJ         55   0.4182   0        23    0.498   .68      .49       .063
                                 SaudJ         59   0.3559   0        21    0.483
No or negligible effect          ImamJ         55   0.0000   0        0     0.000
                                 SaudJ         59   0.0169   0        1     0.130
Table 3. p-value reporting practice in significant and nonsignificant results.
                                        Uni journal   N    Median   Sum   SD
Significance with exact p-value         ImamJ         55   1        32    0.498
                                        SaudJ         59   1        37    0.488
Significance with relative p-value      ImamJ         55   0        11    0.404
                                        SaudJ         59   0        10    0.378
Significance with no p-value            ImamJ         55   0        6     0.315
                                        SaudJ         59   0        3     0.222
Insignificance with exact p-value       ImamJ         55   1        32    0.498
                                        SaudJ         59   1        42    0.457
Insignificance with relative p-value    ImamJ         55   0        4     0.262
                                        SaudJ         59   0        0     0.000
Insignificance with no p-value          ImamJ         55   0        6     0.315
                                        SaudJ         59   0        1     0.130
Figure 1. Density distribution for p-value reporting practice in significant and nonsignificant results.
Do the editorial requirements vary across the years?
It is important to see if the journal editors and reviewers were able to transmit the best practices to the
authors. The editors and the reviewers of the journals are trusted to promote the reliability of their
investigation methods. As mentioned previously, part of the current methodological reforms is to exer-
cise additional scrutiny to the methods employed. With much more attention and news about the
"reproducibility crisis" (Nuzzo, 2014), it is significant to see how editors and reviewers respond to
these calls in an underrepresented context in international and local journals. To see whether the report-
ing practices vary across the years, a chi-square test was conducted to see whether there is an association
between interpreting the p-value as "certainty" and the year of publication. The findings reveal no sig-
nificant association between the reporting rate and the year of publication, χ²(6) = 3.17, p = .787,
and Cramér's V reveals a small level of association between these two variables, Cramér's V = 0.167.
The descriptive statistics in Figure 2 reveal that reporting rates were not significantly higher than in the
previous year in either journal.
Similarly, to examine if there is a difference in the exact p-value reporting rate across the years, a
chi-square test was conducted, χ²(6) = 10.5, p = .107, and Cramér's V reveals a small-to-medium level of
association between these two variables, Cramér's V = 0.303. These findings indicate that the strength of
association between reporting practices and year of publication is negligible.
In other words, reporting practices don't vary across the years, and editorial requirements and proce-
dures don't change.
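A minimal sketch of the chi-square-plus-Cramér's-V computation described above is given below, using a hypothetical practice-by-year contingency table (so the degrees of freedom differ from the reported χ²(6)); it is not the study's analysis code.

```python
# Hedged sketch: chi-square test of association between a binary reporting
# practice (e.g. "p-value interpreted as certainty") and publication year,
# plus Cramér's V. The counts are invented placeholders.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: practice absent / present; columns: publication years 2017-2020
table = np.array([[10, 9, 8, 11],
                  [18, 20, 17, 21]])

chi2, p, dof, expected = chi2_contingency(table)
n = table.sum()
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))

print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3f}, Cramér's V = {cramers_v:.3f}")
```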
Importantly, the descriptive numbers in Figure 3 reveal no observed trend in reporting the standar-
dized effect size across the years, only inconsistent individual choices.
Crucially, these findings are indicative of a troubling situation. Editorial procedures and reporting
practices seem to rely heavily on reporting the p-value without incorporating other requirements. The low
reporting rates reflect inconsistency and individual choices rather than editorial requirements. In other
words, studies seem to be judged and evaluated on the basis of research outputs rather than on "how did
you arrive at this conclusion?"
Table 4. Other non-p-value statistical practices reporting.
                                  Uni journal   N    Median   Sum   SD
Standardized effect size          ImamJ         55   0        13    0.429
                                  SaudJ         59   0        17    0.457
Measure of effect size (mean)     ImamJ         55   1        54    0.135
                                  SaudJ         59   1        59    0.000
Error bars                        ImamJ         55   0        2     0.189
                                  SaudJ         59   0        2     0.183
Confidence intervals              ImamJ         55   1        0     0.000
                                  SaudJ         59   1        0     0.000
Figure 2. Reporting rate for p-value as a certainty across the years.
Discussion
The study aimed at evaluating the statistical editorial requirements in two Saudi educational and linguis-
tic journals. There has been a scarcity of studies that scan and review statistical reporting practices in
social science fields such as education and linguistics (see Hanafi & Arvanitis, 2015; Shehata & Elgllab,
2018). Therefore, to understand how editors manage to accept and publish studies in these journals, it
is fruitful to explore some basics of statistical requirements. Most importantly, good science is defined
by the judicious use and utilization of statistical techniques more than by what particular statistical tests
are used. Consequently, this requires more of a "verifying" approach than an over-reliance on the mistaken
belief that only a significant p-value is publishable (Nosek et al., 2012). In other words, the focus should be on
the "process" that leads to the findings rather than on "what are the findings?". With the increasing
requirements for methodological reforms in research outputs comes the need to ensure that published
knowledge and knowledge creation is intellectually deliberated. Thus, a clear evaluation study is the
starting point for any effective discussion or impact.
To bridge this gap and address the recent calls for more robust statistical reporting practices, the
study was restricted to two journals perceived to be prestigious and published by two old established
institutions in Saudi Arabia (Hanafi & Arvanitis, 2015; Mansour, 2016). The evaluation of 165 published
articles revealed that statistical significance testing practice is overwhelmingly used, with almost 70% of
these articles utilizing p-value in their analysis and interpretation.
Although this study is exploratory in nature, it is innovative in targeting the statistical requirements
as a window onto academic rigor and professional practice in local journals (Hanafi, 2011). The first obser-
vation is that almost 67% of the published studies interpreted p-value as a proven population effect.
This indicates a binary decision on the basis of a significant result. In other words, authors and editors
(including reviewers) rely heavily on a significant p-value as a merit for publication. Similarly, taking
together the percentages for interpreting the p-value as a proven effect and for rejecting the non-significant
p-value indirectly implies that errors are made in interpreting the outcome of both significant and non-
significant p-values.
Misinterpretations of p-value are erroneously believed to be publishable and correct (Haller & Krauss,
2002; Hoekstra et al., 2006). In addition, both journals uniformly prefer to utilize p-value as a dichotom-
ous decision indicating a predominant behavior of equating p-value with certainty and a real effect in
local journals. This conclusion adds to the previous studies in the literature where misinterpretations of
NHST are commonly observed (Badenes-Ribera et al., 2015; Gigerenzer, 2004,2018; Haller & Krauss,
2002; Lecoutre et al., 2003; Lyu et al., 2020). Crucially, the practice of only publishing significant findings
is a well-documented phenomenon known as "publication bias" (Nosek et al., 2012). Journals tend to
favor studies with statistically significant results, leading to skewed and misleading descriptions of the
evidence in the literature. This bias distorts the scientific record by over-representing positive findings
and under-representing negative findings. Thus, the tendency to equate non-significant results with
Figure 3. Reporting practice of standardized effect size across the years.
"no effect" poses significant and worrying challenges. It is more likely that authors adjust their submis-
sions to align with perceived editorial preferences in these journals. Therefore, reporting is likely to be
based on perceived journal preferences rather than on scientific merit.
Moreover, investigating the reporting of p-value in the published studies revealed that there were a
number of inconsistent practices. Although APA guidelines prescribe that the p-value should be reported with
exact values, the review found a few cases of relative p-value reporting and a small number with no p-value
(see Table 3). These cases indicate that APA requirements were not consistently followed. However, it is
not possible to deduce a direction in p-value reporting practice, unlike in previous studies (Hoekstra
et al., 2006). Editors and researchers were more likely to report the exact p-value regardless of the
outcome whether significant or non-significant.
Given the fact that the p-value fails to provide the type of information the editors want to obtain,
what other alternative statistical practices do journal editors request alongside the p-value?
Recommendations usually include reporting effect size, CI, and graphical displays to communicate the
distribution of the data (Finch et al., 2004; Gigerenzer, 2004; Hoekstra et al., 2014). The evaluation found
that alternative statistical practices seem to be almost absent in statistical reporting in these journals
(see Table 4). Although these alternatives are highly recommended by the APA manual, there seems to
be no continuous vigilance on the part of journal editors. The small amount of effect size reporting
indicates individual vigilance on the part of researchers rather than an established editorial
requirement. Crucially, these alternatives are not immune to misinterpretation (Greenwald et al., 1996;
Hoekstra et al., 2014), but what is highlighted is the need for more transparency when reporting the
results as well as acknowledging the limitations of the p-value. The argument is not to completely aban-
don the p-value. Using the p-value is a good thing (Lakens, 2021), but even good things can be pushed or
utilized too far beyond their proper limits. There must be room for other statistical practices to better
know and understand the results besides the p-value.
Generally speaking, the findings pose critical challenges to the way knowledge is published in the
two journals. Apart from ensuring the smooth running of the review process, the journal editors have
the responsibility to increase and endorse the quality of creating knowledge. The findings provide an
unpleasant picture of how knowledge is intellectually processed. The implication drawn from the find-
ings is that papers are minimally scrutinized in terms of statistical tools. The widespread misuse of some
tools, and the absence of others, provides a glimpse into the dynamics of the review process in these two
journals. It seems that research outputs are simply assessed on the basis of "what did you find?" rather
than "how did you find it?" This casts doubts and challenges on the authority of journals as centers of
knowledge production in Saudi Arabia.
If journal editors don't request or enforce these recommendations, no reaction is going to occur and
researchers will continue submitting scholarly articles without implementing good reporting practices.
Journal editors are the "gatekeepers" who can initiate and cause a reaction and change. Therefore, they have
an important role to play in implementing and encouraging researchers to use and utilize these alterna-
tives in their reporting practices (Finch et al., 2004; Gigerenzer, 2004). They are supposed to transmit the
recommended best practices to authors. This can be directly achieved through requesting alternatives or
indirectly through reviewers' reports. The case study of editor Geoffrey Loftus and his editorial guide-
lines indicates that a noticeable change occurred in shifting reliance from the p-value to other alternative
practices (Gigerenzer, 2004).
The burden is largely on journal editors and reviewers in Saudi Arabia to initiate and advocate a
methodological reform (Gigerenzer, 2004). Papers should not be solely accepted on the basis of a signifi-
cant p-value as a merit for publication. There is a large international body of discussion and experience
which can be put to good use in mitigating if not solving these problems (Camerer et al., 2018;
Ioannidis, 2018). However, these recommendations and guidelines have to be translated into concrete
actions locally and regionally. Thus, Saudi journal editors should shift their focus to increased statistical
sophistication and promote and accommodate a set of fruitful beliefs across a range of areas and disci-
plines. However, there is no immunity in scientific research, but editors can make sure that the system
favors and implements good practices to improve how research is performed, evaluated, communicated,
and rewarded.
Editors can request that authors who submit p-value-only articles routinely do the following (point 1 is
illustrated in the sketch after this list):
1. Include figures, error bars, and CIs to evaluate their findings.
2. Report effect sizes for both statistically significant and non-significant results.
3. Consistently follow APA guidelines in relation to exact p-value reporting.
4. Incorporate p-value and effect-size interpretations in their reporting, including how these numbers
should be interpreted within the research design and questions.
5. Do not interpret the p-value as a dichotomous decision or as quantifying the improbability of the
null hypothesis.
6. Consider that a significant p-value does not necessarily imply a true effect.
7. Treat reporting the p-value not as an end in itself, but consider what kind of information it conveys.
8. Reflect uncertainty in the academic writing as well; words such as "prove" and "show" should be
replaced with more accurate statements.
The editors and reviewers of these two journals are strongly recommended to review recent and older
calls for methodological reforms and to consider the use of sophisticated practices as a driving force for
change and enhanced quality (Finch et al., 2004; Gigerenzer, 2004, 2018; Greenwald et al., 1996; Lakens,
2021; Meehl, 1967).
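As a minimal sketch of point 1 above, the snippet below plots hypothetical group means with 95% confidence-interval error bars using matplotlib; the group names and scores are invented for illustration and are not drawn from the reviewed studies.

```python
# Hedged sketch: group means with 95% CI error bars instead of a bare p-value.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical group scores; replace with the study's actual data.
groups = {"Control": [62, 70, 65, 74, 68, 71],
          "Experimental": [75, 69, 80, 78, 72, 77]}

labels, means, half_widths = [], [], []
for name, scores in groups.items():
    scores = np.asarray(scores, dtype=float)
    se = scores.std(ddof=1) / np.sqrt(len(scores))
    # Half-width of the 95% CI around the group mean
    half_widths.append(stats.t.ppf(0.975, len(scores) - 1) * se)
    labels.append(name)
    means.append(scores.mean())

x = np.arange(len(labels))
plt.errorbar(x, means, yerr=half_widths, fmt="o", capsize=5)
plt.xticks(x, labels)
plt.ylabel("Mean score (95% CI)")
plt.savefig("group_means_ci.png")
```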
Importantly, these findings are about promoting rigor, transparency, and practical considerations to pub-
lished studies in two Saudi journals. Publishing in these two journals is important for academic survival.
Therefore, this evaluation study aimed at uncovering the statistical practices and editorial requirements. The
focus was on the statistical practices because they seemed to be relevant and a better reflection of the
methodological practices. We believe that the previously mentioned 8 points provide guidelines for adop-
tion as well as establish common ground for research quality in general. In particular, reviewers and edi-
tors could benefit from these findings by fostering more positive outcomes such as
increased quality, sophisticated statistical practice, transparency, and openness. Furthermore, instead of rely-
ing only on the p-value as a merit for publication, these findings might provide the impetus for more careful
consideration when accepting a study for publication. The findings also suggest that these 8 points represent
a new and superior way of analyzing and interpreting results in social science journals in KSA.
There are some suggestions and possible future paths for journal editors and researchers to enhance
statistical reporting procedures in order to support methodological reform in Saudi academic publishing.
Initially, it is important for journal editors and reviewers to set explicit rules for statistical reporting,
including the need for thorough explanations of statistical procedures. Encouraging researchers to pre-
register their experiments and use open science practices, which can help address problems like
selective reporting, is another possible improvement. More crucially, reviewers can be asked to
assess how clear and complete the statistical information in submitted publications is, which in turn pro-
vides authors with feedback on areas for improvement.
In terms of generalizability, the results might not be representative of journal editorial
requirements in Saudi Arabia as a whole, as the study focuses only on the social sciences. It might be a different situation
with natural science journals. Future research needs to investigate whether these editorial practices are
ubiquitous across all fields. However, the results might be representative in the social sciences because
the study focuses on established journals edited by experienced scholars with a full professor rank. It
therefore seems highly unlikely that the picture would be different for newly published journals
from newly established institutions. Moreover, the present study has provided evidence for dominant
editorial practices regardless of the university journal. Therefore, poking holes in the current practices is
often how progress gets made. However, editors need to evoke the desired change; otherwise these
practices will continue to exist. Reichenbach once stated, "If error is corrected whenever it is recognized,
the path of error is the path of truth." Future research directions should focus on the impact of enhancing
transparency and integrating complementary statistical metrics across various fields. Moreover, it is
potentially promising to explore the need to adequately train reviewers or require a certain level of stat-
istical literacy for those reviewing manuscripts. By doing so, the Saudi scientific community can ensure
that p-values and other statistical tools are used and reported in ways that promote better understand-
ing and improve professional practice.
Conclusion
The broad purpose of this review is to provide a characterization of current statistical editorial require-
ments in two Saudi journals. The theoretical significance of the paper stems from exploring how academic
knowledge is shaped and endorsed by editorial practices. An analysis of editorial practices in relation to stat-
istical requirements was carried out to determine to what extent editorial requirements appear to be inclusive
of and accommodating toward rigorous methodological practices. The results indicated that editors seem to be unaware
of recent and ongoing methodological reforms in social sciences (Gigerenzer, 2004; Ioannidis, 2018). This
unawareness was reflected in what was required to submit and publish a scholarly article in local social
science journals (Amrhein et al., 2019; Nuzzo, 2014). The focus seems to be on the findings themselves, "What did you find?", rather than on "How did you find it?". Accordingly, the practical significance derives from
the likelihood that some scholarly articles are publishable chiefly on the basis of a statistical result, reflecting over-reliance on the p-value as a criterion for acceptability. A statistically significant p-value is often treated as the benchmark for determining whether a study's findings are worthy of publication. Thus, the article provides evidence of how inclusive or exclusive knowledge production is. Overemphasis on p-values as a gatekeeping tool can make knowledge production exclusive and narrow the range of accepted methods. This trend
can have far-reaching implications for the quality and rigor of published research in Saudi Arabia.
Unfortunately, our evaluation did not reveal any meaningful improvement over the period examined. Moreover, we found that articles appear to be commonly publishable on the basis of a p-value alone, and that there was an overwhelming reliance on p-value statistics, the "null ritual" (Gigerenzer, 2004). In addition, this reliance seemed to encourage binary thinking about p-values. The results of the present study agree with the findings of previous studies (Haller & Krauss, 2002; Hoekstra et al., 2006). In other words, the results mirror those of Western research, which might indicate that the problem is universal and requires collective effort. They also demonstrate that the study adds to the field by corroborating earlier findings and illuminating the scale of the crisis in a parallel context. APA reporting guidelines were likewise not followed regularly or consistently at any large scale.
There is, however, no single solution to this worrying situation, and Saudi editors need the courage and resolve to advocate methodological reform and instill a culture of change and vigilance. Enhancing the quality of journal publications requires targeted strategies across key aspects of scholarly work, including statistical rigor and overall research integrity. As actionable strategies, Saudi journal editors can offer pre-submission peer review to improve manuscripts, require that statistical assumptions be addressed explicitly, and make methodologies and data accessible (see Lakens, 2021). In addition, editors and reviewers should actively encourage the submission of studies with null or inconclusive findings to reduce publication bias and enhance the diversity of the research landscape. More importantly, research quality should be judged on criteria such as methodological rigor and theoretical contribution, rather than on the p-value alone (Nosek et al., 2012). Editors also need to educate researchers on best practices and encourage ongoing reform and training in statistical methods and in the use of advanced tools for data analysis to improve the quality of results.
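As a hedged illustration of one of these strategies, the sketch below (Python with SciPy; the data and the equivalence bounds are assumed for illustration) shows a two one-sided tests (TOST) equivalence procedure, one way, in the spirit of Lakens (2021), for a null or inconclusive result to be reported with an interpretable analysis rather than discarded.

```python
# Minimal sketch with hypothetical data and bounds: two one-sided tests (TOST)
# for equivalence, so a "null" result can still support an interpretable claim.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
group_a = rng.normal(loc=70.0, scale=10, size=50)  # simulated scores
group_b = rng.normal(loc=70.5, scale=10, size=50)  # simulated scores

# Smallest effect size of interest: +/- 3 points (an assumed, illustrative bound)
low, high = -3.0, 3.0

n_a, n_b = len(group_a), len(group_b)
mean_diff = group_a.mean() - group_b.mean()
pooled_sd = np.sqrt(((n_a - 1) * group_a.var(ddof=1) +
                     (n_b - 1) * group_b.var(ddof=1)) / (n_a + n_b - 2))
se_diff = pooled_sd * np.sqrt(1 / n_a + 1 / n_b)
df = n_a + n_b - 2

# Two one-sided t tests against the lower and upper equivalence bounds
t_lower = (mean_diff - low) / se_diff      # H0: true difference <= low
t_upper = (mean_diff - high) / se_diff     # H0: true difference >= high
p_lower = stats.t.sf(t_lower, df)          # p for rejecting diff <= low
p_upper = stats.t.cdf(t_upper, df)         # p for rejecting diff >= high
p_tost = max(p_lower, p_upper)             # equivalence requires both rejections

print(f"diff = {mean_diff:.2f}, TOST p = {p_tost:.3f} "
      f"(equivalent within [{low}, {high}] if p < .05)")
```

Framed this way, a nonsignificant difference can still ground a publishable conclusion of practical equivalence, provided the bounds are justified before the data are analysed.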
Finally, editors are the cornerstone of any favorable change; otherwise, poor editorial requirements will continue to exist. To improve academic publishing and increase the public's trust in research results, it is essential to demand and use proper, sophisticated statistical techniques in scholarly submission and publishing. Saudi journal editors need to agree on a set of principles
and commitments for reforming the way research findings are evaluated and published.
Note
1. At this website https://units.imamu.edu.sa/deanships/SR/Units/Vice/Magazines/Pages/%D9%85%D8%AC%D9%84%D8%A9-%D8%A7%D9%84%D8%B9%D9%84%D9%88%D9%85-%D8%A7%D9%84%D8%AA%D8%B1%D8%A8%D9%88%D9%8A%D8%A9-.aspx.
Disclosure statement
No potential conflict of interest was reported by the author(s).
About the author
Ruwayshid Alruwaili is an Assistant Professor and former head of Applied Linguistics at Northern Border University, and an Associate Fellow of the UK Higher Education Academy. His research interests include the acquisition of
morphosyntactic features/applied linguistics in KSA, and research methods in SLA. He focuses on methodological
reforms to enhance rigor, transparency, and reliability across disciplines, with a keen interest in fostering better prac-
tices in academic and professional research. He is also an entrepreneur and a consultant on quality issues in HE institutions.
References
Amrhein, V., Greenland, S., & McShane, B. (2019). Retire statistical significance. Nature, 567(7748), 305–307. https://doi.org/10.1038/d41586-019-00857-9
Badenes-Ribera, L., Frías-Navarro, D., Monterde-I-Bort, H., & Pascual-Soler, M. (2015). Interpretation of the p value: A national survey study in academic psychologists from Spain. Psicothema, 27(3), 290–295. https://doi.org/10.7334/psicothema2014.283
Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T.-H., Huber, J., Johannesson, M., Kirchler, M., Nave, G., Nosek, B. A., Pfeiffer, T., Altmejd, A., Buttrick, N., Chan, T., Chen, Y., Forsell, E., Gampa, A., Heikensten, E., Hummer, L., Imai, T., … Wu, H. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2(9), 637–644. https://doi.org/10.1038/s41562-018-0399-z
Chow, S. L. (1998). Précis of Statistical significance: Rationale, validity, and utility. The Behavioral and Brain Sciences, 21(2), 169–194. https://doi.org/10.1017/s0140525x98001162
Devezer, B., Navarro, D. J., Vandekerckhove, J., & Ozge Buzbas, E. (2021). The case for formal methodology in scientific reform. Royal Society Open Science, 8(3), 200805. https://doi.org/10.1098/rsos.200805
Edwards, J. (2022). The journal editor as academic custodian. In P. Habibie & A. K. Hultgren (Eds.), The inner world of gatekeeping in scholarly publication (pp. 227–244). Springer International Publishing. https://doi.org/10.1007/978-3-031-06519-4_13
Finch, S., Cumming, G., & Thomason, N. (2001). Editors' note on the colloquium on effect sizes: The roles of editors, textbook authors, and the publication manual. Educational and Psychological Measurement, 61(2), 181–210. https://doi.org/10.1177/00131640121971176
Finch, S., Cumming, G., Williams, J., Palmer, L. E. E., Griffith, E., Alders, C., Anderson, J., & Goodman, O. (2004). Reform of statistical inference in psychology. Behavior Research Methods, Instruments, & Computers, 36(2), 312–324. https://doi.org/10.3758/BF03195577
Gigerenzer, G. (2004). Mindless statistics. The Journal of Socio-Economics, 33(5), 587–606. https://doi.org/10.1016/j.socec.2004.09.033
Gigerenzer, G. (2018). Statistical rituals: The replication delusion and how we got there. Advances in Methods and Practices in Psychological Science, 1(2), 198–218. https://doi.org/10.1177/2515245918771329
Greenwald, A. G., Gonzalez, R., Harris, R. J., & Guthrie, D. (1996). Effect sizes and p values: What should be reported and what should be replicated? Psychophysiology, 33(2), 175–183. https://doi.org/10.1111/j.1469-8986.1996.tb02121.x
Haller, H., & Krauss, S. (2002). Misinterpretations of significance: A problem students share with their teachers? MPR-Online, 7(1), 1–20.
Halsey, L. G. (2019). The reign of the p-value is over: What alternative analyses could we employ to fill the power vacuum? Biology Letters, 15(5), 20190174. https://doi.org/10.1098/rsbl.2019.0174
Hanafi, S. (2011). University systems in the Arab East: Publish globally and perish locally vs publish locally and perish globally. Current Sociology, 59(3), 291–309. https://doi.org/10.1177/0011392111400782
Hanafi, S., & Arvanitis, R. (2015). Knowledge production in the Arab world: The impossible promise. Routledge/Taylor & Francis. https://doi.org/10.4324/9781315669434
Hoekstra, R., Finch, S., Kiers, H. A. L., & Johnson, A. (2006). Probability as certainty: Dichotomous thinking and the misuse of p values. Psychonomic Bulletin & Review, 13(6), 1033–1037. https://doi.org/10.3758/BF03213921
Hoekstra, R., Morey, R. D., Rouder, J. N., & Wagenmakers, E. J. (2014). Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review, 21(5), 1157–1164. https://doi.org/10.3758/s13423-013-0572-3
Ibrahim, B. (2021). Statistical methods used in Arabic journals of library and information science. Scientometrics, 126(5). https://doi.org/10.1007/s11192-021-03913-2
Ioannidis, J. P. A. (2018). Meta-research: Why research on research matters. PLoS Biology, 16(3), e2005468. https://doi.org/10.1371/journal.pbio.2005468
Lakens, D. (2021). The practical alternative to the p value is the correctly used p value. Perspectives on Psychological Science, 16(3), 639–648. https://doi.org/10.1177/1745691620958012
Lecoutre, M. P., Poitevineau, J., & Lecoutre, B. (2003). Even statisticians are not immune to misinterpretations of null hypothesis significance tests. International Journal of Psychology, 38(1), 37–45. https://doi.org/10.1080/00207590244000250
Lyu, X. K., Xu, Y., Zhao, X. F., Zuo, X. N., & Hu, C. P. (2020). Beyond psychology: Prevalence of p value and confidence interval misinterpretation across different fields. Journal of Pacific Rim Psychology, 14, e6. https://doi.org/10.1017/prp.2019.28
Mansour, E. (2016). Arab authors' perceptions about the scholarly publishing and refereeing system used in Emerald's library and information science journals. New Library World, 117(7/8), 414–439. https://doi.org/10.1108/NLW-01-2016-0007
Meehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34(2), 103–115. https://doi.org/10.1086/288135
Nosek, B. A., Spies, J. R., & Motyl, M. (2012). Scientific Utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science, 7(6), 615–631. https://doi.org/10.1177/1745691612459058
Nuzzo, R. (2014). Statistical errors: P values, the "gold standard" of statistical validity, are not as reliable as many scientists assume. Nature, 506(7487), 150–152. https://doi.org/10.1038/506150a
Shehata, A. M. K., & Elgllab, M. F. M. (2018). Where Arab social science and humanities scholars choose to publish: Falling in the predatory journals trap. Learned Publishing, 31(3), 222–229. https://doi.org/10.1002/leap.1167
Smith, L., & Abouammoh, A. (2013). Higher education in Saudi Arabia: Achievements, challenges and opportunities. Springer.
Trafimow, D., & Marks, M. (2015). Editorial. Basic and Applied Social Psychology, 37(1), 1–2. https://doi.org/10.1080/01973533.2015.1012991
Wei, R., Hu, Y., & Xiong, J. (2019). Effect size reporting practices in applied linguistics research: A study of one major journal. Sage Open, 9(2). https://doi.org/10.1177/2158244019850035