Conference Paper

Peer Reviewers: Evaluating the Evaluators



Peer review: its validity, the empirical evidence on it, and alternatives. Recording on YouTube:
Ali H. Al-Hoorie
Presented at
The 1st International Symposium on Educational Research
Prince Sultan University, Saudi Arabia
What is the purpose of peer review?
Is peer review a valid tool?
Why or why not? Empirical evidence?
Are there alternatives?
Quality assurance (unintentional errors)
Epistemic filter
Detecting fraud (deliberate falsification)
Reviewer 2?
See also
Ceci & Peters (1982)
Selected 12 articles
Published in top psychology journals (non-blind)
Authored by researchers from prestigious institutions (e.g., Harvard, Stanford)
Replaced the author names with fictitious names from fictitious, low-status-sounding institutions
E.g., Tri-Valley Center for Human Potential
Resubmitted these articles to the same journals that had accepted them
If peer review is valid, these articles should be accepted again
Especially since each was the final (and best) version of the article
10 articles were not detected (as resubmissions) and were sent out for review
“Of the ten undetected manuscripts, nine were recommended for rejection resoundingly; that is, both reviewers were in agreement on rejection. None of the twenty reviewers who recommended rejection even hinted at the possibility that a manuscript might be acceptable for publication pending revision or rewriting.”
“When we removed the original affiliation (e.g., Harvard) and replaced it with the bogus Tri-Valley Center for Human Potential affiliation, the previously acclaimed works were sharply criticized (almost always on statistical or design grounds).” (Ceci & Peters, 1982)
Their conclusion: “These findings were a convincing demonstration of the unreliability of peer review.” (Ceci & Peters, 1982, p. 46)
“One editor in the study wrote a letter threatening a lawsuit for copyright violations. Actually, we had obtained permission from the original authors and copyright holders (publishers) to use their materials in our study.”
“Quite unexpectedly, several editors who had not been directly involved with our study wrote scathing letters calling into question our professional ethics because of our use of deception. When our chairman was personally contacted by an angry editor, he withdrew all of our departmental resources until we finished the study. We were sent a memorandum informing us that typists, photocopy, mail, etc. were off limits to us as long as we continued using the procedures we had adopted.”
Their conclusion: “Obviously, the journal review system has become a sacred cow to some.” (Ceci & Peters, 1982, pp. 46–48)
“These personal attacks took their toll. For a couple of years we doubted the wisdom of our decision to do the research. Finally, after two unsuccessful attempts to publish our findings, replete with personally insulting, ad hominem reviews, we found a publisher and positive reviews. Soon press releases were telling a diverse audience of our findings. Letters of support (over one thousand) came pouring in. Every one of them was […]” (Ceci & Peters, 1982, p. 47)
“If researchers wrote up the peer review process as a measure and submitted it to a journal for publication, it would get rejected because of its unreliability” (Brian Nosek)
Do peer reviewers receive any training?
Do they receive any payment?
Do they have time available?
Do editors follow a systematic way to choose reviewers?
Do they use a valid instrument? Or subjective judgment?
Write a (detailed) proposal of your study
Submit it to a preprint repository
Peers review it, give feedback, and recommend it
In-principle acceptance is conferred
Called Stage 1 Review
Now you conduct your study
Adhering to the original proposal
Submit the full manuscript for review again
Reviewers now check adherence to original proposal
Called Stage 2 Review
Now submit to a PCI RR-friendly journal
Manuscript will be accepted WITHOUT further review by the journal
Peer review is outsourced
Feedback before study is conducted
Problems can be fixed
Emphasize design, deemphasize outcome
Minimize biases
Lowers pressure on researchers
Peer reviewers give feedback to help you
Everything is public
Increases transparency
Reviewers can get credit (e.g., review content is not a secret)
Authors submit their manuscript to a preprint repository
This counts as publication
Review occurs afterward
Everything is public
Authors may revise and update their manuscripts
Accelerates dissemination of information
Separates publication from evaluation
So the incentive is no longer to pass peer review and get published
Authors can focus on impact and on improving their work
See also Al-Hoorie and Hiver (in press)
Matthew effect
The rich get richer, the poor get poorer
Famous scholars will receive the most attention
A quality guarantee for outsiders
Journalists, general public
Heesen and Bright (2020) propose solutions
Annual Review of Information Science and Technology (2011)
The British Journal for the Philosophy of Science (2020)
Aczel, B., Szaszi, B., & Holcombe, A. O. (2021). A billion-dollar donation: Estimating the cost of researchers' time spent on peer review. Research Integrity and Peer Review,
Al-Hoorie, A. H., & Hiver, P. (in press). Open science in applied linguistics: An introduction to metascience. In Plonsky, L. (Ed.), Open science in applied linguistics. John
Bornmann, L. (2011). Scientific peer review. Annual Review of Information Science and Technology, 45, 197–245.
Bornmann, L., Mutz, R., & Daniel, H.-D. (2010). A reliability-generalization study of journal peer reviews: A multilevel meta-analysis of inter-rater reliability and its determinants. PLoS ONE, 5(12), e14331.
Ceci, S. J., & Peters, D. P. (1982). Peer review: A study of reliability. Change: The Magazine of Higher Learning, 14(6), 44–48.
Frey, B. S. (2003). Publishing as prostitution? Choosing between one's own ideas and academic success. Public Choice, 116, 205–223.
Heesen, R., & Bright, L. K. (2020). Is peer review a good idea? The British Journal for the Philosophy of Science, 72(3), 635–663.
Jiang, S. (2021). Understanding authors' psychological reactions to peer reviews: A text mining approach. Scientometrics, 126, 6085–6103.
Kravitz, R. L., Franks, P., Feldman, M. D., Gerrity, M., Byrne, C., & Tierney, W. M. (2010). Editorial peer reviewers' recommendations at a general medical journal: Are they reliable and do editors care? PLoS ONE, 5(4), e10072.
Merriman, B. (2021). Peer review as an evolving response to organizational constraint: Evidence from sociology journals, 1952–2018. The American Sociologist, 52, 341
Nosek, B. A., & Bar-Anan, Y. (2012). Scientific utopia: I. Opening scientific communication. Psychological Inquiry, 23(3), 217–243.
Peters, D. P., & Ceci, S. J. (1982). Peer-review practices of psychological journals: The fate of published articles, submitted again. Behavioral and Brain Sciences, 5(2), 187
Smith, R. (2021). Richard Smith: Peer reviewers: Time for mass rebellion? The BMJ Opinion.
Thank You