Article

Consistency between peer reviewers for a clinical specialty journal

Abstract

To analyze the consistency between independent peer reviewers in evaluating and ranking unsolicited articles, the authors used paired reviews of 422 unsolicited submissions to the Journal of Clinical Anesthesia from the end of 1988 through 1991. (The editors of this journal base their publication decisions, to a substantial degree, on congruence of their reviewers' recommendations.) The reviewers were chosen for their interest in reviewing and areas of expertise. Their recommendations were ranged along a continuum of four categories: (1) accept outright, (2) accept with revision, (3) reject in present form (article could be revised and submitted again as a new submission), and (4) reject outright. The pairs of peer reviewers were consonant for 169 papers (40%), differed by one category for 168 papers (40%), differed by two categories for 73 papers (17%), and differed by three categories for 12 papers (3%). Thus, most articles' reviews were in consonance or close to it; articles reviewed by two members of the editorial board, however, were significantly less likely to be consonant (32%) than were those reviewed by two nonmembers (44%, chi-square, p = .027).
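The percentages in the abstract follow directly from the reported counts. A minimal Python sketch that recomputes them is given here; the variable names are illustrative, and the mean category difference is derived for this sketch rather than reported by the authors.

```python
# Number of papers by how many categories the two recommendations differed,
# as reported in the abstract (0 = identical recommendations).
counts_by_difference = {0: 169, 1: 168, 2: 73, 3: 12}

total = sum(counts_by_difference.values())

# Share of papers at each level of disagreement, rounded to whole percent.
shares = {d: round(100 * n / total) for d, n in counts_by_difference.items()}

# Papers whose two reviews matched or differed by a single category.
close_or_equal = counts_by_difference[0] + counts_by_difference[1]

# Mean absolute category difference (derived here, not reported in the abstract).
mean_difference = sum(d * n for d, n in counts_by_difference.items()) / total

print(total, shares, close_or_equal, round(mean_difference, 2))
```

Note that these are raw agreement figures; the abstract does not give the full cross-tabulation of recommendations, so a chance-corrected statistic cannot be recomputed from it.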

... Studies attempting to answer some of these questions have been inconclusive. [2][3][4][5][6][7] Moreover, the fairness and consistency of the peer-review process have recently generated considerable debate. 8,9 Identifying key factors that influence the manuscript peer-review process and making this information available to the scientific community could help improve the quality of a process that is broadly recognized to be necessary, but far from perfect; especially because the outcome of manuscript peer review can have major implications for funding opportunities and career progression. ...
... There is a current lack of consensus about levels of agreement between reviewers in prior studies. [3][4][5][6][7]20,21 In this study, the consistency between reviewers was between fair and moderate. More evidence is needed to determine how consistency between reviewers can be improved. ...
Article
Objective: A better understanding of the manuscript peer-review process could improve the likelihood that research of the highest quality is funded and published. To this end, we aimed to assess consistency across reviewers' recommendations, agreement between reviewers' recommendations and editors' final decisions, and reviewer- and editor-level factors influencing editorial decisions at the journal Stroke. Methods: We analyzed all initial original contributions submitted to Stroke from January 2004 through December 2011. All submissions were linked to the final editorial decision (accept vs reject). We assessed the level of agreement between reviewers (intraclass correlation coefficient). We compared the initial editorial decision (accept, minor revision, major revision, and reject) across reviewers' recommendations. We performed a logistic regression analysis to identify reviewer- and editor-related factors associated with acceptance as the final decision. Results: Of 12,902 original submissions to Stroke during the 8-year study period, the level of agreement between reviewers was between fair and moderate (intraclass correlation coefficient = 0.55, 95% confidence interval [CI] = 0.09-0.75). Likelihood of acceptance was <5% if at least 1 reviewer recommended a rejection. In the multivariate analysis, higher reviewer-assigned priority scores were related to greater odds of acceptance (odds ratio [OR] = 26.3, 95% CI = 23.2-29.8), whereas higher number of reviewers (OR = 0.54 per additional reviewer, 95% CI = 0.50-0.59) and suggestions for reviewers by authors versus no suggestions (OR = 0.83, 95% CI = 0.73-0.94) had lesser odds of acceptance. Interpretation: This analysis of the peer-review process at Stroke identified several factors that might be targeted to improve the consistency and fairness of the overall process.
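For readers unfamiliar with the intraclass correlation coefficient, the sketch below computes one common form, the one-way random-effects ICC, from per-manuscript ratings. This is an assumption for illustration: the abstract does not state which ICC form was used, and the function name and sample ratings are hypothetical.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC for a list of per-manuscript rating lists.

    Every manuscript must carry the same number of ratings (k >= 2).
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 manuscripts, each with the same k >= 2 ratings")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Between-manuscript and within-manuscript mean squares (one-way ANOVA).
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, row_means) for x in row
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_between - ms_within) / denominator


# Hypothetical recommendations from two reviewers on a 1-4 scale.
hypothetical_ratings = [[1, 1], [2, 3], [4, 4], [3, 2]]
print(icc_oneway(hypothetical_ratings))
```

Values near 1 mean most rating variance lies between manuscripts rather than between reviewers of the same manuscript; values near 0 or below mean reviewers of the same manuscript agree no better than reviewers of different manuscripts.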
... literature have tended to examine the effects of either reviewer characteristics [2, 5-13] or manuscript characteristics [3, 14]. To our knowledge, no study has examined the interaction of reviewer characteristics and manuscript characteristics in a single analysis. ...
... The moderate amount of disagreement between reviewers in our study has also been described in the literature of other medical subspecialties [1, 7-9]. Such reviewer disagreement provides further support for the important intermediary role played by editors in the peer-review process [2, 15-18]. ...
Article
The purpose of our study was to determine which manuscript reviewer characteristics are most strongly associated with reviewer performance as judged by editors of the American Journal of Roentgenology (AJR). At the AJR, manuscript reviews are rated by the journal editors on a subjective scale from 1 (lowest) to 4, on the basis of the value, thoroughness, and punctuality of the critique. We obtained all scores for AJR reviewers and determined the average score for each reviewer. We also sent a questionnaire to 989 reviewers requesting specific information regarding the age, sex, radiology subspecialty, number of years serving as a reviewer, academic rank, and practice type of the reviewer. The demographic profiles were correlated with the average quality score for each reviewer. Statistical analysis included correlation analysis and analysis of variance modeling. Reviewer quality scores were also correlated with the scoring of individual reviews and ultimate disposition of 196 manuscripts sent to the AJR during the same period. Responses to the questionnaire were obtained from 821 reviewers (83.0%), of whom 714 (87.0%) had quality scores available. Correlation analysis showed that the quality score of reviewers strongly correlated with younger age (p = 0.001). A statistically significant correlation between quality score and practice type was seen (p = 0.008), with reviewers from academic institutions receiving higher scores. No significant correlation was found between quality score and sex (p = 0.72), years of reviewing (p = 0.26), academic rank (p = 0.10), or the ultimate disposition of the manuscript (p = 0.40). The quality score of the reviewers showed no variation by subspecialty (p = 0.99). The highest-rated AJR reviewers tended to be young and from academic institutions. The quality of peer review did not correlate with the sex, academic rank, or subspecialty of the reviewer.
... Bolek et al. (2022) reported that reviewers' assessments of separate manuscript sections did not fully align with their final recommendations. Cullen and Macaulay (1992) found that only 40% of paired human reviewers for a clinical journal were in complete agreement in their review reports. Reviewers often provide divergent assessments of the same manuscript and are usually influenced by factors such as disciplinary background, personal beliefs, and individual interpretation of review criteria (Cicchetti, 1991; Bornmann et al., 2010). ...
... It certainly seems unlikely that peer review is "easy" to do well, given an abundance of evidence about its low reliability, e.g.: Bornmann 2011; Cicchetti 1991; Snodgrass 2006; Heesen and Bright 2021; Brembs 2018; Campanario 1996; Campos-Arceiz et al. 2015; Cullen and Macaulay 1992; Deveugele and Silverman 2017; Ernst et al. 1993; Howard and Wilkinson 1998; Jackson et al. 2011; Kravitz et al. 2010; Mahoney 1977; Peters and Ceci 1982; Reinhart 2009; Rothwell and Martyn 2000; Rubin et al. 1993; Siegelman 1991; Siler et al. 2015. Many of these papers contain suggestions to improve peer review, but they are all costly to implement. ...
Article
We model the impact of different incentives on journal behavior in undertaking peer review. Under one scheme, the journal aims to publish the highest quality papers; under the second, the journal aims to maintain a high rejection rate. Under both schemes, journals prefer to set very high standards for acceptance despite allowing significant error in peer review. Under the second scheme, however, in order to encourage more submissions from mediocre papers, the journal is incentivized to make its editorial process less accurate. This leads to both worse peer review and to lower quality articles being published.
... Data are conflicting. One early study from the Journal of Clinical Anesthesia found moderate levels of reviewer concordance (40% of papers received identical recommendations from two reviewers and an additional 40% differed by only one category).[5] However, published data from the fields of radiology[6], clinical neuroscience[7], and rehabilitation[8] suggest that chance-corrected agreement between reviewers is only fair. ...
Article
Editorial peer review is universally used but little studied. We examined the relationship between external reviewers' recommendations and the editorial outcome of manuscripts undergoing external peer-review at the Journal of General Internal Medicine (JGIM). We examined reviewer recommendations and editors' decisions at JGIM between 2004 and 2008. For manuscripts undergoing peer review, we calculated chance-corrected agreement among reviewers on recommendations to reject versus accept or revise. Using mixed effects logistic regression models, we estimated intra-class correlation coefficients (ICC) at the reviewer and manuscript level. Finally, we examined the probability of rejection in relation to reviewer agreement and disagreement. The 2264 manuscripts sent for external review during the study period received 5881 reviews provided by 2916 reviewers; 28% of reviews recommended rejection. Chance corrected agreement (kappa statistic) on rejection among reviewers was 0.11 (p<.01). In mixed effects models adjusting for study year and manuscript type, the reviewer-level ICC was 0.23 (95% confidence interval [CI], 0.19-0.29) and the manuscript-level ICC was 0.17 (95% CI, 0.12-0.22). The editors' overall rejection rate was 48%: 88% when all reviewers for a manuscript agreed on rejection (7% of manuscripts) and 20% when all reviewers agreed that the manuscript should not be rejected (48% of manuscripts) (p<0.01). Reviewers at JGIM agreed on recommendations to reject vs. accept/revise at levels barely beyond chance, yet editors placed considerable weight on reviewers' recommendations. Efforts are needed to improve the reliability of the peer-review process while helping editors understand the limitations of reviewers' recommendations.
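To show why chance-corrected agreement can be low even when raw agreement looks respectable, here is a minimal sketch of Cohen's kappa for two reviewers. It is a simplification: manuscripts in the study could have more than two reviewers, and the abstract does not say which kappa variant was used. The function name and the counts in the example are hypothetical, not JGIM data.

```python
def cohen_kappa(table):
    """Cohen's kappa from a square agreement table.

    table[i][j] = number of manuscripts placed in category i by reviewer A
    and in category j by reviewer B.
    """
    size = len(table)
    total = sum(sum(row) for row in table)

    # Observed agreement: share of manuscripts on the diagonal.
    observed = sum(table[i][i] for i in range(size)) / total

    # Agreement expected by chance from each reviewer's marginal shares.
    row_share = [sum(table[i]) / total for i in range(size)]
    col_share = [sum(table[i][j] for i in range(size)) / total for j in range(size)]
    expected = sum(r * c for r, c in zip(row_share, col_share))

    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical counts: rows = reviewer A (reject, not reject),
# columns = reviewer B (reject, not reject).
hypothetical_table = [[20, 30], [30, 120]]
print(cohen_kappa(hypothetical_table))
```

In this made-up table the two reviewers agree on 70% of manuscripts, yet most of that agreement is what their marginal rejection rates would produce by chance, so kappa is far lower than the raw figure suggests.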
... In this regard, we have to trust the eye of the reviewer and the future. 14,15 In our analysis we relied on the authors' declaration of affiliation(s) and MEDLARS' classification of "human" studies. We considered all human studies affiliated with nonmedical institutions to be relevant without testing for relevance. ...
Article
Since 1987 research articles have been catalogued with the author's affiliation address in the 40 databases of the Medical Literature Analysis and Retrieval System (MEDLARS) of the National Library of Medicine, Bethesda, Md. The present study was conducted to examine the Canadian entries in MEDLARS to interpret past and future trends and to combine the MEDLARS demographic data with data from other sources to rank Canadian research output of human studies both nationally and internationally. The PubMed Web site of the National Library of Medicine was used to count medical articles archived in MEDLARS and published from Jan. 1, 1989, through Dec. 31, 1998. The articles attributed to Canadian authors were compared by country, province, city, medical school, hospital, article type, journal and medical specialty. During the study period Canadian authors contributed on average 3% (standard deviation [SD] 0.2%) of the worldwide MEDLARS content each year, which translated to a mean of 11,067 (SD 1037) articles per year; 49% were human studies, of which 13% were clinical or controlled trials, and 55% involved people aged 18 years or less. In total, 68% of the articles were by authors affiliated with Canadian medical schools; those affiliated with the University of Toronto accounted for the greatest number (8604), whereas authors affiliated with McGill University had the greatest rate of annual increase in the quantity published (8%). Over one-third (38%) of the articles appeared in Canadian journals. When counted by specialty, 17% of the articles were by authors with clinical specialties, 5% by those with surgical specialties and 3% by those with laboratory specialties. The annual rate of increase in research output for Canada was more than 3 times higher than that seen world wide. Canada is now ranked seventh among countries contributing human studies to MEDLARS. 
The increase indicates that Canada's medical schools are productive, competitive in making contributions to medical science and are supporting Canadian journals.
Article
Background: The peer review system can be seen as a method to assess the quality of scientific papers. It is prone to bias of various types and its validity has not been firmly established. To test the hypothesis that there is a reviewer bias against unconventional medicine, a randomized controlled double-blind peer review was performed. Methods: 291 medical doctors from a wide variety of specialties, drawn from a list of participants of an interdisciplinary International Conference, were randomly assigned to receive one of two versions of a manuscript. Version M dealt with an in vitro experiment on a mainstream drug, while the otherwise identical version V used an unconventional drug. All participants were asked to complete an evaluation sheet containing ungraded visual analogue scales (VAS) on different quality criteria: relevance of subject, formulation of hypothesis, randomisation, inclusion/exclusion criteria, sample size, statistical evaluation, choice of main outcome measures, follow-up, clarity of description, linguistic quality, overall quality of the study, and overall quality of the manuscript. Participants were debriefed only after completion of the study. Results: No differences in ratings of the two versions of the letter were observed. Ratings covered the entire range of the VASs, showing that peer review was associated with low inter-rater reliability. Conclusions: In this setting there was no reviewer-bias against unconventional therapies; peer-review in general was not as reliable as one would have hoped. Future research might be directed at finding means of improving it.
Article
Introduction. Peer review (PR) is the traditional model for improving scientific publications and deciding on their publication. It consists of sending the material received for publication to experts who analyze its quality and offer constructive criticism so that the authors can improve it, while advising the editor on his/her publication decision. Development. An analysis is made of the situation of peer review based on its different characteristics, such as the peer's impartiality, equality, confidentiality and competence, and their role in the decision-making process on the acceptability of the manuscript and as an instrument for improvement of the articles. Conclusions. PR is more a culture than a method that can be evaluated. If it is the editor who makes the decision in a publication, it is really the editor who can evaluate the benefit of the process and know the degree of efficacy of the peer reviewers. It is true that PR has disadvantages: it creates officialism, takes up time and resources, and may discourage some authors. However, it leads to improvement in the articles, correction of errors and alerts on distortions. PR should be understood not only as a decision formula, but also as a way to improve manuscripts, placed at the service of the authors, regardless of whether the article is published or not.
Article
Introduction: The expert is essential in the external evaluation process, and for this reason it is necessary to know the profile and characteristics of the best evaluators. Material and methods: We retrospectively analysed the external review process of the journal from the 1st of January 2005 until the 30th of June 2009, with the aim of determining the profile of the experts in relation to their response to review requests. The response rate, mean delay time and responder rate were evaluated, using sex, age and membership of the editorial committee as variables. Results: The response rate fell as the number of evaluations increased. Women had a higher response rate, lower delay time and better performance than men. The response rate showed a tendency to decrease with age and the large majority of responders were between 29 and 39 years. Being a member of the journal committees was not associated with a better response rate, although there was less delay. The response rate and the delay time are similar, although it may increase with the number of requests to a reviewer. Conclusions: Lower age and being female are associated with a better response. No fatigue effect was observed in good responders, but if there is a fall in the response rates the number of evaluators should be increased.
Article
The authors describe the roles of the reviewer within common types of review processes (board review and pool review) used at scholarly journals. In addition to giving background information about the types of review processes used at journals, they discuss how journals identify potential reviewers and select specific reviewers for a manuscript. They explain the types of review forms used by various journals and how reviewers' comments on these forms are used in the editorial decisions made by the journal editors. Finally, the authors describe the role of reviewers' comments in guiding the revision and final editing of manuscripts.
Article
Reviewers can disagree substantially when evaluating the same materials. For papers submitted to an editorial board, the Editor-in-Chief can suggest compromises. However, this is not the case in the normal abstract grading procedures for large meetings. If important discrepancies arise between reviewers, a review committee may propose corrective measures. However, this is only feasible for smaller meetings with a limited number of abstract submissions. In this study, when reviewing the same abstracts, a statistically significant correlation between reviewers was present in 15 instances and absent in 13 others. It would appear that some review of the reviewer is highly desirable and may prevent publication bias.
Article
To identify the characteristics of the manuscripts submitted to the Canadian Journal of Anesthesia (CJA) associated with their acceptance or rejection and to analyze the reviewers' comments and their impact on the editors' decision to publish. Peer review material was analyzed from 213 submissions to the CJA. Characteristics of accepted and rejected manuscripts were compared. Reviewers' comments were classified according to editorial criteria used by the journal and the distribution of the different types of comments amongst accepted and rejected submissions was compared. Characteristics of 213 manuscripts and comments from 405 reviewers were analyzed. Overall, 57% of manuscripts submitted to the CJA were accepted. The type of research (study vs case report, clinical vs laboratory science) had no impact on the fate of the manuscripts; however, frequency of acceptance differed between articles originating from different geographic regions (P < 0.0001) with Canadian submissions posting the highest frequency (86%). Comment analysis suggests that the relationship between the experimental design, the results, and the conclusion was the main determinant of an article's fate. Lack of originality or inappropriate experimental design were likely to be associated with rejection. Conversely, aspects involving the presentation of manuscripts (tables, figures, references) were rarely cited as reasons to justify acceptance or rejection. Although articles are judged on many criteria, authors need to be aware that some aspects of a manuscript, namely the relationship between experimental design, results, and conclusions, the originality, and the use of an appropriate study design, are the most important features with regard to its acceptance or rejection.
Article
The objective of this study was to examine the relative influence of manuscript characteristics and peer-reviewer attributes in the assessment of manuscripts. Over a 6-month period, all major papers submitted to the American Journal of Roentgenology (AJR) were entered into a database that recorded manuscript characteristics, demographic profiles of reviewers, and the disposition of the manuscript. Manuscript characteristics included reviewer ratings on five scales (rhetoric, structure, science, import, and overall recommendation); the subspecialty class of the paper; the primary imaging technique; and the country of origin. Demographic profiles of the reviewers included age, sex, subspecialty, years of reviewing, academic rank, and practice type. Statistical analysis included correlation analysis, ordinal logistic regression, and analysis of variance. A total of 445 reviews of 196 manuscripts were the work of 335 reviewers. Of the 196 submitted manuscripts, 20 (10.2%) were accepted, 106 (54.1%) were rejected, and 70 (35.7%) were rejected with the opportunity to resubmit. Regarding manuscript characteristics, we found that the country of origin, score on the science scale, and score on the import scale were statistically significant variables for predicting the final disposition of a manuscript. Of the reviewer attributes, we found a statistically significant association between greater reviewer age and also higher academic rank with lower scores on the import scale. Reviewer concordance was higher for structure, science, and overall scores than on the rhetoric and import scores. Greater variability in the overall scoring of papers could be attributed to the reviewer than the manuscript, but both factors combined explain only 23% of the total variability. At the AJR, manuscript acceptance was most strongly associated with reviewer scoring of the science and import of a major paper and also with the country of origin. 
Reviewers who were older and of higher academic rank tended to discount the importance of manuscripts.
Article
Scientific findings must withstand critical review if they are to be accepted as valid, and editorial peer review (critique, effort to disprove) is an essential element of the scientific process. We review the evidence of the editorial peer-review process of original research studies submitted for paper or electronic publication in biomedical journals. To estimate the effect of processes in editorial peer review. The following databases were searched to June 2004: CINAHL, Ovid, Cochrane Methodology Register, Dissertation abstracts, EMBASE, Evidence Based Medicine Reviews: ACP Journal Club, MEDLINE, PsycINFO, PubMed. We included prospective or retrospective comparative studies with two or more comparison groups, generated by random or other appropriate methods, and reporting original research, regardless of publication status. We hoped to find studies identifying good submissions on the basis of: importance of the topic dealt with, relevance of the topic to the journal, usefulness of the topic, soundness of methods, soundness of ethics, completeness and accuracy of reporting. Because of the diversity of study questions, viewpoints, methods, and outcomes, we carried out a descriptive review of included studies grouping them by broad study question. We included 28 studies. We found no clear-cut evidence of effect of the well-researched practice of reviewer and/or author concealment on the outcome of the quality assessment process (9 studies). Checklists and other standardisation media have some evidence to support their use (2 studies). There is no evidence that referees' training has any effect on the quality of the outcome (1 study). Different methods of communicating with reviewers and means of dissemination do not appear to have an effect on quality (3 studies). On the basis of one study, little can be said about the ability of the peer-review process to detect bias against unconventional drugs. 
Validity of peer review was tested by only one small study in a specialist area. Editorial peer review appears to make papers more readable and improve the general quality of reporting (2 studies), but the evidence for this has very limited generalisability. At present, little empirical evidence is available to support the use of editorial peer review as a mechanism to ensure quality of biomedical research. However, the methodological problems in studying peer review are many and complex. At present, the absence of evidence on efficacy and effectiveness cannot be interpreted as evidence of their absence. A large, well-funded programme of research on the effects of editorial peer review should be urgently launched.
Article
Deficiencies in medical education research quality are widely acknowledged. Content, internal structure, and criterion validity evidence support the use of the Medical Education Research Study Quality Instrument (MERSQI) to measure education research quality, but predictive validity evidence has not been explored. To describe the quality of manuscripts submitted to the 2008 Journal of General Internal Medicine (JGIM) medical education issue and determine whether MERSQI scores predict editorial decisions. Cross-sectional study of original, quantitative research studies submitted for publication. Study quality measured by MERSQI scores (possible range 5-18). Of 131 submitted manuscripts, 100 met inclusion criteria. The mean (SD) total MERSQI score was 9.6 (2.6), range 5-15.5. Most studies used single-group cross-sectional (54%) or pre-post designs (32%), were conducted at one institution (78%), and reported satisfaction or opinion outcomes (56%). Few (36%) reported validity evidence for evaluation instruments. A one-point increase in MERSQI score was associated with editorial decisions to send manuscripts for peer review versus reject without review (OR 1.31, 95%CI 1.07-1.61, p = 0.009) and to invite revisions after review versus reject after review (OR 1.29, 95%CI 1.05-1.58, p = 0.02). MERSQI scores predicted final acceptance versus rejection (OR 1.32; 95% CI 1.10-1.58, p = 0.003). The mean total MERSQI score of accepted manuscripts was significantly higher than rejected manuscripts (10.7 [2.5] versus 9.0 [2.4], p = 0.003). MERSQI scores predicted editorial decisions and identified areas of methodological strengths and weaknesses in submitted manuscripts. Researchers, reviewers, and editors might use this instrument as a measure of methodological quality.
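An odds ratio "per one-point increase" compounds multiplicatively over larger score differences, on the usual logistic-regression assumption that log-odds are linear in the score. The sketch below illustrates that arithmetic; the function names and the baseline probability are hypothetical and not taken from the study.

```python
def odds_ratio_for_change(or_per_point, points):
    """Odds ratio implied by a change of `points`, given the per-point odds ratio."""
    return or_per_point ** points


def probability_after(baseline_probability, odds_ratio):
    """Apply an odds ratio to a baseline probability and return the new probability."""
    odds = baseline_probability / (1 - baseline_probability)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)


# Reported odds ratio for final acceptance per one-point MERSQI increase.
or_acceptance = 1.32

# Difference between the reported mean scores of accepted and rejected manuscripts.
score_gap = 10.7 - 9.0

implied_or = odds_ratio_for_change(or_acceptance, score_gap)

# Hypothetical baseline acceptance probability, chosen only for illustration.
print(implied_or, probability_after(0.20, implied_or))
```

The second function is a reminder that an odds ratio is not a ratio of probabilities: the same odds ratio moves a low baseline probability by a different amount than a high one.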
Article
The provocative and discouraging results of the Gottfredson (1978, this issue) article on the reliability and validity of journal reviews prompted us to examine our experience with this Journal. Our casual impression was that independent reviewers often agreed on their ratings of a manuscript's acceptability and suitability for the American Psychologist. Perhaps our use of a 5-point rating scale, in addition to the detailed review of each manuscript, improved reliability. Or perhaps the perceived reliability existed only in the eyes of these beholders. In the past 18 months, the Associate Editor completed the review of 132 manuscripts, 78 of which received two or more independent reviews, for a total of 87 decisions, 9 manuscripts having been revised and resubmitted after initial rejection. Reviewers agreed exactly in 57 of the 87 paired ratings, with strong agreement on the categories of "reject" and "reject-resubmit." The category with the least agreement was "accept with minor revisions," which the reviewers, arbitrarily numbered 1 and 2 at the time of selection, used with quite different frequencies. The other marginals indicate, however, that reviewers have much the same standards in mind when they review for the American Psychologist, as there were 56 and 53 rejections (Ratings 1 and 2), 17 and 15 reject resubmits, and 10 and 8 acceptances for Reviewers 1 and 2, respectively. From an editor's point of view, the most important decision is whether or not the manuscript is or could become acceptable for publication. Dividing the scale into recommendations of reject (1 and 2) and possibly accept (3, 4, and 5), we find that two reviewers agreed on 45 rejections and 23 possible acceptances, for a helpful total of 68 of the 87 decisions made. Although this simple tabulation does not address the validity of the reviewers' ratings, the interrater reliability is quite high and gives us new faith in ourselves as casual observers. (PsycINFO Database Record
Article
287 manuscripts submitted for publication were each reviewed by 2 referees (from a pool of 200 psychologists), who completed a 1-page appraisal form for each manuscript reviewed. Analysis revealed that rates of recommended and actual acceptance were quite low, but that inter-referee agreement was significantly above chance on 6 of the 7 attributes rated, including recommendation to publish. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Recent publications have left the impression that agreement between reviewers' ratings of manuscripts is low. Correlation-type statistics often are not well-suited for directly measuring actual agreement between 2 sets of ratings. It is concluded that standards and training for reviewers should be improved. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
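The point about correlation-type statistics can be illustrated with a small, entirely hypothetical data set (the ratings below are invented for illustration and do not come from any of the studies listed here). If one reviewer consistently rates every manuscript one point higher than the other, the two sets of ratings are perfectly correlated even though the reviewers never give the same rating.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient, computed from its definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def exact_agreement(x, y):
    """Fraction of manuscripts given the identical rating by both reviewers."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


# HYPOTHETICAL ratings on a 5-point scale; reviewer B is always one point
# more lenient than reviewer A.
reviewer_a = [1, 2, 3, 4, 1, 2]
reviewer_b = [2, 3, 4, 5, 2, 3]

r = pearson_r(reviewer_a, reviewer_b)
agreement = exact_agreement(reviewer_a, reviewer_b)

print(f"Pearson r:       {r:.3f}")
print(f"exact agreement: {agreement:.3f}")
```

In this constructed case the correlation is at its maximum while exact agreement is zero, which is one reason agreement is often reported with statistics that compare the ratings themselves rather than only their ordering.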
Article
Rejection rates for scholarly journals show substantial variation between disciplines. Explanations of this variation have focused on two possible sources: variation in consensus and in space shortages. Longitudinal data on journal rejection rates show that they have been very stable over time and are largely unaffected by changes in submissions, impugning the argument that space shortages explain disciplinary variation in rejection rates. In contrast, a model of the manuscript-evaluation process can account for the observed variation in rejection rates and also casts light on additional characteristics of manuscript-evaluation processes in different disciplines. Possible links between consensus and each of the elements of the model are discussed.
Article
Objective: To assess the consistency of an index of the scientific quality of research overviews. Design: Agreement was measured among nine judges, each of whom assessed the scientific quality of 36 published review articles. Item selection: An iterative process was used to select ten criteria relative to five key tasks entailed in conducting a research overview. Sample: The review articles were drawn from three sampling frames: articles highly rated by criteria external to the study; meta-analyses; and a broad spectrum of medical journals. Judges: Three categories of judges were used: research assistants; clinicians with research training; and experts in research methodology; with three judges in each category. Results: The level of agreement within the three groups of judges was similar for their overall assessment of scientific quality and for six of the nine other items. With four exceptions, agreement among judges within each group and across groups, as measured by the intraclass correlation coefficient (ICC), was greater than 0.5, and 60% (24/40) of the ICCs were greater than 0.7. Conclusions: It was possible to achieve reasonable to excellent agreement for all of the items in the index, including the overall assessment of scientific quality. The implications of these results for practising clinicians and the peer review system are discussed.
Article
The conventional reviewing system of American biomedical journals determines where and when an author may publish, and hence may affect his career. Yet the system's effectiveness in validating reports, or the cost of such validation, has been little studied. Problems are inadequate review, unrealistic editorial expectation of what reviewers can do, and bias. Expert reviewers frequently disagree. Thus concurrence between two reviewers of each of some 500 papers submitted to the New England Journal of Medicine was only moderately better than a chance result. Costs of the reviewing system include time and effort involved, the possible violation of confidentiality, and occasional suppression of the novel advance. Despite deficiencies, the reviewing system is important to maintain standards. It could be improved if studies of its operation were carried out, if reviewers were better indoctrinated, if the work load of reviewers were lessened, if reviews were signed, and if the reviewing process were more rewarding to reviewers.