
Peer Review Agreement or Peer Review Disagreement: Which Is Better?



Peer review is generally considered the cornerstone of the scientific control system. Hence it is critical that peer review works well. The empirical finding that reviewers often disagree among themselves is the starting point for an analysis of peer review. Depending on the reasons for such disagreements, I argue that disagreement, as well as agreement, among reviewers can have both positive and negative effects for science. The empirical research on peer review is analyzed according to a categorization of review objects (manuscripts or research applications) and review outcomes (approval or rejection) and draws on psychological judgment and decision-making research. Moreover, bias in peer review is scrutinized. The conclusion offers implications for the peer review system and for scientific research.
Journal of Psychology of Science and Technology, Volume 2, Number 1, 2009 © Springer Publishing Company
DOI: 10.1891/1939-7054.2.1.5
Sven Hemlin, PhD
University of Gothenburg, Sweden
Keywords: peer review; citations; (dis)agreement; bias
Although the peer review system is the basis of quality control in scientific research, reviewers, somewhat surprisingly, often disagree about the merits of many research submissions. Peer review studies have typically found that disagreement among reviewers is common, both for research presented in manuscripts submitted for publication and for grant applications. Moreover, research finds that, for various reasons, reviewers are often biased in their judgments of submissions. This article attempts to clarify why peer review disagreement occurs, to explain the nature of peer review bias, and to suggest the implications of such disagreement and bias for peer review and scientific research. Psychological research on cognitive judgments will be applied in the analyses.
The most striking report of peer review disagreement is probably the study by Peters and Ceci (1982). For this study, the authors resubmitted 12 articles to the psychology journals that had previously published them. Of the 12 articles, 8 were rejected on methodological grounds (with high interreferee agreement), 3 were detected as already published, and 1 was accepted. The Peters and Ceci study sparked a heated debate among scholars (Harnad, 1982, 1985), resulting in the paradoxical conclusion that reviewers, who should be experts, actually are not (Millman, 1982). Had they agreed on the decision to approve or reject, we could ideally (at least if their arguments were good) conclude that they were expert reviewers. However, this is not the whole story, as will become evident later in this article.
Other studies raise more problematic issues concerning the peer review system. First, some authors argue that the outcome of peer review is foremost a question of which reviewers are chosen (Fuller, 1995). This is primarily an issue for the editors and research grant committees who choose reviewers, and a warning that they should choose carefully. Reviewers should be experts in the field of research being reported or applied for. In addition, recent discussions in major journals such as Nature and The Scientist have asked whether the peer review system is near collapse because there are too few reviewers for the many manuscripts submitted. The concern is that this situation may lead to poor peer review quality and to biases. Second, another problem concerns individual and group decision biases. Experts may be biased in their decisions. For example, a reviewer (or a review committee) may, with or without awareness, be more positive toward a manuscript from her/his (their) own university or field than toward one from elsewhere. Third, in addition to the reviewer disagreement debate, concern has grown about the function of the peer review system in science in general, particularly as it relates to the prevention of fraud, deception, and misconduct in science (Chubin & Hackett, 1990; Cicchetti, 1991; Cole, Cole, & Simon, 1981; Daniel, 1993; Hargens, 1988; Hemlin, 1996; Judson, 1994; Marsh, Jayasinghe, & Bond, 2008; Sandström & Hällsten, 2008; Weller, 2001; Wennerås & Wold, 1997). This debate concerns another cornerstone of peer review connected to research ethics, namely, the guarantee that errors, deliberate deception, fraud, and sloppy research are not published and spread in the scientific community.
A number of peer review studies now reach the same conclusion as Peters and Ceci's (1982) classic research. In a review, Cicchetti (1991) reported surprisingly low interreferee agreement, ranging from as low as .18 to a more acceptable .57. Weller's (2001) review, restricted to editorial peer review, also supports these results. Weller reported widely varying agreement among reviewers for various journals, from 14% (86% disagreement) to 98% (2% disagreement), with an average of 49% agreement (51% disagreement). Many of the studies Weller reported on were in psychology, although about 50% were in medicine. It should be noted that Weller presented the results as agreement among reviewers, although they could equally have been presented as disagreement; alongside her figures, I have therefore added the corresponding disagreement percentages.
Furthermore, it does not appear that peer review disagreement is a diminishing historical phenomenon in contemporary science. Recent studies of peer reviews of grant applications still report low agreement figures (kappa = .36) in medical research (Mayo et al., 2006). Pearson correlations of .15 for single-rater reliabilities of the quality of grant applications in a cross-disciplinary database of the Australian Research Council (Marsh et al., 2008) support the general picture of peer disagreement.
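The figures above mix two kinds of agreement index: raw percentage agreement (as in Weller, 2001) and chance-corrected kappa (as in Mayo et al., 2006). The distinction matters because, when most submissions are rejected, two reviewers will often agree by chance alone. The sketch below is illustrative only: it computes both indices for two hypothetical reviewers' accept/reject decisions on ten made-up submissions (not data from any study cited here), using the standard two-rater Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance. That the cited studies used exactly this kappa variant is an assumption.

```python
from collections import Counter
from typing import Sequence


def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Proportion of submissions on which two reviewers made the same decision."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty decision lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two raters: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = percent_agreement(a, b)  # observed agreement (also validates input)
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: both reviewers independently pick the same category,
    # each with the frequency at which they used it overall.
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in set(a) | set(b))
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both reviewers use a single category")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical decisions on ten submissions (illustrative, not real data).
reviewer_1 = ["reject"] * 8 + ["accept"] * 2
reviewer_2 = ["reject"] * 7 + ["accept", "reject", "accept"]

print(f"agreement: {percent_agreement(reviewer_1, reviewer_2):.0%}")  # agreement: 80%
print(f"kappa:     {cohens_kappa(reviewer_1, reviewer_2):.3f}")      # kappa:     0.375
```

Here the two reviewers agree on 80% of the submissions, yet kappa is only .375: because each rejects 80% of submissions, 68% agreement would be expected by chance. This is one reason percentage figures and kappa values from different studies are not directly comparable.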
Given this widespread disagreement on what makes a journal article or a grant application acceptable, there is concern about the implications for scientific research when reviewers do not agree on what good science is. Yet we know that science is always developing and that theories are not permanent. The issue thus becomes whether agreement or disagreement among reviewers is preferable in scientific research. This issue is elaborated on in the following four sections of this article. In sum, one might presume that both agreement and disagreement in peer review are welcome if the reviews are done by experts in an unbiased way. In the case of a disagreement, it should be clear to the author what the arguments are for the […]
In the first section of this article, I review a number of peer review studies and analyze various reasons for peer review decisions. In the second section, I distinguish between the review of research grant applications and the review of manuscripts, and between approval and rejection decisions in peer review. In the third section, I discuss bias and ethical issues in peer review research. Finally, I draw conclusions about some implications of (dis)agreement in peer review for scientific research and offer suggestions on how to improve the peer review system. Throughout the article, I also draw on psychological judgment and decision-making research.
I propose the following hypothetical examples to address whether peer review agreement or disagreement is preferable. First, consider the situation where reviewers agree on a manuscript, a grant application, or another object of review (e.g., candidates for professorships, lectureships, or scientific prizes). For manuscripts, a favorable agreement may be rationally explained by reference to the scientific merits of the submission: the results may be new, the research methods sound, and the conclusions valid. An unfavorable agreement, conversely, may be rationally explained by the counterarguments that the results are not new, that incorrect methods have been used, and that the conclusions are invalid. For grant applications, the same rational logic would apply to acceptance and rejection decisions.
However, if the favorable agreement is reached nonrationally, for either manuscripts or grant applications, we would suspect that the reviewers were in some way biased toward the author, the subject, and/or the results. Unfavorable agreement reached nonrationally may be the result of prejudice, hostility, or competition. The conclusion is that rational agreement is positive and nonrational agreement is negative. To counteract poor reviews and to increase reliability in peer review, editors and grant application committees take the precaution of using several independent reviewers. However, this does not resolve the question of peer review validity.
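How far adding independent reviewers can raise reliability (though not validity) can be illustrated with the Spearman-Brown prophecy formula, a standard psychometric result that is not part of this article's own argument. Assuming reviewers are interchangeable (parallel) raters with single-rater reliability r, the reliability of the mean of k reviewers is R_k = k*r / (1 + (k - 1)*r). The minimal sketch below uses the single-rater value of .15 reported by Marsh et al. (2008) purely as an illustrative input; the parallel-rater assumption is a simplification, not a claim about any actual review panel.

```python
import math


def spearman_brown(r: float, k: int) -> float:
    """Projected reliability of the mean rating of k parallel reviewers,
    given single-reviewer reliability r (Spearman-Brown prophecy formula)."""
    if not 0 < r < 1 or k < 1:
        raise ValueError("need 0 < r < 1 and k >= 1")
    return k * r / (1 + (k - 1) * r)


def reviewers_needed(r: float, target: float) -> int:
    """Smallest number of parallel reviewers whose mean reaches the target
    reliability (Spearman-Brown solved for k, rounded up)."""
    if not 0 < r < 1 or not 0 < target < 1:
        raise ValueError("r and target must lie strictly between 0 and 1")
    k = target * (1 - r) / (r * (1 - target))
    # Small tolerance so float error such as 3.0000000001 still rounds up to 3.
    return max(1, math.ceil(k - 1e-9))


single_rater = 0.15  # illustrative input: the single-rater figure cited in the text
for k in (1, 2, 3, 5, 10):
    print(k, round(spearman_brown(single_rater, k), 2))
print("reviewers needed for .80:", reviewers_needed(single_rater, 0.80))
```

Under these assumptions, projected reliability rises from .15 to roughly .26, .35, .47, and .64 for 2, 3, 5, and 10 reviewers, and about 23 reviewers would be needed to reach .80. Higher reliability would still say nothing about whether the reviewers are judging the right qualities, which is the validity question.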
Second, consider the situation where the reviewers disagree. If the disagreement is rational, the reviewers may disagree on the merits of a manuscript or a grant application because they have applied different quality criteria. Using manuscripts as an example, one reviewer may be impressed by the novelty of the results, whereas another doubts the reliability of the data. This example demonstrates that cognitive judgments in human reasoning are not perfect (e.g., Slovic, 2001). Nonrational disagreement may be explained by the same biased reasoning found in the case of nonrational agreement. Similarly, the conclusion is that rational disagreement is positive and nonrational disagreement is negative.
Peer reviews are either ex post judgments, such as manuscript reviews, or ex ante judgments, such as grant application reviews. An ex post judgment is made after research is carried out, whereas an ex ante judgment is made before research is conducted (Hemlin, 1999; Montgomery & Hemlin, 1990). The division between manuscript and grant application review is consistent with the historical evolution of peer review reported by Burnham (1990, 1992). Contradictory results from studies of the degree of agreement among reviewers on both ex post and ex ante judgments were reviewed by Cicchetti (1991), Hemlin (1999), and Hemlin and Montgomery (1990). Some of the studies reviewed showed low to moderately high interreferee agreement (e.g., correlations of .19–.54 for manuscript reviews), whereas others demonstrated consistently low figures (e.g., .15–.37 for grant reviews).
One conclusion from these reviews is that it is probably important to divide peer review disagreement into two categories, according to whether the reviewers' conflict concerns completed research (ex post) or proposed research (ex ante). For instance, it may be harder to understand fully the potential of a proposed research project than that of a completed one, so judgments in the first case are clearly riskier than in the second. Consistent with this, a seminal study of peer review of grant applications, showing that consensus was low and that chance played a large role in reviews (Cole et al., 1981), was corroborated by Marsh et al. (2008). Let us now examine agreement and disagreement among reviewers in ex post (manuscript) and ex ante (grant proposal) reviews in more detail.
Manuscript Reviews
Two reviews of empirical investigations of the peer review of manuscripts submitted for publication (Cicchetti's also covering grant proposals) reveal a remarkable lack of reliability and validity in refereeing (Cicchetti, 1991; Weller, 2001). Burnham (1992) proposed a provisional explanation for this result, namely that the standards against which manuscripts are judged vary between disciplines and between journals. Some studies also show this. Social science journals have generally had higher rejection rates (e.g., 87% in sociology, according to Hargens, 1988, 1990) than hard science journals, which have considerably lower rejection rates (e.g., 19% for nuclear physics and 22% for condensed matter). However, journals in general fields of the hard sciences have higher rejection rates than those in more specialized fields, which makes their rejection rates more similar to those of general social science journals. For example, 30% of the papers submitted in general physics are rejected (Cicchetti, 1991). Studies by Hargens (1988, 1990) showed that disciplinary differences in consensus on research priorities and procedures contribute to variations in journals' peer review systems. For instance, for a long period (1948–1956) physics relied on a single referee for each manuscript submitted to its leading journal, Physical Review (Cicchetti, 1991). One possible explanation for the variation in rejection rates is that there is generally a higher degree of consensus on what counts as trustworthy research in the hard sciences than in the soft sciences (the social sciences and the humanities). Moreover, the common picture of higher rejection rates in the social sciences than in the hard sciences is largely supported by Weller's (2001) review of manuscript peer review. However, as noted earlier, this observation does not hold for general hard science journals.
A number of researchers have attempted to explain differences in acceptance and rejection decisions in journal peer review (Cicchetti, 1991; Van Lange, 1996). Cicchetti found that manuscript reviewers for scientific journals agreed on accepted manuscripts in specific and focused disciplines, or subfields (e.g., nuclear physics and behavioral neuroscience), and disagreed on submissions in general and diffuse disciplines, or subfields (e.g., general fields of medicine, cultural anthropology, and social psychology). Van Lange (1996) found greater agreement about acceptance than about rejection of manuscripts in social psychology. Furthermore, according to Van Lange, social psychologists acting as reviewers believed their own manuscripts to be superior in quality to other manuscripts. However, Van Lange's study was restricted in sample size (48 participants) and used a simple method (ratings of the quality of imagined manuscripts).
Finally, the more consensual peer evaluations in the focused disciplines of the hard sciences may reflect their degree of formalization. This is to some extent supported by recent research on scientific creativity by Simonton (in press), who argues that scientific disciplines can be ordered hierarchically along a dimension running from the logical, objective, formal, and conventional disciplines (e.g., nuclear physics) to the intuitive, subjective, emotional, and individualistic disciplines, such as the arts and literature.
Grant Reviews
Regarding grant applications, empirical research shows less difference in acceptance and rejection rates between the hard and soft sciences (Cole et al., 1981; Marsh et al., 2008). In addition, studies of grant applications have shown that reviewers generally agree more on rejected applications than on accepted ones. Cicchetti (1991) concluded that the criteria for rejection are more reliable because of this greater agreement. This unanimity on rejections could also mean that, during the initial phase of the decision-making process, the characteristics of a rejected application are more distinctive than those of other applications. However, Cole and colleagues (1981) argue that reviewers of grant applications may disagree even when using the same evaluation criteria. The reviewers in their study of National Science Foundation (NSF) grant applications simply disagreed on the nature of good science as presented in grant applications.
The differences between manuscript and grant application reviews must be taken seriously when studying the decision process in peer review. In manuscript reviews, there are clear differences between the natural and social sciences, as well as between fields within these two main areas. According to Cicchetti's (1991) empirical studies of peer review decisions, there is more consensus on manuscripts in certain hard sciences (specialized physics) than in others (general physics, medicine, and the behavioral sciences). In addition, manuscript decisions are often made by one individual (the editor), whereas grant application decisions are typically made by groups of scientists. However, this difference is not made clear in many of the studies referred to above. Moreover, acceptance decisions appear to be more difficult than rejection decisions, with the possible exception of acceptance decisions on social psychology manuscripts.
As noted previously, there is generally more consensus on grant application rejections than on acceptances. More recently, Niemenmaa, Montgomery, and Hemlin (1995) found support for the claim that acceptance and rejection decisions on grant applications in psychology differ. They found that academic degree, previous grants, and certain fields (cognition, perception, psychophysics) differentiated between granted and nongranted applications in psychology in Sweden during a 6-year period. Gender influenced decisions only indirectly, because more women than men were represented in the fields of psychology (social-clinical) that were less often granted. In a recent study, Marsh et al. (2008) generally found no differences between the hard and soft sciences in peer reviews of grant applications. This is perhaps a bit surprising given the differences in peer review decisions on manuscripts in natural and social science journals.
In conclusion, it is clear that the current understanding of decision making in peer review is limited. However, we can conclude that three factors are important in evaluating peer review results: (a) grant application peer review or manuscript peer review, (b) disciplinary (hard-soft) or subfield (general-special) direction, and (c) acceptance or rejection.
A number of investigations of peer review bias have been conducted by funding organizations in various countries (e.g., the National Institutes of Health [NIH] in the United States, the Canadian Institutes of Health Research [CIHR] in Canada, and the Medical Research Council [MRC] in Great Britain). According to an investigation by the General Accounting Office in the United States, peer reviews were deficient in the following respects: bias against younger applicants, women, and minorities; unarticulated judgment criteria, such as expectation of results; the Matthew effect; and halo effects (see LaFollette, 1994; Mayo et al., 2006).
Bias stemming from researchers' rank and departmental or institutional status was reported by Cole, Rubin, and Cole (1977). These findings mean that younger scientists and scientists from less prestigious institutions are disadvantaged. Two related problems that are particularly influential in peer review of grant applications are the "old boys' network" and the Matthew effect, both of which favor established and higher-ranked scientists (Merton, 1973).
Moreover, Wennerås and Wold (1997) found instances of sexism and nepotism in a study of a Swedish funding organization in the medical sciences. In a replication of this study 10 years later, the sexist bias was reversed (men were disfavored), but nepotism still prevailed (Sandström & Hällsten, 2008). In contrast, a recent Australian study reported no gender bias across research areas in the sciences and social sciences (Marsh et al., 2008).
In addition to these examples of bias, peer review has been shown to be deficient in that it rejects research innovations. Ruderfer (1980) described one example in his case history of an erroneous rejection of a valuable manuscript in physics (see also Kassirer & Campion, 1994). Horrobin (1990), who compiled a list of 18 manuscript and grant application rejections of innovative research, mainly in medicine, stressed that the aim of peer review is not only quality control but also the tracking of innovations in science. Campanario (1996) analyzed the 400 most-cited articles in Current Contents to identify problems in publishing innovative research. He found that 10.7% of the 400 articles had originally been rejected or severely criticized, although they were eventually published.
These studies highlight the difficulty of balancing quality control against the encouragement of new ideas in science. A problem arises when reviewers focus more on the former task than on the latter. One possible explanation for this imbalance is that it is more difficult to support a new idea than to evaluate whether research meets accepted scientific standards. As a result, conventional research is supported more often than innovative research (Sigelman & Whicker, 1987). Cognitive judgments may favor the known over the unknown because they may result from configurative thinking rather than piecemeal analysis (Fiske & Pavelchak, 1986).
It has also been shown that cognitive particularism is prevalent in peer reviews of grant applications, meaning that one subfield is unfairly favored over another (e.g., cognitive psychology over personality psychology). One study revealing this bias was based on protocols from the psychology group of the Swedish Research Council for the Humanities and Social Sciences (HSFR) for the years 1988–1993 (Hemlin, Niemenmaa, & Montgomery, 1995; Niemenmaa et al., 1995). As previously reported, the reviewers favored cognitive psychology, perception, and psychophysics over clinical and social psychology, and applications tended to be granted in fields that matched the reviewers' own fields. In another study, based on observations and tape recordings of 10 meetings of the Science and Engineering Research Council (SERC) in Great Britain (Travis & Collins, 1991), approved grant applications tended to come from the same cognitive field as the peers who evaluated them. As these two studies suggest, the cognitive similarity of applicants and reviewers is important in explaining success in grant applications. However, empirical data are few, and the borders of a scientific field are not clearly defined in these studies.
In summary, as these various studies indicate, the peer review system appears somewhat unreliable. Whether reviewers agree or disagree, a number of them appear to be subject to biases, with the result that some manuscripts are accepted and some grant applications are funded not on their own merits but according to the personal prejudices of the reviewers. Of course, like people in general, reviewers may tend to be guided by stereotypes and prejudices (Fiske & Taylor, 1991). In the next section, I examine the implications of peer review (dis)agreement and suggest how to improve peer review in scientific research.
Two related questions are addressed in this concluding section. First, what does it mean for scientific research if disagreement among reviewers of a scientific endeavor is as likely as agreement, if not more so? Second, taking evidence of peer review bias into consideration, how can peer review be improved? In the following discussion, I assume that peer review decisions are rationally and positively grounded in sound reasoning. Nonrational decision making in peer review should be combated by stricter peer review procedures, but it is set aside here as invalid and negative.
Disagreement, which is frequent among reviewers in the scientific community, may be a sign of weakness in science. One could argue that this weakness is more pronounced in the social sciences than in the natural sciences, because disagreements are more frequent in the former (Simonton, 2006). This could be explained by the social sciences' weak paradigms, which are characterized, among other things, by high journal rejection rates and particularism in grant application reviews (Glick, Miller, & Cardinal, 2007), and by the related idea of a hierarchy of sciences in which the physical and other hard sciences rank higher than all social sciences (Simonton, 2006, in press). If we cannot agree on what good or bad science is, it appears that the criteria for evaluating science are not absolute but relative. Or, as is perhaps the case, scientists have different criteria by which they evaluate science. With either explanation, the result is disagreement in peer review.
Conversely, if reviewers were to agree in almost all cases, the implication might be that the standards of good science were clear and agreed on and that reviewers were merely rule-followers. One possible benefit of such scientific standards would be that all authors submitting manuscripts to journals and grant applications to funding organizations would know the expected criteria. However, studies of editorial peer review in which standardized procedures and explicit criteria were used have found only a limited increase in agreement (Hemlin, 1996; Weller, 2001). Also, in another study of grant applications, reviewers disagreed even when they applied the same evaluation criteria (Cole et al., 1981). Perhaps reviewers would disagree even with the most explicit criteria and the most rigorous procedures. As a consequence of this disagreement among reviewers, scientists may spend many wasted hours writing grant applications instead of doing research.
Ultimately, peer review is a human judgment process that is far from perfect. We know from psychological decision-making research that people are sometimes nonrational and fail to follow rules (e.g., Tversky & Kahneman, 1974). Human judgment is complicated by the use of a number of heuristics and by the influence of various internal and external factors. Yet such nonmechanistic rationality, characteristic of human judgment, may affect scientific research positively in the long run, because it works against completely streamlined and predictable scientific results (Hemlin, 1996). Essentially, scientific research is about the unknown and should push the boundaries of what is unexplored.
The second question posed above concerns the problem of peer review bias. A number of remedies to improve peer review have been suggested in the literature. First, reviewers should clearly be changed frequently and should not be appointed through nepotism or the old boys' network (Fuller, 1995; Weller, 2001). Second, quality criteria in scientific research should be discussed and occasionally changed to discourage lockstep rule-following and unoriginal, mainstream research, particularly because we know that many scientists are experts at meeting organizations' funding criteria. It is critical to keep alive the discussion of what constitutes good science. Third, some authors propose that additional quality indicators, such as citations, impact factors, and the h-index (i.e., the largest number h such that h of a scientist's publications have each been cited at least h times), should be used to support the peer review process (Shadish, 1989). Such a possibility exists, but as scientometricians often point out, indicators are suitable only at aggregate levels of scientific research and not for individual scientists (Cole & Cole, 1973).
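The h-index is usually defined as the largest number h such that h of a researcher's publications have each been cited at least h times; it is a count, not a ratio of publications to citations. The minimal sketch below computes it from a list of per-paper citation counts. The two citation profiles are hypothetical, chosen only to show how a single indicator can compress very different records into the same number, which is one reason for caution in applying such indicators to individual scientists.

```python
from typing import Iterable


def h_index(citations: Iterable[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites < rank:
            break  # counts only decrease from here, so no larger h is possible
        h = rank
    return h


# Hypothetical citation records (not real researchers).
few_highly_cited = [100, 90, 3, 2, 1]
evenly_modest = [3, 3, 3]

print(h_index(few_highly_cited), h_index(evenly_modest))  # 3 3
```

Both hypothetical records yield h = 3, although one contains two very highly cited papers and the other none.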
The psychology of peer review in science is a subset of the larger field of the psychology of science (Feist, 2006; Gholson et al., 1989). Psychological factors, namely cognitive, motivational, and social processes, influence all stages of science, including its evaluation. Disagreement in peer review will no doubt continue. This situation has its drawbacks, but it can in fact be beneficial to scientific research. However, disagreement is beneficial only if it is based on rational grounds. Complete agreement in peer review will never be achieved, because scientists are not perfect decision makers or judges. Bias in peer review will never be zero, but stricter peer review procedures initiated by editors and grant committees will be needed to reduce it. Moreover, it is not even desirable to eliminate all disagreement, because doing so would soon lead to streamlined research that would stifle scientific development. Disagreements in the peer review system that are based on nonrational grounds, by contrast, must be fought fiercely.
In the end, there is still a need for systematic and thorough research on peer review. Our knowledge in this field is still limited (e.g., less research has been done on grant proposal review), fragmented (some disciplines have been investigated extensively while others have not), methodologically unsophisticated (e.g., poor statistics have been applied), and poorly calibrated, meaning that coordinated research in this field is rare. Finally, much more psychological research could be done to promote peer review studies and peer review as an effective form of quality control in science.
Burnham, J. C. (1990). The evolution of editorial peer review. Journal of the American Medical Association, 263, […]
Burnham, J. C. (1992). How journal editors came to develop and critique peer review procedures. In H. F. Maylan & R. E. Sojka (Eds.), Research ethics, manuscript review, and journal quality (pp. 55–61). ACS Miscellaneous Publication. Madison, WI: Soil Science Society of America, Crop Science Society of America, and American Society of Agronomy.
Campanario, J. M. (1996). Have referees rejected some of the most-cited articles of all times? Journal of the American Society for Information Science, 47(4), 302–310.
Chubin, D. E., & Hackett, E. J. (1990). Peerless science: Peer review and U.S. science policy. Albany: State University of New York Press.
Cicchetti, D. V. (1991). The reliability of peer review for manuscript and grant submissions: A cross-disciplinary investigation. Behavioral and Brain Sciences, 14, 119–[…]
Cole, J. R., & Cole, S. (1973). Citation analysis. Science, 183, 32–33.
Cole, S., Cole, J. R., & Simon, G. A. (1981). Chance and consensus in peer review. Science, 214, 881–885.
Cole, S., Rubin, L., & Cole, J. R. (1977). Peer review and the support of science. Scientific American, 237, 34–41.
Daniel, H.-D. (1993). Guardians of science: Fairness and reliability of peer review. Weinheim and New York: VCH.
Feist, G. (2006). The psychology of science and the origin of the scientific mind. New Haven, CT: Yale University Press.
Fiske, S. T., & Pavelchak, M. A. (1986). Category-based versus piecemeal-based affective responses: Developments in schema-triggered affect. In R. M. Sorrentino & E. T. Higgins (Eds.), Handbook of motivation and cognition: Foundations of social behavior (pp. 167–203). New York: Guilford Press.
Fiske, S. T., & Taylor, S. E. (1991). Social cognition. New York: McGraw-Hill.
Fuller, S. (1995). Cyberplatonism: An inadequate constitution for the republic of science. The Information Society, 11, 293–303.
Glick, W. H., Miller, C. C., & Cardinal, L. B. (2007). Making a life in the field of organization science. Journal of Organizational Behavior, 28, 817–835.
Hargens, L. L. (1988). Scholarly consensus and journal rejection rates. American Sociological Review, 53, 139–151.
Hargens, L. L. (1990). Variation in journal peer review systems: Possible causes and consequences. Journal of the American Medical Association, 263(10), 1348–1352.
Harnad, S. (1982). Peer commentary on peer review. The Behavioral and Brain Sciences, 5, […]
Harnad, S. (1985). Rational disagreement in peer review. Science, Technology & Human Values, 19(3), 55–62.
Hemlin, S. (1991). Quality in science: Researchers' conceptions and judgments. Unpublished doctoral dissertation, Göteborg University.
Hemlin, S. (1999). (Dis)agreement in peer review. In P. Juslin & H. Montgomery (Eds.), Judgment and decision making: Neo-Brunswikian and process-tracing approaches (pp. 275–301). Mahwah, NJ, and London: Lawrence Erlbaum Associates.
Hemlin, S., & Montgomery, H. (1990). Scientists' conceptions of scientific quality: An interview study. Science Studies, 1, 73–81.
Hemlin, S., Niemenmaa, P., & Montgomery, H. (1995). Quality criteria in evaluations: Peer reviews of grant applications in psychology. Science Studies, 8, 44–52.
Horrobin, D. F. (1990). The philosophical basis of peer review and the suppression of innovation. Journal of the American Medical Association, 263(10), 1438–1441.
Judson, H. F. (1994, July). Structural transformations of the sciences and the end of peer review. Paper presented at the International Congress on Biomedical Peer Review and Global Communications, Chicago, IL.
Kassirer, J. P., & Campion, E. W. (1994, July). Peer review: Crude and understudied, but indispensable. Paper presented at the International Congress on Biomedical Peer Review and Global Communications, Chicago, IL.
LaFollette, M. (1994). Measuring equity: The U.S. General Accounting Office study of peer review. Science Communication, 16, 211–220.
Marsh, H. W., Jayasinghe, U. W., & Bond, N. W. (2008). Improving the peer review process for grant applications: Reliability, validity, bias, and generalizability. American Psychologist, 63(3), 160–168.
Mayo, N. E., Brophy, J., Goldberg, M. S., Klein, M. B.,
Miller, S., Platt, R. W., et al. (2006). Peering at peer
review revealed high degree of chance associated with
funding of grant applications. Journal of Clinical Epide-
miology, 59 (8), 842–848.
Merton, R. K. (1973). The normative structure of science.
In N. W. Storer (Ed.), The sociology of science: Theoreti-
cal and empirical investigations (pp. 267–278). Chicago:
University of Chicago Press.
Millman, J. (1982). Making the plausible implausible: A
favorable review of Peters and Ceci’s target article. The
Behavioral and Brain Sciences, 5, 225–226.
Montgomery, H., & Hemlin, S. (1990). Cognitive studies
of scientifi c quality judgments: Studies of higher edu-
cation and research. Newsletter from the Council for Stu-
dies of Higher Education, 3, 15–21.
Niemenmaa, P., Montgomery, H., & Hemlin, S. (1995). Peer
review protocols of research applications in psychology II:
The applicants background and the decisions (Rep. No.
794). Stockholm: Stockholm University, Department
of Psychology.
Peters, D. P., & Ceci, S. J. (1982). Peer-review practices of psy-
chological journals: The fate of published articles, submit-
ted again. The Behavioral and Brain Sciences, 5, 187–195.
Ruderfer, M. (1980). The fallacy of peer review—judgment
without science and a case history. Speculations in Science
and Technology, 3 (5), 533–562.
Sandström, U., & Hällsten, M. (2008). Persistent nepotism
in peer-review. Scientometrics, 74 (2), 175–189.
Shadish, Jr., W. R. (1989). The perception and evaluation
of quality in science. In B. Gholson, W. R. Shadish Jr.,
R. A. Neimeyer, & A. C. Houts (Eds.), Psychology of
science: Contributions to metascience (pp. 383–426).
Cambridge: Cambridge University Press.
Sigelman, L., & Whicker, M. (1987). Some implications of
bias in peer review: A simulation-based analysis. Social
Science Quarterly, 68, 494–509.
Simonton, D. K. (2006). Scientifi c status of disciplines,
individuals, and ideas: Empirical analyses of the po-
tential impact of theory. Review of General Psychology,
10 (2), 98–112.
Simonton, D. K. (in press). Varieties of (scientifi c) creati-
vity: A hierarchical model of disposition, development,
and achievement. Perspectives on Psychological Science.
Slovic, P. (2001). Psychological study of human judgment:
Implications for investment decision making. Journal
of Behavioral Finance, 2 (3), 160–172.
Travis, G. D. L., & Collins, H. M. (1991). New light on
old boys: Cognitive and institutional particularism in
the peer review system. Science, Technology & Human
Values, 16, 322–341.
Tversky, A., & Kahnemann, D. (1974). Judgment under
uncertainty: Heuristics and biases. Science, 185, 1124–
Van Lange, P. A. M. (1996). Why reviewers are ( believed to
be) overly critical: A study of peer review. Unpublished
manuscript, Free University, Amsterdam.
Weller, A. C. (2001). Editorial peer review: Its strength and
weaknesses. Medford, NJ: Information Today.
Wennerås, C., & Wold, A. (1997, May 22). Nepotism and
sexism in peer review. Nature, 387, 341–343.
Correspondence regarding this article should be directed to Sven Hemlin, PhD, Gothenburg Research Institute, School of Business, Economics and Law, University of Gothenburg, P.O. Box 600, SE 405 30 Göteborg, Sweden. E-mail:
... Kuhn 1962). Judging a scientific proposal is further complicated by the fact that it concerns research that will be developed and will produce results in the future, and reviewers must rely on the limited information available in a research proposal to predict its future success (Hemlin 2009). ...
We have limited understanding of why reviewers tend to disagree strongly when scoring the same research proposal. Thus far, research exploring disagreement has focused on the characteristics of the proposal or the applicants while ignoring the characteristics of the reviewers themselves. This article addresses that gap by exploring which reviewer characteristics most affect disagreement among reviewers. We present hypotheses regarding the effect of a reviewer's level of experience in evaluating research proposals for a specific granting scheme, i.e., scheme reviewing experience. We test our hypotheses by studying two of the most important research funding programmes in the European Union from 2014–2018, namely 52,488 proposals evaluated under three funding schemes of the Horizon 2020 Marie Sklodowska-Curie Actions (MSCA), and 1,939 proposals evaluated under the COST Actions. We find that reviewing experience on previous calls of a specific scheme significantly reduces disagreement, while experience of evaluating proposals in other schemes, namely general reviewing experience, has no effect. Moreover, in MSCA Individual Fellowships, we observe an inverted-U relationship between the number of proposals a reviewer evaluates in a given call and disagreement, with a marked decrease in disagreement above 13 evaluated proposals. Our results indicate that reviewing experience in a specific scheme improves reliability, curbing unwarranted disagreement by fine-tuning reviewers' evaluations.
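As a rough illustration of what "disagreement" can mean operationally, the sketch below measures it as the sample standard deviation of the scores that different reviewers give the same proposal, then averages it over a set of proposals (for instance, those handled by one group of reviewers). This is an assumption for illustration only: the study summarized above may define disagreement differently, and the function names and scores are hypothetical.

```python
from statistics import mean, stdev


def disagreement(scores: list[float]) -> float:
    """Dispersion of one proposal's reviewer scores (sample standard deviation).

    Assumption: disagreement is operationalised as score dispersion.
    """
    if len(scores) < 2:
        raise ValueError("disagreement needs at least two reviewer scores")
    return stdev(scores)


def mean_disagreement(proposals: dict[str, list[float]]) -> float:
    """Average disagreement across proposals, e.g. for one group of reviewers."""
    return mean(disagreement(s) for s in proposals.values())


# Hypothetical scores on a 0-5 scale; not data from the study.
close_agreement = {"P1": [4.0, 4.5, 4.0], "P2": [3.0, 3.5, 3.0]}
wide_split = {"P3": [1.0, 4.5, 3.0], "P4": [2.0, 5.0, 4.0]}

print(round(mean_disagreement(close_agreement), 3))
print(round(mean_disagreement(wide_split), 3))
```

Comparing such averages between reviewers with and without prior experience in a scheme is one simple way to frame the kind of contrast the abstract reports.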
... While standards of quality are controversial in all disciplines, recent research shows that perceptions and conceptualisations of excellence are even more complex and fuzzy in the SSH (see, e.g., Furlong & Oancea, 2005; Hemlin, 1996; Williams & Galleron, 2016). Also, while peer review is widely acclaimed and accepted within this area, in many journals and publishing houses, as well as at other levels and institutions where evaluation is practised by peers, procedures are far from transparent and robust, and often have not been closely monitored or assessed against principles such as thoroughness and fairness (Hemlin, 2009). Moreover, the relationship between science and society is changing, and evaluation mechanisms are bound to reflect that to a certain extent. ...
... Table 1 contains a short summary of potential distortions that may occur during reviewer selection, the reviewing process, and the final decision regarding a project or an article. The summary is based on the existing literature (Bornmann and Daniel, 2009; Eisenhart, 2002; Hemlin, 2009; Hojat and Rosenzweig, 2004; Jacoby et al., 1989; Langfeldt, 2004; Marsh et al., 2008; Rivara et al., 2007). ...
In this study, we propose the architecture of a content-based recommender system for selecting reviewers (experts) to evaluate research proposals or articles. We introduce a comprehensive algorithmic framework supported by various information retrieval techniques. We propose a well-rounded methodology that explores the concepts of data, information, and knowledge, and the relations between them, to support the formation of a suitable recommendation. In particular, the developed system helps collect data characterizing potential reviewers, retrieve information from relational and unstructured data, and formulate a set of recommendations. The system architecture is modular from the functional perspective and hierarchical from the technical point of view. Each essential part of the system is treated as a separate module, whereas each layer supports a certain functionality of the system. The modularity of the architecture facilitates its maintainability. The information retrieval process includes classification of publications, author disambiguation, keyword extraction, and full-text indexing, whereas recommendations are based on a combination of cosine similarity between keywords and a full-text index. The proposed system has been verified through a case study run at the National Center for Research and Development, Warsaw, Poland.
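The keyword-matching step described above can be sketched as follows, assuming each reviewer profile and each proposal is reduced to a bag of keywords and reviewers are ranked by cosine similarity. This covers only the keyword component: the full-text index that the described system combines with it, as well as the classification and author-disambiguation steps, are omitted, and all names and data here are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two keyword-count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_reviewers(proposal_keywords: list[str],
                   reviewer_keywords: dict[str, list[str]],
                   top_k: int = 3) -> list[tuple[str, float]]:
    """Return the top_k reviewers whose keyword profiles best match the proposal."""
    proposal = Counter(proposal_keywords)
    scored = [(name, cosine_similarity(proposal, Counter(kws)))
              for name, kws in reviewer_keywords.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical reviewer profiles built from keywords of their publications.
reviewers = {
    "reviewer_a": ["peer review", "bias"],
    "reviewer_b": ["genomics", "sequencing"],
    "reviewer_c": ["peer review", "reliability", "bias"],
}
print(rank_reviewers(["peer review", "bias"], reviewers))
```

Plain term counts are used for simplicity; a weighting scheme such as TF-IDF would be a natural refinement, but whether the described system uses one is not stated.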
... Shouldn't the many existing procedures without interviews then be abandoned? Or is it because other aspects of talent (such as communicative skills) and several cognitive, motivational and social processes (Lamont, 2009) play a role during the interview, as well as various psychological factors (Hemlin, 2009)? ...
Career grants are an important instrument for selecting and stimulating the next generation of leading researchers. Earlier research has mainly focused on the relation between past performance and success. In this study we investigate the evidence of talent and how the selection process takes place. More specifically, we investigate which quality dimensions (of the proposal, of the researcher and societal relevance) dominate, and how changes in weighing these criteria affect the talent selection. We also study which phases in the process (peer review, panel review, interview) are dominant in the evaluation process. Finally we look at the effect of the gender composition of the panel on the selection outcomes, an issue that has attracted quite some attention. Using a dataset of the scores of 897 career grant applications we found no clear 'boundaries of excellence', and only a few granted talents are identified as top talents based on outstanding reviews compared to the other applicants. Quite often, the scores applicants receive change after the interview, indicating the important role of that phase. The evaluation of talent can be considered to be contextual, as the rankings of applicants changed considerably during the procedure and reviewers used the evaluation scale in a relative way. Furthermore, talent was found to have different (low correlated) dimensions. Small changes of the weights of these dimensions do not influence the final outcomes much, but strong changes do. We also found that the external peer reviews hardly influence the decision-making. Finally, we found no gender bias in the decisions.
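To make the point about weights concrete, the sketch below combines per-dimension scores (proposal, researcher, societal relevance) into a weighted average and ranks applicants by it. The weighted-average rule, the three applicants, and their scores are invented for illustration and are not the study's procedure or data; they merely show how a small shift in weights can leave a ranking unchanged while a strong shift reorders it.

```python
def composite(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of an applicant's per-dimension scores."""
    total = sum(weights.values())
    return sum(weights[d] * scores[d] for d in weights) / total


def ranking(applicants: dict[str, dict[str, float]], weights: dict[str, float]) -> list[str]:
    """Applicant names sorted from highest to lowest composite score."""
    return sorted(applicants, key=lambda name: composite(applicants[name], weights), reverse=True)


# Hypothetical applicants scored 1-5 on each dimension.
applicants = {
    "X": {"proposal": 5, "researcher": 3, "societal": 2},
    "Y": {"proposal": 3, "researcher": 5, "societal": 4},
    "Z": {"proposal": 4, "researcher": 4, "societal": 3},
}

equal = {"proposal": 1 / 3, "researcher": 1 / 3, "societal": 1 / 3}
small_shift = {"proposal": 0.4, "researcher": 0.3, "societal": 0.3}
strong_shift = {"proposal": 0.6, "researcher": 0.2, "societal": 0.2}

for label, w in [("equal", equal), ("small shift", small_shift), ("strong shift", strong_shift)]:
    print(label, ranking(applicants, w))
```

With these invented numbers, the equal and slightly shifted weights both rank Y first, while the strong shift toward the proposal dimension moves X to the top.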
... Shouldn't the many existing procedures without interviews be abandoned? Or is it because other aspects of talent (such as communicative skills) and several cognitive, motivational and social processes (Lamont, 2009) play a role during the interview, as well as various psychological factors (Hemlin, 2009)? Thirdly, the role of the external peer review in the quality assessment seems modest (Langfeldt et al., 2010). ...
Career grants are an important instrument for selecting and stimulating the next generation of leading researchers. Earlier research has mainly focused on the relation between past performance and success. In this study we investigate how the selection process takes place. More specifically, we investigate which quality dimensions (of the proposal, of the researcher and societal relevance) dominate. We also study which phases in the process (peer review, committee review, interview) are dominant in the evaluation process. Finally, we investigate whether differences between disciplines are visible. The analysis of our data set, consisting of the reviews of 898 grant applications, shows that talent has different dimensions and therefore is not obvious. The evaluation of talent was found to be contextual, although there were only small differences between disciplines. Unlike the interviews with the applicants, the external peer reviews hardly influence the decision-making on grant allocation. The notion of talent was found to be the least evident in the social sciences and humanities.
... 'Peer consensus' is often believed to be an indicator of 'inter-rater reliability', and is typically regarded as the most valuable collective product of panel deliberation (see Brenneis, 1994; Cicchetti, 1991; Hemlin, 2009; Marsh et al., 2008). It indeed results in a clear signal for funding decisions. ...
This paper analyses peer review deliberations in four evaluation panels that differ in terms of scope and disciplinary heterogeneity. Based on evaluation reports and discussions with panel members, it illustrates a variety of ways in which reviewers bridge their epistemological differences and achieve consensus on the quality of research proposals. The analysis demonstrates that peer review panels are forums in which communication across disciplinary boundaries occurs and interdisciplinary judgments arise. At the same time, disciplinary gate-keeping and incommensurabilities may set limits on such communication. The comparison of deliberative processes sheds light on how collective judgments are shaped and constrained by the disciplinary set-up of the panels in which the reviewers operate and in which the intersubjective dynamics of the deliberations unfold. Based on these findings, the paper considers conditions that may enhance disciplinary interaction and complementary judgments in the peer review of proposals, and thereby the prospects for interdisciplinary research.
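The citing passage above treats peer consensus as an indicator of inter-rater reliability. One common way to quantify agreement on categorical verdicts, chosen here for illustration (the cited works use a range of reliability indices), is Cohen's kappa, which discounts the agreement two reviewers would reach by chance alone. The verdict lists below are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater1: list[str], rater2: list[str]) -> float:
    """Chance-corrected agreement between two raters judging the same items."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("raters must judge the same, non-empty set of items")
    n = len(rater1)
    # Observed agreement: share of items on which the two verdicts match.
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Expected agreement if each rater assigned categories independently,
    # at that rater's own marginal rates.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_expected = sum(c1[cat] * c2[cat] for cat in c1.keys() | c2.keys()) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_observed - p_expected) / (1 - p_expected)


reviewer_1 = ["accept", "accept", "reject", "reject"]
reviewer_2 = ["accept", "reject", "reject", "reject"]
print(cohens_kappa(reviewer_1, reviewer_2))
```

Here the two reviewers agree on three of four manuscripts (75%), but half of that agreement would be expected by chance given how often each says "accept" or "reject", so kappa is 0.5.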
Since 2006, the psychology of science has become an established discipline. The year 2006 saw the first international conference, from which the International Society for the Psychology of Science and Technology was launched. The following year, the first peer-reviewed journal in the field, the Journal of Psychology of Science and Technology, was started. The society and journal are still relatively small and young. The question arises, where next? To survive and thrive, the field needs graduate training programs, federally funded grants, research centers, undergraduate and graduate courses, and degrees. The society has been working on proposals for training grants and other graduate-student-oriented initiatives, such as awards and scholarships for the best research projects on the psychology of science. This chapter reviews the history of the field and describes some of the initiatives being undertaken to ensure its healthy maturation in the future.
(In Polish) Scientific and development projects in Poland are financed mainly from the state budget, EU funds, and other dedicated funds. The bodies that award this money face two fundamental problems. The first is determining whether funding a given project is justified on its merits; the second is selecting the best undertakings from among many, often equally good, ones when financial resources are limited. The basic tool for the substantive assessment of projects is the review prepared by members of the academic community and by experts from the commercial sector. A review often decisively determines whether a project is funded. Selecting the best projects is in the interest of the state, because well-spent funds can significantly improve the competitiveness of the economy and the country's economic development, and even the public's sense of the rule of law and social justice. This is why the independence of reviewers, their proper match to the subject of the project, and the absence of conflicts of interest are so important. This brings us to the question of how to select reviewers and experts appropriately and how to review research and development projects and scientific articles. The importance of this problem prompted our team to develop a system to support reviewer selection. The Laboratory of Intelligent Information Systems of the National Information Processing Institute (Ośrodek Przetwarzania Informacji – Instytut Badawczy, OPI) prepared two analytical reports, entitled "Comparative Analysis of the Methods Used for Reviewer Selection" and "Comparative Analysis of IT Tools Supporting Reviewer Selection", and also produced a design of the system. The results of this analytical and design work proved interesting enough that we decided to present them in book form. The aim of the publication is to acquaint readers with the methodology of the review process and the methods of reviewer selection used worldwide and in Poland.
A critical analysis of these should provide a basis for building a system to support reviewer selection. Chapter one introduces readers to the subject of reviewing. Definitions of basic concepts such as "scientific publication", "reviewer", and "review process" make it easier to navigate the topic. Attention is also drawn to the heuristics and cognitive distortions that may arise when reviewing projects or scientific articles. The individual types of heuristics are characterized: availability, representativeness, and anchoring. The kinds of errors made during reviewing that may be related to the use of these heuristics are then discussed in detail. Chapter two reviews the procedures used to evaluate grant applications in selected countries. European examples (the Framework Programmes, the ERC, SFB, the Swiss National Science Foundation, and institutional evaluation in Italy), American examples (the Institute of Education Sciences, the National Science Foundation, and the National Institutes of Health), and an Australian example (the ARC) are analysed. The review ends with a brief comparison. Chapter three, like the previous one, is devoted to review procedures, but those used by scientific journals. Selected periodicals regarded as world-leading, such as Science and Nature, were examined. The idea of so-called open peer review and its implementation in selected journals is also presented. This chapter, too, closes with a comparative analysis. OPI has extensive experience in administering the review process for grant programmes in Poland. Building on the solutions used at the Institute, an analysis of reviewer-selection methods and review procedures was prepared; it is presented in chapter four. The following funding programmes were examined: Obsługa Strumieni Finansowania Nauki (Handling of Science Funding Streams), the Innovative Economy Operational Programme, the Polish-Norwegian Research Fund, and the Polish-Swiss Research Programme.
The summary presents conclusions on the various methods of reviewer selection, both Polish and foreign. It should be emphasized that these methods have been tested in practice. Chapter five contains statistics on reviewers and reviews for the funding programmes listed in the previous chapter. It then analyses the results of a survey in which reviewers and applicants could comment on OPI's IT systems and assess the review process. Both a quantitative analysis of the questionnaire and a qualitative analysis of the respondents' comments were carried out. Volume one closes with final conclusions, which also include a comparison of reviewer selection by a human and by an IT system, as well as recommendations for building the IT system. A detailed design of the system and its implementation will be presented in volume two of this study.
Throughout the world, women leave their academic careers to a far greater extent than their male colleagues. In Sweden, for example, women are awarded 44 per cent of biomedical PhDs but hold a mere 25 per cent of the postdoctoral positions and only 7 per cent of professorial positions. It used to be thought that once there were enough entry-level female scientists, the male domination of the upper echelons of academic research would automatically diminish. But this has not happened in the biomedical field, where disproportionate numbers of men still hold higher academic positions, despite the significant numbers of women who have entered this research field since the 1970s.
Originally published in 1989, this book offers a comprehensive overview of the work of scholars in several different disciplines contributing to the development of the psychology of science: the systematic elaboration and application of psychological concepts and methods to clarify the nature of the scientific enterprise. The psychology of science of course overlaps in important ways with the philosophy, history, and sociology of science, but its predominant and distinctive focus is on individuals and small groups employing concepts elucidated via experimental methods.
Editorial peer-review procedures did not develop to detect fraud or even, originally, to establish the standards and authority of science. Peer reviewing evolved from editors' need to choose among a surplus of submitted manuscripts and from the growing inability of any one editor to possess enough expertise to judge quality in all the specialized fields that a journal might cover. Referring papers out began in some forms as early as the eighteenth century, but the practice was quite unusual until the twentieth century. Each journal came to the practice in its own way; the scattered evidence available suggests that journals in the agronomy and agriculture fields are excellent examples of the variety of practices that developed among scientific journals, so that refereeing of some kind was commonplace by the mid-twentieth century. In the 1970s and 1980s, following sociological investigations of editorial practices, journal editors began to question and critique peer reviewing. By the mid-1980s, public and legislative concern over grant peer reviewing had intensified concern about the wholly independent and varied refereeing practices that were grouped together as editorial peer review. Copyright © 1992 Soil Science Society of America, Crop Science Society of America, and American Society of Agronomy, 5585 Guilford Rd., Madison, WI 53711 USA
In this book, Gregory Feist reviews and consolidates the scattered literatures on the psychology of science, then calls for the establishment of the field as a unique discipline. He offers the most comprehensive perspective yet on how science came to be possible in our species and on the important role of psychological forces in an individual's development of scientific interest, talent, and creativity. Without a psychological perspective, Feist argues, we cannot fully understand the development of scientific thinking or scientific genius. The author explores the major subdisciplines within psychology as well as allied areas, including biological neuroscience and developmental, cognitive, personality, and social psychology, to show how each sheds light on how scientific thinking, interest, and talent arise. He assesses which elements of scientific thinking have their origin in evolved mental mechanisms and considers how humans may have developed the highly sophisticated scientific fields we know today. In his fascinating and authoritative book, Feist deals thoughtfully with the mysteries of the human mind and convincingly argues that the creation of the psychology of science as a distinct discipline is essential to deeper understanding of human thought processes.
Many decisions are based on beliefs concerning the likelihood of uncertain events such as the outcome of an election, the guilt of a defendant, or the future value of the dollar. Occasionally, beliefs concerning uncertain events are expressed in numerical form as odds or subjective probabilities. People rely on a limited number of heuristics to assess such probabilities. In general, the heuristics are quite useful, but sometimes they lead to severe and systematic errors. The subjective assessment of probability resembles the subjective assessment of physical quantities such as distance or size. These judgments are all based on data of limited validity, which are processed according to heuristic rules. For example, the apparent distance of an object is determined in part by its clarity, and reliance on this rule leads to systematic errors in the estimation of distance. This chapter describes three heuristics that are employed in making judgments under uncertainty. The first is representativeness, which is usually employed when people are asked to judge the probability that an object or event belongs to a class or process. The second is the availability of instances or scenarios, which is often employed when people are asked to assess the frequency of a class or the plausibility of a particular development, and the third is adjustment from an anchor, which is usually employed in numerical prediction when a relevant value is available.