Judge Response Theory? A Call to Upgrade Our Psychometrical Account
of Creativity Judgments
Nils Myszkowski
Pace University
Martin Storme
Université Paris Descartes
The Consensual Assessment Technique (CAT)—more generally, using product creativity judgments—is a central and actively debated method to assess product and individual creativity. Despite constant interest in strategies to improve its robustness, we argue that most psychometric investigations and scoring strategies for CAT data remain constrained by a flawed psychometrical framework. We first describe how our traditional statistical account of multiple judgments, which largely revolves around Cronbach's α and sum/average scores, poses conceptual and practical problems—such as misestimating the construct of interest, misestimating reliability and structural validity, underusing latent variable models, and reducing judge characteristics to a source of error—that are largely imputable to the influence of classical test theory. Then, we propose that the item–response theory framework, traditionally used for multi-item situations, be transposed to multiple-judge CAT situations in Judge Response Theory (JRT). After defining JRT, we present its multiple advantages, such as accounting for differences in individual judgment as a psychological process—rather than as random error—giving a more accurate account of the reliability and structural validity of CAT data, and allowing the selection of complementary—not redundant—judges. The comparison of models and their availability in statistical packages are notably discussed as further directions.
Keywords: classical test theory, item–response theory, consensual assessment technique, creativity
judgment, creativity assessment
Although various methods have been imagined to assess creativity, a substantial amount of research relies on Amabile's (1982) Consensual Assessment Technique (CAT), which consists of asking experts to evaluate creative products (Baer & McKool, 2009). Extensive research has provided a set of methodological guidelines on how to best collect accurate judgments of creative products. However, these methodological recommendations are often about how to better prepare (e.g., Storme, Myszkowski, Çelik, & Lubart, 2014) or select judges (e.g., Kaufman, Baer, Cole, & Sexton, 2008). In contrast, there have been far fewer investigations of how to examine the robustness of CAT data or how to obtain accurate composite scores for the measured attribute.
To examine the robustness of CAT data, researchers generally compute Cronbach's α across judges; to derive composite scores for the attribute, they compute sum (or average) scores that aggregate the judgments into a single score (Baer & McKool, 2009). There have been uses of latent variable models of judgment data (e.g., Myszkowski & Storme, 2017; Silvia et al., 2008) and discussions on how to investigate CAT data (e.g., Stefanic & Randles, 2015), but the general measurement framework to adopt to investigate the psychometric properties of CAT data and to obtain composite scores has not yet been discussed.
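As a concrete illustration of this common practice, consider the following minimal sketch in Python (ours, not the article's; the rating matrix is invented for illustration). Cronbach's α is computed by treating the judges as the "items" of a scale, and each product's composite creativity score is the average of its ratings.

import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    """Cronbach's alpha, treating judges as 'items'.

    ratings: a (products x judges) matrix of creativity judgments.
    """
    n_judges = ratings.shape[1]
    # Sample variance of each judge's ratings across products
    judge_variances = ratings.var(axis=0, ddof=1)
    # Sample variance of the summed (total) scores
    total_variance = ratings.sum(axis=1).var(ddof=1)
    return (n_judges / (n_judges - 1)) * (1 - judge_variances.sum() / total_variance)

# Hypothetical CAT data: 5 products rated by 3 judges on a 1-5 scale
ratings = np.array([
    [4, 5, 4],
    [2, 3, 2],
    [5, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
])

print(f"Cronbach's alpha across judges: {cronbach_alpha(ratings):.2f}")
print("Average (composite) creativity scores:", ratings.mean(axis=1))

Note that both steps treat every discrepancy between judges identically, as noise to be averaged out; this is the assumption examined below.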
In this article, we discuss the typical psychometric investigations of CAT and creativity judgments, as well as describe the recurring challenges encountered. We trace them back to the underlying framework of Classical Test Theory (CTT) and subsequently present the framework of Item–Response Theory (IRT) as a more coherent and useful approach to CAT data.
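Although JRT is defined only later in the article, a preview may help fix ideas. One standard IRT formulation that can be transposed to multiple-judge data is Samejima's graded response model, shown here as an illustrative sketch rather than as the article's own definition, with judges playing the role traditionally played by items:

P(X_{pj} \geq c \mid \theta_p) = \frac{1}{1 + e^{-a_j(\theta_p - b_{jc})}}

where \theta_p is the latent creativity of product p, a_j is the discrimination of judge j, and b_{jc} is judge j's threshold for rating category c. Differences between judges thus enter the model as parameters to be estimated rather than as undifferentiated error.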
The Limitations of Our Current Psychometrical
Practice
While the CAT is an important advance in the measurement of product creativity, the statistical techniques commonly used in both psychometric investigations and scoring strategies pose critical challenges. In this section, we point out the main ones.
The Issues of Sum/Average Scoring
Typically, to aggregate the scores of judges in CAT and thus estimate a product's creativity—in other words, to achieve its measurement—researchers compute sums/averages across judgments.
Nils Myszkowski, Department of Psychology, Pace University; Martin Storme, Laboratoire Adaptations Travail-Individu, Université Paris Descartes.
Correspondence concerning this article should be addressed to Nils Myszkowski, Department of Psychology, Pace University, Room 1315, 41 Park Row, New York, NY 10038. E-mail: nmyszkowski@pace.edu