The Consensual Assessment Technique (CAT), and more generally the use of product creativity judgments, is a central and actively debated method for assessing product and individual creativity. Despite sustained interest in strategies to improve its robustness, we argue that most psychometric investigations and scoring strategies for CAT data remain constrained by a flawed psychometric framework. We first describe how the traditional statistical account of multiple judgments, which largely revolves around Cronbach's alpha and sum/average scores, poses conceptual and practical problems (such as misestimating the construct of interest, misestimating reliability and structural validity, underusing latent variable models, and reducing judge characteristics to a source of error) that are largely attributable to the influence of classical test theory. We then propose that the item response theory framework, traditionally used for multi-item situations, be transposed to multiple-judge CAT situations as Judge Response Theory (JRT). After defining JRT, we present its multiple advantages, such as accounting for differences in individual judgment as a psychological process rather than as random error, giving a more accurate account of the reliability and structural validity of CAT data, and allowing the selection of complementary, rather than redundant, judges. Model comparison and the availability of JRT models in statistical packages are discussed as further directions.
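As a point of reference for the classical-test-theory scoring the abstract critiques, the sketch below illustrates the conventional treatment of CAT ratings: judges are treated as interchangeable "items", products receive a sum/average score, and inter-judge consistency is summarized with Cronbach's alpha. The data are synthetic and all names are illustrative; this is not code from the paper.

```python
import numpy as np

# Hypothetical synthetic data: 50 products rated by 4 judges.
# Each judge's rating is the latent product creativity plus noise
# (the CTT view: judge differences are treated as random error).
rng = np.random.default_rng(0)
true_creativity = rng.normal(size=50)
ratings = true_creativity[:, None] + rng.normal(scale=0.8, size=(50, 4))

def cronbach_alpha(x):
    """Cronbach's alpha across columns (here, judges play the role of items)."""
    k = x.shape[1]
    sum_item_vars = x.var(axis=0, ddof=1).sum()   # variance of each judge's ratings
    total_var = x.sum(axis=1).var(ddof=1)         # variance of the total score
    return k / (k - 1) * (1 - sum_item_vars / total_var)

scores = ratings.mean(axis=1)   # conventional average score per product
alpha = cronbach_alpha(ratings) # conventional inter-judge reliability estimate
```

Under the JRT view argued for in the paper, one would instead fit a latent variable (IRT-type) model in which each judge has their own estimated parameters, rather than averaging judges and summarizing their agreement with a single alpha coefficient.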