Article

Item response modeling of divergent-thinking tasks: A comparison of Rasch’s Poisson model with a two-dimensional model extension


Abstract

Item-response theory (IRT) models are test-theoretical models with many practical implications for educational measurement. For example, test-linking procedures and large-scale educational studies often build on IRT frameworks. However, IRT models have rarely been applied to divergent thinking, which is one of the most important indicators of creative potential. This is most likely because the best-known models, such as the one-parameter logistic Rasch model, can only be used for binary data. Its less known, and often overlooked, predecessor, the Rasch Poisson count model (RPCM), is, however, well suited to model many important divergent-thinking outcomes such as fluency. In the current study we assessed RPCM fit to four different divergent-thinking tasks. We further assessed the fit of a two-dimensional variant of the RPCM that takes into account construct differences due to verbal and figural task modality. We also compared estimated measurement precision based on the two-dimensional model, two separately estimated modality-specific unidimensional models, and a classical approach. The results indicated that the two-dimensional approach was advantageous, especially when correlations of latent variables are of interest. The RPCM and its more flexible multidimensional variants are discussed as psychometric tools that may direct future research towards a better understanding of the available divergent-thinking tasks.
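For orientation, the models compared in the abstract can be written compactly as follows (our notation, which need not match the authors' exactly). In the RPCM, the count y_pi of person p on task i is Poisson distributed with a log intensity that is additive in a person ability and a task easiness parameter; the two-dimensional variant replaces the single ability with a modality-specific ability, verbal or figural depending on the task, and estimates the latent covariance between the two.

\[
y_{pi} \sim \mathrm{Poisson}(\lambda_{pi}), \qquad \log \lambda_{pi} = \theta_p + \beta_i
\]
\[
\text{(two-dimensional variant)} \qquad \log \lambda_{pi} = \theta_{p,m(i)} + \beta_i, \qquad \bigl(\theta_{p,\mathrm{verbal}},\, \theta_{p,\mathrm{figural}}\bigr)^{\top} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}),
\]

where m(i) denotes the modality of task i.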


... We aim to fill this gap with the proposal of two new explanatory count IRT models: one model for the item side and one model for the person side. Some subsequent work extended the RPCM while retaining the equidispersion assumption (Jansen, 1994; Jansen & van Duijn, 1992; Verhelst & Kamphuis, 2009) or extended it to a bi- or multi-dimensional model (Forthmann et al., 2018; Myszkowski & Storme, 2021; Wedel et al., 2003), while others generalized the RPCM to allow for overdispersed conditional responses (i.e., the conditional variance exceeds the conditional mean; e.g., Hung, 2012; Mutz & Daniel, 2018). Underdispersed conditional responses (i.e., the conditional variance is smaller than the conditional mean) were unaccounted for by count IRT models for a long time, despite empirical evidence (Doebler & Holling, 2016; Forthmann & Doebler, 2021; Forthmann, Gühne, et al., 2020) and an associated underestimation of model-implied reliability (Forthmann, Gühne, et al., 2020). ...
Article
In psychology and education, tests (e.g., reading tests) and self-reports (e.g., clinical questionnaires) generate counts, but corresponding Item Response Theory (IRT) methods are underdeveloped compared to binary data. Recent advances include the Two-Parameter Conway-Maxwell-Poisson model (2PCMPM), generalizing Rasch’s Poisson Counts Model, with item-specific difficulty, discrimination, and dispersion parameters. Explaining differences in model parameters informs item construction and selection but has received little attention. We introduce two 2PCMPM-based explanatory count IRT models: The Distributional Regression Test Model for item covariates, and the Count Latent Regression Model for (categorical) person covariates. Estimation methods are provided and satisfactory statistical properties are observed in simulations. Two examples illustrate how the models help understand tests and underlying constructs.
... Some subsequent work extended the RPCM while retaining the equidispersion assumption (Jansen, 1994; Jansen & van Duijn, 1992; Verhelst & Kamphuis, 2009), while others generalized the RPCM to allow for overdispersed conditional responses (i.e., the conditional variance exceeds the conditional mean; e.g., Hung, 2012; Mutz & Daniel, 2018). Other authors studied two-dimensional or multidimensional latent variables (Wedel, Böckenholt, & Kamakura, 2003; Forthmann, Çelik, Holling, Storme, & Lubart, 2018; Myszkowski & Storme, 2021) or replaced log-linearity with a sigmoid link function (Doebler, Doebler, & Holling, 2014). But for a long time, underdispersed conditional responses (i.e., the conditional variance is smaller than the conditional mean) could not be accounted for with count IRT models, despite empirical evidence, especially from real test data with highly structured test materials (Doebler & Holling, 2016; Forthmann, Gühne, & Doebler, 2020; Forthmann & Doebler, 2021). ...
Preprint
In psychology and education, tests (e.g., reading tests) and self-reports (e.g., clinical questionnaires) generate counts, but corresponding Item Response Theory (IRT) methods are underdeveloped compared to binary data. Recent advances include the Two-Parameter Conway-Maxwell-Poisson model (2PCMPM), generalizing Rasch's Poisson Counts Model, with item-specific difficulty, discrimination, and dispersion parameters. Explaining differences in model parameters informs item construction and selection, but has received little attention. We derive the item information in the 2PCMPM and introduce two 2PCMPM-based explanatory count IRT models: the Distributional Regression Test Model for item covariates, and the Count Latent Regression Model for person covariates. Estimation methods are provided and satisfactory statistical properties are observed in simulations. Two examples illustrate how the models help in understanding tests and underlying constructs.
... In the psychological literature, the RPCM was used to scale many different cognitive abilities such as reading competency (Jansen 1995; Jansen and van Duijn 1992; Rasch 1960), intelligence (Ogasawara 1996), mental speed (Baghaei et al. 2019; Doebler and Holling 2016; Holling et al. 2015), or divergent thinking (Forthmann et al. 2016, 2018). The model was further used for migraine attacks (Fischer 1987) or sports exercises such as sit-ups (Zhu and Safrit 1993). ...
Article
Full-text available
Item-response models from the psychometric literature have been proposed for the estimation of researcher capacity. Canonical items that can be incorporated in such models to reflect researcher performance are count data (e.g., number of publications, number of citations). Count data can be modeled by Rasch's Poisson counts model, which assumes equidispersion (i.e., mean and variance must coincide). However, the mean can be (a) larger than the variance (i.e., underdispersion) or (b) smaller than the variance (i.e., overdispersion). Ignoring the presence of overdispersion (underdispersion) can cause standard errors to be liberal (conservative) when the Poisson model is used. Indeed, numbers of publications and numbers of citations are known to display overdispersion. Underdispersion, however, is far less acknowledged in the literature. In the current investigation the flexible Conway-Maxwell-Poisson count model is used to examine reliability estimates of capacity in relation to various dispersion patterns. It is shown that the reliability of capacity estimates of inventors drops from .84 (Poisson) to .68 (Conway-Maxwell-Poisson) or .69 (negative binomial). Moreover, with some items displaying overdispersion and some items displaying underdispersion, the dispersion pattern in a reanalysis of Mutz and Daniel's (2018) researcher data was found to be more complex than in previous results. To conclude, a careful examination of competing models, including the Conway-Maxwell-Poisson count model, should be undertaken prior to any evaluation and interpretation of capacity reliability. Moreover, this work shows that count data psychometric models are well suited for decisions with a focus on top researchers, because conditional reliability estimates (i.e., reliability depending on the level of capacity) were highest for the best researchers.
... In contrast with models for binary, ordinal, and normally distributed responses, for which a wealth of item response models and extensions have been developed (Shao, Janse, Visser, & Meyer, 2014), the leading response model for fluency scores (Baghaei & Doebler, 2019; Forthmann, Celik, Holling, Storme, & Lubart, 2018; Forthmann et al., 2016) remains the Rasch Poisson Counts Model (RPCM; Rasch, 1960). However, because the RPCM is a particular case of the 2-Parameter Poisson Counts Model (2PPCM), the model which we propose as a better alternative, we will first introduce the 2PPCM for clarity. ...
Article
Fluency tasks are among the most common item formats for the assessment of certain cognitive abilities, such as verbal fluency or divergent thinking. A typical approach to the psychometric modeling of such tasks (e.g., Intelligence, 2016, 57, 25) is the Rasch Poisson Counts Model (RPCM; Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research, 1960), in which, similarly to the assumption of (essential) τ-equivalence in Classical Test Theory, tasks have equal discriminations, meaning that, beyond varying in difficulty, they do not vary in how strongly they are related to the latent variable. In this research, we question this assumption in the case of divergent thinking tasks, and propose instead to use a more flexible 2-Parameter Poisson Counts Model (2PPCM), which allows tasks to be characterized by both difficulty and discrimination. We further propose a Bifactor 2PPCM (B2PPCM) to account for local dependencies (i.e., specific/nuisance factors) emerging from tasks sharing similarities (e.g., similar prompts and domains). We reanalyze a divergent thinking dataset (Psychology of Aesthetics, Creativity, and the Arts, 2008, 2, 68) and find the B2PPCM to significantly outperform the 2PPCM, with both outperforming the RPCM. Further extensions and applications of these models are discussed.
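In one plausible log-linear parameterization (our notation, possibly differing from the authors'), the 2PPCM adds an item-specific discrimination a_i to the RPCM predictor:

\[
y_{pi} \sim \mathrm{Poisson}(\lambda_{pi}), \qquad \log \lambda_{pi} = a_i\,\theta_p + d_i ,
\]

so the RPCM is recovered when all discriminations are equal; the bifactor version additionally lets each task load on a specific factor shared by similar tasks.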
... With some exceptions (Akbari Chermahini, Hickendorff, & Hommel, 2012; Barbot, Tan, Randi, Santa-Donato, & Grigorenko, 2012; Forthmann, Celik, Holling, Storme, & Lubart, 2018; Forthmann et al., 2016; Myszkowski, 2019; Myszkowski & Storme, 2017; Sen, 2016; Silvia et al., 2008; Wang, Ho, Cheng, & Cheng, 2014), IRT procedures are rarely used in creativity research. Although the reason for this is unclear, it has been advanced (Myszkowski & Storme, 2019) that a central reason behind the underuse of such models in creativity research, as in psychological research in general, is that training in IRT is rarely found in psychology education, and that IRT modelling software is often less easily available than traditional CTT applications (Borsboom, 2006). ...
Article
Full-text available
Although the Consensual Assessment Technique (CAT; Amabile, 1982) is considered a gold standard in the measurement of product attributes, including creativity (Baer & McKool, 2009), considerations of how to improve its scoring and psychometric modeling are rare. Recently, it was advanced (Myszkowski & Storme, 2019) that the framework of Item-Response Theory (IRT) is appropriate for CAT data and would provide several practical and conceptual benefits to both the psychometric investigation of the CAT and the scoring of creativity. However, the packages recommended for IRT modeling of ordinal data are hardly accessible for researchers unfamiliar with IRT, or offer minimal possibility to adapt their outputs to judgment data. Thus, the package "jrt" was developed for the open-source programming language R and made available on the Comprehensive R Archive Network (CRAN). Its main aim is to make IRT analyses easily applicable to CAT data, by automating model selection, by diagnosing and dealing with issues related to model-data incompatibilities, by providing quick, customizable, and publication-ready outputs for communication, and by guiding researchers new to IRT through the different methods available. We provide brief tutorials and examples for the main functions, which are further detailed in the online vignette and documentation on CRAN. We finally discuss the current limitations and anticipated extensions of the "jrt" package, and invite researchers to take advantage of its practicality.
... More recent studies of Doebler and Holling (2016) and Holling et al. (2015) applied Poisson models to processing speed tests. In addition, Forthmann et al. (2016) modeled divergent thinking data with an extension of the RPCM with item-covariates (see also Graßhoff, Holling, & Schwabe, 2013), and Forthmann, Çelik, Holling, Storme, and Lubart (2018) used a two-dimensional RPCM to account for varying stimulus domains in divergent thinking tasks. Thus, the RPCM and count data IRT models in general have a wide field of useful applications (see also additional examples given at the beginning of the article). ...
Article
Count data naturally arise in several areas of cognitive ability testing, e.g., processing speed, memory, verbal fluency, and divergent thinking. Contemporary count data item response theory models, however, are not flexible enough, especially to account for over- and underdispersion at the same time. For example, the Rasch Poisson counts model assumes equidispersion (conditional mean and variance coincide), which is often violated in empirical data. This work introduces the Conway-Maxwell-Poisson counts model that can handle underdispersion (variance lower than the mean), equidispersion, and overdispersion (variance larger than the mean) in general and specifically at the item level. A simulation study revealed satisfactory parameter recovery at moderate sample sizes and mostly unbiased standard errors for the proposed estimation approach. In addition, plausible empirical reliability estimates resulted, while those based on the Rasch Poisson counts model were biased downwards (underdispersion) and upwards (overdispersion) when the simulation model deviated from equidispersion. Finally, verbal fluency data were analyzed and the Conway-Maxwell-Poisson counts model with item-specific dispersion parameters fit the data best. Dispersion parameter estimates indicated underdispersion for three out of four items. Overall, these findings indicate the feasibility and importance of the suggested flexible count data modeling approach.
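For reference, the Conway-Maxwell-Poisson distribution underlying this model has the standard probability mass function

\[
P(Y = y \mid \lambda, \nu) = \frac{\lambda^{y}}{(y!)^{\nu}\, Z(\lambda, \nu)}, \qquad Z(\lambda, \nu) = \sum_{j=0}^{\infty} \frac{\lambda^{j}}{(j!)^{\nu}},
\]

where ν = 1 gives the Poisson case (equidispersion), ν > 1 implies underdispersion, and ν < 1 implies overdispersion; in the counts model described above, person and item parameters enter through the rate, and ν may be item-specific (our summary).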
Article
Several psychometric tests and self-reports generate count data (e.g., divergent thinking tasks). The most prominent count data item response theory model, the Rasch Poisson Counts Model (RPCM), is limited in applicability by two restrictive assumptions: equal item discriminations and equidispersion (conditional mean equal to conditional variance). Violations of these assumptions lead to impaired reliability and standard error estimates. Previous work generalized the RPCM but maintained some limitations. The two-parameter Poisson counts model allows for varying discriminations but retains the equidispersion assumption. The Conway-Maxwell-Poisson Counts Model allows for modelling over- and underdispersion (conditional mean less than and greater than conditional variance, respectively) but still assumes constant discriminations. The present work introduces the Two-Parameter Conway-Maxwell-Poisson (2PCMP) model which generalizes these three models to allow for varying discriminations and dispersions within one model, helping to better accommodate data from count data tests and self-reports. A marginal maximum likelihood method based on the EM algorithm is derived. An implementation of the 2PCMP model in R and C++ is provided. Two simulation studies examine the model's statistical properties and compare the 2PCMP model to established models. Data from divergent thinking tasks are reanalysed with the 2PCMP model to illustrate the model's flexibility and ability to test assumptions of special cases.
Article
Creativity has been well studied in the past several decades, and numerous measures have been developed to assess creativity. However, validity evidence associated with each measure is often mixed. In particular, the social consequence aspect of validity has received little attention. This is partly due to the difficulty of testing for differential item functioning (DIF) within the traditional classical test theory framework, which still remains the most popular approach to assessing creativity. Hence, this study provides an example of examining differential item functioning using multilevel explanatory item response theory models. The Creative Thinking Scale was tested for DIF in a sample of 1043 10th–12th graders. Results revealed significant uniform and non‐uniform DIF for some items. Differentially functioning items are able to produce measurement bias and should be either deleted or modeled. The detailed implications for researchers and practitioners are discussed.
Book
Full-text available
A treatment of the theory and application of Many-Facet Rasch Measurement (MFRM) to judged (rated or rank-ordered) performances, including a description of the estimation of MFRM measures with a focus on missing data.
Article
Full-text available
This study provides new evidence concerning task specificity in creativity, examining from a cross-cultural perspective the extent to which performance in graphic vs. verbal creativity tasks (domain specificity) and in divergent vs. convergent creativity tasks (process specificity) is correlated. The relations between different creativity tasks in monocultural and multicultural samples of Chinese and French children were compared. Electronic versions of the Wallach and Kogan Creativity Test (WKCT, Wallach & Kogan, 1965; Lau & Cheung, 2010) and the Evaluation of Potential Creativity (EPoC, Lubart, Besançon & Barbot, 2011; Barbot, Besançon & Lubart, 2011) were used. Both measures showed satisfactory psychometric properties and cross-cultural structural validity. The results showed that culture has an impact on the structure of creative ability: correlation patterns differed across Chinese and French groups and across monocultural and multicultural groups. Such results show that it is crucial to take task specificity into account when investigating the effect of culture on creativity. Indeed, our study implies that cultural differences found using one specific creativity task might not be automatically generalizable to all sorts of creativity tasks. Limitations are discussed and perspectives for future research on culture and task specificity in creativity are proposed.
Article
Full-text available
This article presents a new method for the assessment of creativity in tasks such as "The camel is ... of the desert." More specifically, the study uses Tourangeau and Sternberg's (1981) domain interaction model to produce an objective system for scoring the metaphors produced by respondents, and many-facet Rasch measurement to model the rating scale structure of the scoring points, item difficulty, and rater severity, thus making it possible to obtain equated latent scores for subjects regardless of rater severity. This study also investigates 4 aspects of the method: reliability, the correlation between quality and quantity, criterion validity, and the correlation with fluid intelligence. The database analyzed in this study consists of 12,418 responses to 9 items that were given by 975 persons. Two to 10 raters scored the quality and flexibility of each metaphor on a 4-point scale. Raters were counterbalanced in a judge-linking network to permit the equating of the different "test forms" implied by combinations of raters. The reliability of subjects' latent quality scores was .88, and the correlation between quality and quantity was low (r = -.14), thus showing the desired separation between the 2 parameters established for the task scores. The latent score on the test was significantly associated with having a profession that requires idea production (r = .19), and the correlation between latent creativity scores and fluid intelligence was high, beta = .51, even after controlling for crystallized intelligence (r = .47). Mechanisms of fluid intelligence, executive function, and creativity are discussed.
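The many-facet Rasch model referred to here is commonly written in the following adjacent-category form (a standard rendering, not necessarily the exact specification used in this study):

\[
\log \frac{P_{nijk}}{P_{nij(k-1)}} = \theta_n - \delta_i - \gamma_j - \tau_k ,
\]

where θ_n is the ability of person n, δ_i the difficulty of item i, γ_j the severity of rater j, and τ_k the threshold of rating category k; the rater term is what allows equated person scores regardless of which raters happened to score a response.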
Article
Full-text available
The main purpose of this study was to examine the structure of students' creative thinking in the visual and verbal areas. The Torrance Test of Creative Thinking Figural and Verbal forms were used. The participants were Turkish elementary school 7th grade students (M = 13 years, range = 12-14 years). The findings indicated that the relationship between the visual and verbal areas of students' creative thinking was statistically significant and meaningful. Additionally, gender differences were statistically significant and meaningful. The results indicate that the structure of students' creative thinking is holistic in early puberty, at least in terms of the visual and verbal areas of creative thinking. However, the holistic structure of students' creative thinking has a flexible character, given the low (r = .25) level of relationship between the visual and verbal areas. This result suggests that visual and verbal materials can be combined with presentation and narrative techniques in balanced educational activities to develop students' creative thinking effectively.
Article
Full-text available
The C-Test is a variation of the cloze test in which the second half of every second word is deleted. The number of words correctly reconstructed by the test taker is considered to be a measure of general language proficiency. In this pilot study the componential structure of an English C-Test consisting of two spoken-discourse passages and two written-discourse passages is investigated with the help of both unidimensional and multidimensional Rasch models. In a sample of 99 fairly advanced Iranian students of English, the data fitted the multidimensional partial credit model, as defined in the multidimensional random coefficients multinomial logit model (Adams, Wilson, & Wang, 1997), better than Masters' (1982) unidimensional partial credit model. This indicates that spoken-discourse and written-discourse C-Test passages form distinct dimensions. We argue that spoken-discourse C-Test texts may tap better into students' listening/speaking skills than C-Tests based solely on written-discourse texts and that, therefore, C-Tests consisting of both conversational and written-discourse passages can more adequately operationalize the construct of general language proficiency than C-Tests containing only written-discourse passages. Considering the small sample size of the study, the findings should be interpreted cautiously.
Article
Full-text available
The Remote Associates Test (RAT) is often assumed to be a measure of creativity; however, the RAT has been broadly applied in psychological studies. Originally developed to assess individual differences in associative processing, the RAT has been used to study various constructs, such as creativity, problem solving, insight, and memory. Aside from early validation studies, the psychometric properties of the RAT remain largely unexplored. This study examines the internal and external structure validity evidence of a computer-based, 30-item RAT based on scores from a sample of undergraduate students. We examined internal structure via classical test theory item statistics, dimensionality analysis, item response theory analysis, and differential item functioning analysis. Results showed that the two-parameter logistic (2PL) model, in which items have unique discrimination and difficulty parameters, had good fit to item responses from our 30-item RAT. In addition, the relationships among scores on the RAT and a series of other cognitive measures, including divergent thinking, intelligence, and working memory tasks, were examined to assess the external validity of the RAT scores. Results indicate that the RAT assesses cognitive processes similar to those of a wide range of other analytical and convergent thinking tests, distinguishing it from traditional divergent thinking tests of creativity. In light of concerns regarding the internal and external psychometric properties of creativity measures, our findings help to clarify the item and test characteristics of the RAT.
Article
Full-text available
The purpose of the present study was to use the Partial Credit Model to study the factors of the Test of Creativity in Children and to identify which characteristics of the creative person are most effective in differentiating subjects according to their ability level. A sample of 1426 students from first to eighth grade answered the instrument. The Partial Credit Model was used to estimate the abilities of the subjects and the item difficulties on a common scale for each of the four factors, indicating which items require a higher level of creativity to be scored and thus differentiate the more creative individuals. The results demonstrated that the greater part of the characteristics showed good fit indices, with values between 0.80 and 1.30 for both infit and outfit, indicating a response pattern consistent with the model. The characteristics of Unusual Perspective, Expression of Emotion, and Originality were identified as better predictors of creative performance because they require a greater ability level (usually above two standard deviations). These results may be used in the future development of a reduced form of the instrument or a simplification of the current scoring model.
Article
Full-text available
The field of creativity has largely focused on individual differences in divergent thinking abilities. Recently, contemporary creativity researchers have shown that intelligence and executive functions play an important role in divergent thought, opening new lines of research to examine how higher-order cognitive mechanisms may uniquely contribute to creative thinking. The present study extends previous research on the intelligence and divergent thinking link by systematically examining the relationships among intelligence, working memory, and three fundamental creative processes: associative fluency, divergent thinking, and convergent thinking. Two hundred and sixty five participants were recruited to complete a battery of tasks that assessed a range of elementary to higher-order cognitive processes related to intelligence and creativity. Results provide evidence for an associative basis in two distinct creative processes: divergent thinking and convergent thinking. Findings also supported recent work suggesting that intelligence significantly influences creative thinking. Finally, working memory played a significant role in creative thinking processes. Recasting creativity as a construct consisting of distinct higher-order cognitive processes has important implications for future approaches to studying creativity within an individual differences framework.
Article
Full-text available
The aim of this work was to gather different perspectives on the "key ingredients" involved in creative writing by children from experts of diverse disciplines, including teachers, linguists, psychologists, writers, and art educators. Ultimately, we sought in the experts' convergence or divergence insights into the relative importance of the relevant factors that may aid writing instruction, particularly for young children. We present a study using an expert knowledge elicitation method in which representatives from five domains of expertise pertaining to writing rated 28 factors (i.e., individual skills and attributes) covering six areas (general knowledge and cognition, creative cognition, conation, executive functioning, linguistic and psychomotor skills) according to their importance for creative writing. A Many-Facets Rasch Measurement (MFRM) model permitted us to quantify the relative importance of these writing factors across domain-specific expertise, while controlling for expert severity and other systematic evaluation biases. The identified similarities and domain-specific differences in the expert views offer a new basis for understanding the conceptual gaps between the scientific literature on creative writing, writers' self-reflections on the act of writing creatively, and educators' practices in teaching creative writing. Bridging such diverse approaches, which are nonetheless relatively homogeneous within areas of expertise, appears useful for formulating a process-oriented writing pedagogy that may, above all, better target the skills needed to improve children's creative writing development.
Article
Full-text available
The use of divergent thinking (DT) tests to assess creativity has been strongly criticized in recent years. Several critics have noted that DT test scores have shown little evidence of predictive validity with respect to adult creative achievement. Data from Torrance's (1972a) elementary school longitudinal study (1958-present) were reanalyzed using structural equation modeling. Results suggest that just under half of the variance in adult creative achievement is explained by DT test scores, with the contribution of DT being more than 3 times that of intelligence quotients. However, comprehensive longitudinal models of creative achievement based on current creativity and cognitive theory have yet to be empirically validated.
Article
Full-text available
There is disagreement among researchers as to whether creativity is a unidimensional or multidimensional trait. Much of the debate centers around the most widely used measure of creativity, the Torrance Tests of Creative Thinking (TTCT). This study used data from 1,000 kindergartners (ages 5-7), 1,000 third graders (ages 7-11) and 1,000 sixth graders (ages 10-13). Confirmatory factor analyses were conducted for both the two-factor model and one-factor model to determine which fit the data better. Measurement invariance across genders and grade levels was assessed using multiple group analyses in which sets of parameters were freed sequentially in a series of hierarchically nested models. The findings indicate that the structure of TTCT scores is consistent with a two-factor theory. Also, the results of the multiple group analyses indicate that model parameters for gender groups are more invariant than for grade levels in determining the fit of the model.
Article
Full-text available
Divergent thinking (DT) tests are very often used in creativity studies. Certainly DT does not guarantee actual creative achievement, but tests of DT are reliable and reasonably valid predictors of certain performance criteria. The validity of DT is described as reasonable because validity is not an all-or-nothing attribute, but is, instead, a matter of degree. Also, validity only makes sense relative to particular criteria. The criteria strongly associated with DT are detailed in this article. It also summarizes the uses and limitations of DT, conceptually and psychometrically. After the psychometric evidence is reviewed, alternative tests and scoring procedures are described, including several that have only recently been published. Throughout this article related processes, such as problem finding and evaluative thinking, are linked to DT.
Article
Full-text available
Replies to comments by M. D. Mumford et al. (see record 2008-05954-002), J. Baer (see record 2008-05954-003), M. A. Runco (see record 2008-05954-004), K. H. Kim (see record 2008-05954-005), N. Kogan (see record 2008-05954-006), and S. Lee (see record 2008-05954-007) on the current authors' original article on divergent thinking (see record 2008-05954-001). In this reply, the authors examine the madness to their method in light of the comments. Overall, the authors agree broadly with the comments; many of the issues will be settled only by future research. The authors disagree, though, that past research has proven past scoring methods, including the Torrance methods, to be satisfactory or satisfying. The authors conclude by offering their own criticisms of their method, of divergent thinking, and of the concept of domain-general creative abilities.
Article
Full-text available
Divergent thinking is central to the study of individual differences in creativity, but the traditional scoring systems (assigning points for infrequent responses and summing the points) face well-known problems. After critically reviewing past scoring methods, this article describes a new approach to assessing divergent thinking and appraises its reliability and validity. In our new Top 2 scoring method, participants complete a divergent thinking task and then circle the 2 responses that they think are their most creative responses. Raters then evaluate the responses on a 5-point scale. Regarding reliability, a generalizability analysis showed that subjective ratings of unusual-uses tasks and instances tasks yield dependable scores with only 2 or 3 raters. Regarding validity, a latent-variable study (n=226) predicted divergent thinking from the Big Five factors and their higher-order traits (Plasticity and Stability). Over half of the variance in divergent thinking could be explained by dimensions of personality. The article presents instructions for measuring divergent thinking with the new method.
Article
Full-text available
This paper argues that the Rasch model, unlike the other models generally referred to as IRT models, and those that fall into the tradition of True Score models, encompasses a set of rigorous prescriptions for what scientific measurement would be like if it were to be achieved in the social sciences. As a direct consequence, the Rasch measurement approach to the construction and monitoring of variables is sensitive to the issues raised in Messick's (1995) broader conception of construct validity. The theory/practice dialectic (Bond & Fox, 2001) ensures that validity is foremost in the mind of those developing measures and that genuine scientific measurement is foremost in the minds of those who seek valid outcomes from assessment. Failures of invariance, such as those referred to as DIF, should alert researchers to the need to modify assessment procedures or the substantive theory under investigation, or both.
Article
Full-text available
The present study explores the factorial structure and the degree of measurement invariance of 12 divergent thinking tests. In a large sample of German students (N = 1328), a three-factor model representing verbal, figural, and numerical divergent thinking was supported. Multigroup confirmatory factor analyses revealed that partial strong measurement invariance was tenable across gender and age groups as well as school forms. Latent mean comparisons resulted in significantly higher divergent thinking skills for females and students in schools with higher mean IQ. Older students exhibited higher latent means on the verbal and figural factor, but not on the numerical factor. These results suggest that a domain-specific model of divergent thinking may be assumed, although further research is needed to elucidate the sources that negatively affect measurement invariance.
Article
Full-text available
In this paper we elaborate on the potential of the lmer function from the lme4 package in R for item response (IRT) modeling. In line with the package, an IRT framework is described based on generalized linear mixed modeling. The aspects of the framework refer to (a) the kind of covariates (their mode: person, item, person-by-item; and whether they are external vs. internal to responses) and (b) the kind of effects the covariates have (fixed vs. random, and, if random, the mode across which the effects are random: persons or items). Based on this framework, three broad categories of models are described: item covariate models, person covariate models, and person-by-item covariate models, and within each category three types of more specific models are discussed. The models in question are explained and the associated lmer code is given. Examples of models are the linear logistic test model with an error term, differential item functioning models, and local item dependency models. Because the lme4 package is for univariate generalized linear mixed models, neither the two-parameter and three-parameter models nor item response models for polytomous response data can be estimated with the lmer function.
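As an illustration of this framework (our sketch, not code from the paper), a binary Rasch model and a Rasch Poisson counts model can be fitted with lme4's glmer function, assuming a hypothetical long-format data frame dat with columns person, item (a factor), and y:

    # Minimal sketch: Rasch-type models as generalized linear mixed models.
    # 'dat' holds one row per person-item observation (hypothetical data).
    library(lme4)

    # Binary Rasch model: fixed item effects, random person intercepts
    rasch_fit <- glmer(y ~ 0 + item + (1 | person),
                       data = dat, family = binomial(link = "logit"))

    # Rasch Poisson counts model: same structure with a log link for counts
    rpcm_fit <- glmer(y ~ 0 + item + (1 | person),
                      data = dat, family = poisson(link = "log"))

    summary(rpcm_fit)

Because glmer fits univariate generalized linear mixed models only, discrimination parameters (two- or three-parameter models) cannot be estimated this way, as the abstract notes.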
Article
Full-text available
A formal framework for measuring change in sets of dichotomous data is developed and implications of the principle of specific objectivity of results within this framework are investigated. Building upon the concept of specific objectivity as introduced by G. Rasch, three equivalent formal definitions of that postulate are given, and it is shown that they lead to latent additivity of the parametric structure. If, in addition, the observations are assumed to be locally independent realizations of Bernoulli variables, a family of models follows necessarily which are isomorphic to a logistic model with additive parameters, determining an interval scale for latent trait measurement and a ratio scale for quantifying change. Adding the further assumption of generalizability over subsets of items from a given universe yields a logistic model which allows a multidimensional description of individual differences and a quantitative assessment of treatment effects; as a special case, a unidimensional parameterization is introduced also and a unidimensional latent trait model for change is derived. As a side result, the relationship between specific objectivity and additive conjoint measurement is clarified.
Article
The Rasch Poisson Counts model is an appropriate item response theory (IRT) model for analyzing many kinds of count data in educational and psychological testing. The evaluation of a fitted Rasch Poisson model by means of a graphical display or graphical device is difficult and, hence, very much an open problem, since the observations come from different distributions. Hence methods that are potentially straightforward in the univariate case cannot be applied to this model. However, it is possible to use a method, called the covariate-adjusted frequency plot, which incorporates covariate information into a marginal frequency plot. We utilize this idea here to construct a covariate-adjusted frequency plot for the Rasch Poisson Counts model. This graphical method is useful for illustrating the goodness-of-fit of the model as well as for identifying potential areas (items) with problematic fit. A case study using typical data from a frequently used intelligence test illustrates the method, which is easy to use.
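The underlying idea can be sketched in a few lines of R (our reading of the approach, not the authors' implementation): for every observation, the fitted Poisson probabilities are computed and then summed over observations, giving model-implied count frequencies that can be compared with the observed frequencies. Here fit is assumed to be a fitted Poisson model (e.g., from glm or glmer) and y the vector of observed counts:

    # Covariate-adjusted (marginal) frequencies for a fitted Poisson model
    mu   <- fitted(fit)                  # one fitted mean per observation
    kmax <- max(y)
    expected <- sapply(0:kmax, function(k) sum(dpois(k, lambda = mu)))
    observed <- as.numeric(table(factor(y, levels = 0:kmax)))

    plot(0:kmax, observed, type = "h", xlab = "Count", ylab = "Frequency")
    points(0:kmax, expected, pch = 19)   # model-implied frequencies

Large discrepancies between the two sets of frequencies point to overall misfit; computing them separately per item points to items with problematic fit.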
Article
This study examined possible bicultural effects on creative potential of children in four groups of Chinese and French children in Hong Kong and Paris. An international battery of widely used divergent measures (Wallach-Kogan Creativity Tests; WKCT) and newly constructed divergent-plus-integrative measures (Evaluation of Potential Creativity; EPoC) was established for assessment. Study 1 showed that most measures of WKCT and EPoC were reasonably high in reliability and they had expected correlations with the fluency scores of some subtests of Torrance Tests of Creative Thinking. Study 2 found some interestingly mixed bicultural effects favoring verbal divergent responses for French children and graphic integrative responses for Chinese children. Compared with Paris-French children, the bicultural Hong Kong-French children had significantly higher scores in figural fluency, figural flexibility, and figural uniqueness of WKCT (requiring only verbal divergent responses) but significantly lower scores in the graphic divergent-exploratory measure of EPoC. Compared with Hong Kong-Chinese children, the bicultural Paris-Chinese children had significantly higher scores in the graphic convergent-integrative measure of EPoC, but significantly lower scores in verbal fluency, verbal flexibility, figural fluency, figural flexibility, figural uniqueness, and figural unusualness of WKCT. Implications of the mixed bicultural effects in relation to the diverse creativity measures and children groups are discussed.
Article
Contents: A Brief History of Test Theory and Design; Formulating Test Specifications; Modeling Test Assembly Problems; Solving Test Assembly Problems; Models for Assembling Single Tests; Models for Assembling Multiple Tests; Models for Assembling Tests with Item Sets; Models for Assembling Tests Measuring Multiple Abilities; Models for Adaptive Test Assembly; Designing Item Pools for Programs with Fixed Tests; Designing Item Pools for Programs with Adaptive Tests; Epilogue.
Article
Optimal test-design methods are applied to rule-based item generation. Three different cases of automated test design are presented: (a) test assembly from a pool of pregenerated, calibrated items; (b) test generation on the fly from a pool of calibrated item families; and (c) test generation on the fly directly from calibrated features defining the item families. The last two cases do not assume any item calibration under a regular response theory model; instead, entire item families or critical features of them are assumed to be calibrated using a hierarchical response model developed for rule-based item generation. The test-design models maximize an expected version of the Fisher information in the test and control critical attributes of the test forms through explicit constraints. Results from a study with simulated response data highlight both the effects of within-family item-parameter variability and the severity of the constraint sets in the test-design models on their optimal solutions.
Article
There is disagreement whether creativity is a unidimensional or multidimensional trait. The dimensionality of creativity is important to understand the mind's cognitive functioning, thus aiding the development of human potential. Much of this dimensionality debate is related to the Torrance Tests of Creative Thinking (TTCT). Confirmatory factor analyses were thus conducted with data from 500 Grade-6 students, and several factor models were tested. The findings of this study show that the TTCT consists of 2 factors rather than a single factor, contrary to the majority of research on this subject.
Article
The distributional assumption for a generalized linear model is often checked by plotting the ordered deviance residuals against the quantiles of a standard normal distribution. Such plots can be difficult to interpret, because even when the model is correct, the plot often deviates substantially from a straight line. To rectify this problem, Ben and Yohai (2004) proposed plotting the deviance residuals against their theoretical quantiles, under the assumption that the model is correct. Such plots are closer to a straight line when the model is correct, making them much more useful for model checking. However, the quantile computation proposed by Ben and Yohai is, in general, relatively complicated to implement and computationally expensive, so that general-purpose software for these plots is only available for the Poisson and binary cases in the R package robust. As an alternative, the theoretical quantiles can efficiently and simply be estimated by repeatedly simulating new response data from the fitted model and computing the corresponding residuals. This method also provides reference bands for judging the significance of departures of Q-Q plots from the ideal straight-line form. A second alternative is to estimate the quantiles using quantiles of the response variable distribution according to the estimated model. This latter alternative generally has lower computational cost than the first, but does not yield Q-Q plot reference bands. In simulations, the quantiles produced by the new methods give results indistinguishable from the original Ben and Yohai quantile computations, but the scaling of computational cost with sample size is much improved, so that a 500-fold reduction in computation time was observed at sample size 50,000. Application of the methods to generalized linear models fitted to prostate cancer incidence data suggests that they are particularly useful in large-dataset cases that might otherwise be incorrectly viewed as zero-inflated. The new approaches are simple enough to implement for any exponential family distribution and for several alternative types of residual, and this has been done for all the families available for use with generalized linear models in the basic distribution of R.
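A stripped-down R sketch of the simulation idea for a Poisson GLM (our illustration under simplifying assumptions, not the authors' implementation) repeatedly simulates new responses from the fitted means, sorts their deviance residuals, and averages the sorted values to obtain reference quantiles:

    # Simulation-based reference quantiles for deviance residuals;
    # 'fit' is assumed to be a Poisson model fitted with glm().
    mu   <- fitted(fit)
    dres <- sort(residuals(fit, type = "deviance"))
    nsim <- 200
    sims <- replicate(nsim, {
      ysim <- rpois(length(mu), lambda = mu)
      # deviance residuals of simulated counts, evaluated at the fitted means
      d2 <- poisson()$dev.resids(ysim, mu, wt = rep(1, length(mu)))
      sort(sign(ysim - mu) * sqrt(d2))
    })
    ref <- rowMeans(sims)                # reference quantiles
    plot(ref, dres, xlab = "Reference quantiles",
         ylab = "Observed deviance residuals")
    abline(0, 1)

Pointwise envelopes computed from the simulated residual matrix can serve as the reference bands mentioned above.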
Article
Scale indeterminacy in the analysis of differential item functioning (DIF) within the framework of item response theory can be resolved by one of 3 anchor item methods: the equal-mean-difficulty method, the all-other anchor item method, and the constant anchor item method. In this article, the applicability and limitations of these 3 methods are discussed and their performance in DIF detection is compared using Monte Carlo simulations within the family of Rasch models (Rasch, 1960). The results show that when the test contained multiple DIF items, the equal-mean-difficulty method and the all-other method functioned appropriately only when the difference in the mean item difficulties between the reference and focal groups approached zero. In contrast, the constant method yielded unbiased parameter estimates, well-controlled Type I error, and high power of DIF detection, regardless of large differences in the mean item difficulties between groups and high percentages of DIF items in the tests. In addition, the more anchor items in the constant method, the higher the power of detecting DIF. Therefore, the constant anchor item method is recommended when conducting DIF analysis. Methods of locating anchor items for implementing the constant method are also discussed.
Article
The question is to what extent intelligence test batteries provide any kind of empirical reference to common intelligence theories. Of particular interest are conceptualized tests that are of a high psychometric standard (those that fit the Rasch model) and hence are not exposed to fundamental critique. As individualized testing, i.e., a psychologist and a testee face to face, is often preferred by many practitioners, a Wechsler-like test battery is dealt with here: the Adaptive Intelligence Diagnosticum (AID 2; Kubinger, K. D., & Wurst, E. (2000). Adaptives Intelligenz Diagnostikum, Version 2.1 (AID 2) [Adaptive intelligence diagnosticum 2]. Weinheim: Beltz.). Using the standardization sample, confirmatory factor analyses were performed with respect to intelligence theories and models, as concerns Spearman, Wechsler, Thurstone, Cattell, Jäger, and Carroll. Additionally, a confirmatory factor analysis was performed with respect to a simplified neuropsychological model of specific learning disorders, which proved to fit the data best, even better than the (exploratory) four-factor solution given in the AID 2 manual. This model is based on the three interdependent factors "perception", "retrieval", and "utilization". The answer is that if modern test conceptualizations attempt to fulfill pragmatic purposes, they hardly have any relation to pertinent intelligence theories, but rather create their own kind of informal, heuristic model of "intelligence".
Article
The normal quantile–quantile (Q–Q) plot of residuals is a popular diagnostic tool for ordinary linear regression with normal errors. However, for some generalized linear regression models, the distribution of deviance residuals may be very far from normality, and therefore the corresponding normal Q–Q plots may be misleading for checking model adequacy. We introduce an estimate of the distribution of the deviance residuals of generalized linear models. We propose a new Q–Q plot where the observed deviance residuals are plotted against the quantiles of the estimated distribution. The method is illustrated by the analysis of real and simulated data.
Article
We consider data that can be summarized as an N × K table of counts, for example, test data obtained by administering K tests to N subjects. The cell entries y_ij are assumed to be conditionally independent Poisson-distributed random variables, given the N·K Poisson intensity parameters λ_ij. The Rasch Poisson Counts Model (RPCM) postulates that the intensity parameters are products of test difficulty and subject ability parameters. We expand the RPCM by assuming that the subject parameters are random variables having a common gamma distribution with fixed unknown parameters and that the vectors of test difficulty parameters per subject follow a common Dirichlet distribution with fixed unknown parameters. Further, we show how additional structures can be imposed on the test parameters, modeling a within-subjects design. Methods for testing the fit and estimating the parameters of these models are presented and illustrated with the analysis of two empirical data sets.
Chapter
The Rasch model (RM) has a number of especially attractive features. These assets – in particular, sufficiency of the unweighted raw score, existence of conditional maximum likelihood estimators of the model parameters and of conditional likelihood ratio tests for hypothesis testing – suggest the question as to whether they are shared by a larger class of item response (IRT) models, or whether they are, within a certain framework, unique properties of the RM. In the latter case, we would have to conclude that the RM plays a rather singular role within IRT. As we shall see, this is actually so. The derivations in this chapter lay a foundation both for the RM and for the metric scale properties of Rasch measurement.
Article
This study implements a scale purification procedure onto the standard MIMIC method for differential item functioning (DIF) detection and assesses its performance through a series of simulations. It is found that the MIMIC method with scale purification (denoted as M-SP) outperforms the standard MIMIC method (denoted as M-ST) in controlling false-positive rates and yielding higher true-positive rates. Only when the DIF pattern is balanced between groups or when there is a small percentage of DIF items in the test does M-ST perform as appropriately as M-SP. Moreover, both methods yield a higher true-positive rate under the two-parameter logistic model than under the three-parameter model. M-SP is preferable to M-ST, because DIF patterns in real tests are unlikely to be perfectly balanced and the percentages of DIF items may not be small.
Article
The use of both linear and generalized linear mixed-effects models (LMMs and GLMMs) has become popular not only in social and medical sciences, but also in biological sciences, especially in the field of ecology and evolution. Information criteria, such as the Akaike Information Criterion (AIC), are usually presented as model comparison tools for mixed-effects models. The presentation of 'variance explained' (R²) as a relevant summarizing statistic of mixed-effects models, however, is rare, even though R² is routinely reported for linear models (LMs) and also generalized linear models (GLMs). R² has the extremely useful property of providing an absolute value for the goodness-of-fit of a model, which cannot be given by the information criteria. As a summary statistic that describes the amount of variance explained, R² can also be a quantity of biological interest. One reason for the under-appreciation of R² for mixed-effects models lies in the fact that R² can be defined in a number of ways. Furthermore, most definitions of R² for mixed-effects models have theoretical problems (e.g., decreased or negative R² values in larger models) and/or their use is hindered by practical difficulties (e.g., implementation). Here, we make a case for the importance of reporting R² for mixed-effects models. We first provide the common definitions of R² for LMs and GLMs and discuss the key problems associated with calculating R² for mixed-effects models. We then recommend a general and simple method for calculating two types of R² (marginal and conditional R²) for both LMMs and GLMMs, which are less susceptible to common problems. This method is illustrated by examples and can be widely employed by researchers in any field of research, regardless of the software packages used for fitting mixed-effects models. The proposed method has the potential to facilitate the presentation of R² for a wide range of circumstances.
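For a linear mixed model with fixed-effect variance σ²_f, random-effect variances σ²_l, and residual variance σ²_ε, the two statistics recommended there take the following form (our summary of the standard definitions; for non-Gaussian GLMMs the residual term is replaced by a distribution-specific observation-level variance):

\[
R^{2}_{\mathrm{marginal}} = \frac{\sigma^{2}_{f}}{\sigma^{2}_{f} + \sum_{l}\sigma^{2}_{l} + \sigma^{2}_{\varepsilon}}, \qquad R^{2}_{\mathrm{conditional}} = \frac{\sigma^{2}_{f} + \sum_{l}\sigma^{2}_{l}}{\sigma^{2}_{f} + \sum_{l}\sigma^{2}_{l} + \sigma^{2}_{\varepsilon}} .
\]

The marginal R² reflects the variance explained by the fixed effects alone; the conditional R² reflects that explained by fixed and random effects together.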
Article
Count data arise in numerous fields of interest. Analysis of these data frequently requires distributional assumptions. Although the graphical display of a fitted model is straightforward in the univariate scenario, this becomes more complex if covariate information needs to be included in the model. Stratification is one way to proceed, but it has its limitations if the covariate has many levels or the number of covariates is large. The article suggests a marginal method which works even in the case that all possible covariate combinations are different (i.e., no covariate combination occurs more than once). For each covariate combination the fitted model value is computed and then summed over the entire data set. The technique is quite general and works with all count distributional models as well as with all forms of covariate modelling. The article provides illustrations of the method for various situations and also shows that the proposed estimator as well as the empirical count frequency are consistent with respect to the same parameter.
Article
This research monograph on the antecedents and correlates of creativity in school-aged children discusses implications of measures of intelligence versus measures of creativity and attempts an interpretation of the psychological requirements for creative products in children.
Article
In recent decades several methods have been developed for detecting differential item functioning (DIF), and many studies have aimed to identify both the conditions under which items may or may not be adequate and the factors which affect their power and Type I error. This paper describes a Monte Carlo experiment that was carried out in order to analyse the effect of reference group sample size, focal group sample size and the interaction of the two on the power and Type I error of the Mantel–Haenszel (MH) and Logistic regression (LR) procedures. The data were generated using a three-parameter logistic model, the design was fully-crossed factorial with 12 experimental conditions arising from the crossing of the two main factors, and the dependent variables were power and the rate of false positives calculated across 100 replications. The results enabled the significant factors to be identified and the two statistics to be compared. Practical recommendations are made regarding use of the procedures by psychologists interested in the development and analysis of psychological tests. Keywords: Differential item functioning, DIF, Mantel–Haenszel, Logistic regression, Sample size.
Chapter
In large-scale educational assessments, such as the Programme for International Student Assessment (PISA) and the Trends in Mathematics and Science Study (TIMSS), a primary concern is with the estimation of the population-level characteristics of a number of latent variables and the relationships between latent variables and other variables. Typically these studies are undertaken in contexts in which there are constraints on sample size and individual student response time, yet there are high expectations with regard to the breadth of content coverage. These demands and constraints have resulted in such studies using rotated-booklet designs, with each student responding to a limited number of items on each of a number of scales. This paper describes the techniques that have been employed in such studies to enable the reliable estimation of population characteristics when there is considerable unreliability at the student level. It also discusses the methodology that is used to make the data sets produced in such studies amenable for use by data analysts undertaking secondary analyses using standard analytic tools.
Article
Many studies of creative cognition with a neuroimaging component now exist; what do they say about where and how creativity arises in the brain? We reviewed 45 brain-imaging studies of creative cognition. We found little clear evidence of overlap in their results. Nearly as many different tests were used as there were studies; this test diversity makes it impossible to interpret the different findings across studies with any confidence. Our conclusion is that creativity research would benefit from psychometrically informed revision, and the addition of neuroimaging methods designed to provide greater spatial localization of function. Without such revision in the behavioral measures and study designs, it is hard to see the benefit of imaging. We set out eight suggestions in a manifesto for taking creativity research forward.
Article
As a multivariate model of the number of events, Rasch's multiplicative Poisson model is extended such that the parameters for individuals in the prior gamma distribution have continuous covariates. The parameters for individuals are integrated out and the hyperparameters in the prior distribution are estimated by a numerical method separately from difficulty parameters that are treated as fixed parameters or random variables. In addition, a method is presented for estimating parameters in Rasch's model with missing values.
Article
Consideration will be given to a model developed by Rasch that assumes scores observed on some types of attainment tests can be regarded as realizations of a Poisson process. The parameter of the Poisson distribution is assumed to be a product of two other parameters, one pertaining to the ability of the subject and a second pertaining to the difficulty of the test. Rasch's model is expanded by assuming a prior distribution, with fixed but unknown parameters, for the subject parameters. The test parameters are considered fixed. Secondly, it will be shown how additional between- and within-subjects factors can be incorporated. Methods for testing the fit and estimating the parameters of the model will be discussed, and illustrated by empirical examples.
Article
A unidimensional latent trait model for responses scored in two or more ordered categories is developed. This “Partial Credit” model is a member of the family of latent trait models which share the property of parameter separability and so permit “specifically objective” comparisons of persons and items. The model can be viewed as an extension of Andrich's Rating Scale model to situations in which ordered response alternatives are free to vary in number and structure from item to item. The difference between the parameters in this model and the “category boundaries” in Samejima's Graded Response model is demonstrated. An unconditional maximum likelihood procedure for estimating the model parameters is developed.
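For completeness, the Partial Credit Model described here is usually written as follows (standard notation: x = 0, ..., m_i are the score categories of item i, θ_n the person parameter, δ_ik the step parameters, and the inner sum is defined as zero for x = 0):

\[
P(X_{ni} = x) = \frac{\exp\!\Bigl(\sum_{k=0}^{x} (\theta_n - \delta_{ik})\Bigr)}{\sum_{h=0}^{m_i} \exp\!\Bigl(\sum_{k=0}^{h} (\theta_n - \delta_{ik})\Bigr)} .
\]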