Article

A snapshot of g? Binary and polytomous Item-Response Theory investigations of the Last Series of the Standard Progressive Matrices (SPM-LS)


... Over the decades, they have been refined and adapted, yielding several versions to suit a range of age groups and cognitive abilities (Bors and Stokes 1998; Langener et al. 2022; Myszkowski and Storme 2018; Raven 2000). Their enduring relevance is reflected in the vast array of studies that have employed these matrices to explore the dimensions of human intelligence (Kaufman et al. 2009), as well as in their continued use in educational, occupational, and clinical settings. ...
... In addition to their foundational role in intelligence research, Raven's Progressive Matrices serve a practical purpose in the broader psychological assessment landscape. The original matrices and their revisions and shortened versions are frequently employed as a succinct measure of general intelligence (Myszkowski and Storme 2018), offering a less time-consuming alternative to more extensive test batteries. Although this efficiency sacrifices depth of insight into an individual's cognitive capabilities (Gignac 2015), it makes them particularly valuable in large-scale studies or in contexts where testing time is at a premium. ...
... Although CTT methods (congeneric models aside) present the advantage of permitting the calculation of person scores (via sum/average scoring as proxies for "true scores") without requiring the estimation of a psychometric model, this added simplicity often oversimplifies the functional relation between latent attributes and item responses (Borsboom 2006; McNeish and Wolf 2020), especially in the context of binary responses. Still, in the context of progressive matrices, sum/average scoring can provide reasonably good proxies for more accurate (but also more complex) scoring methods based on latent variable models (Myszkowski and Storme 2018). ...
Article
Full-text available
Measurement models traditionally make the assumption that item responses are independent from one another, conditional upon the common factor. They typically explore for violations of this assumption using various methods, but rarely do they account for the possibility that an item predicts the next. Extending the development of auto-regressive models in the context of personality and judgment tests, we propose to extend binary item response models—using, as an example, the 2-parameter logistic (2PL) model—to include auto-regressive sequential dependencies. We motivate such models and illustrate them in the context of a publicly available progressive matrices dataset. We find an auto-regressive lag-1 2PL model to outperform a traditional 2PL model in fit as well as to provide more conservative discrimination parameters and standard errors. We conclude that sequential effects are likely overlooked in the context of cognitive ability testing in general and progressive matrices tests in particular. We discuss extensions, notably models with multiple lag effects and variable lag effects.
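To make the structure concrete, a minimal sketch of a lag-1 auto-regressive 2PL (in assumed notation, not drawn verbatim from the paper) can be written as:

    P(X_{ij} = 1 \mid \theta_i, x_{i,j-1}) = \mathrm{logit}^{-1}\!\left( a_j (\theta_i - b_j) + \delta \, x_{i,j-1} \right)

where a_j and b_j are the usual discrimination and difficulty parameters, x_{i,j-1} is person i's observed response to the preceding item, and δ is the lag-1 dependency parameter; δ = 0 recovers the standard 2PL.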
... Processing speed will be measured with Space-Code (Myszkowski et al., 2018), a computer game where participants are presented with a number at the bottom of the screen and must select the corresponding number in a central 3x3 grid. If done correctly, an enemy spaceship is destroyed. ...
... No prior studies reported the standard deviation (SD) for the RSPM in our study population; therefore, we estimated an SD of 4.46 by assessing the percentile differences from a published age-stratified cohort (Assessment-Training.com, 2021) (Appendix 1). The minimum detectable difference was determined to be a change from the 50th percentile to the 75th (a change in RSPM score of 3) (Myszkowski et al., 2018). To attain an alpha of 5% and 90% power, as well as to offset an expected drop-out rate of 17% based on prior studies using supplements in adolescents (Myszkowski et al., 2018), our sample size was calculated to be 116 participants, 58 per group. Given the number of students in an average middle-income area high school, one high school is expected to have sufficient students below the 50th percentile. ...
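The reported numbers can be retraced with a standard two-group power calculation; here is a sketch in R, assuming a simple two-sample t-test design (the excerpt does not state the exact analysis):

    # Two-sample t-test power calculation (base R, stats package); the
    # SD, difference, alpha, and power figures come from the excerpt.
    n_raw <- power.t.test(delta = 3, sd = 4.46,
                          sig.level = 0.05, power = 0.90)$n  # roughly 47 per group
    # Inflate for the expected 17% drop-out rate:
    ceiling(n_raw / (1 - 0.17))  # close to the reported 58 per group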
Article
Full-text available
Background/Aims: Creatine supplementation has demonstrated cognitive benefits in neurodegenerative conditions, having a protective effect on the brain's function in stressful situations, with excellent safety (Watanabe et al., 2002). However, any beneficial effect on the cognitive performance of healthy adolescents underperforming in school is unknown. Our objective is to assess whether creatine supplementation improves cognitive performance in 15- to 17-year-old students with an average school grade below the 50th percentile. Methods: This will be a phase-II, triple-blinded, randomized, parallel-group, superiority, single-center trial. Students with grades below the 50th percentile in the prior semester will be enrolled and randomized to receive juice packages containing either creatine monohydrate supplementation (0.1 mg/kg/day) or placebo, for 12 weeks. The primary outcome will be the mean difference in change of Raven’s Standard Progressive Matrices (RSPM) scores from baseline to week 12 between groups. To achieve 90% power for detecting a 3-point difference in change in the RSPM score, and accounting for drop-out, 116 participants will be included. Secondary outcomes will include the difference in processing speed (SpaceCode), working memory (SpaceMatrix), non-visual memory (backward digit span), percentage change in lean mass, and any safety events. Conclusion: To our knowledge, this will be the most comprehensive study assessing creatine supplementation in adolescents. This is a low-risk intervention that has been shown to improve cognitive function in other populations. This study will potentially support the widespread use of creatine supplementation in adolescents with low school performance, while having a positive impact on this population.
... There has been recent interest in assessing the usefulness of short versions of the Raven's Progressive Matrices. Myszkowski and Storme (2018) composed the last 12 matrices of the Standard Progressive Matrices (SPM-LS) and argued that it could be regarded as a valid indicator of general intelligence g. As part of this special issue, the SPM-LS dataset that was analyzed in Myszkowski and Storme (2018) was reanalyzed in a series of papers applying a wide range of psychometric approaches. ...
... In the following, we propose an extension of RLCM for polytomous item responses. It has been shown that using information from item distractors (Myszkowski and Storme 2018; Storme et al. 2019) could increase the reliability of person ability estimates compared to using only dichotomous item responses that only distinguish between correct and incorrect item responses. Moreover, it could be beneficial to learn about the differential behavior of item distractors by analyzing the data based on correct and all incorrect item responses. ...
Article
Full-text available
The last series of Raven’s standard progressive matrices (SPM-LS) test was studied with respect to its psychometric properties in a series of recent papers. In this paper, the SPM-LS dataset is analyzed with regularized latent class models (RLCMs). For dichotomous item response data, an alternative estimation approach based on fused regularization for RLCMs is proposed. For polytomous item responses, different alternative fused regularization penalties are presented. The usefulness of the proposed methods is demonstrated in a simulated data illustration and for the SPM-LS dataset. For the SPM-LS dataset, it turned out that the regularized latent class model resulted in five partially ordered latent classes. In total, three out of five latent classes are ordered for all items. For the remaining two classes, violations for two and three items were found, respectively, which can be interpreted as a kind of latent differential item functioning.
... There has been recent interest in assessing the usefulness of short versions of the Raven's Progressive Matrices. [1] composed the last twelve matrices of the Standard Progressive Matrices (SPM-LS) and argued that it could be regarded as a valid indicator of general intelligence g. As part of this special issue, the SPM-LS dataset that was analyzed in [1] was reanalyzed in a series of papers applying a wide range of psychometric approaches. In particular, [2] investigated item distractor analysis with a particular focus on reliability, [3] provided additional insights via dimensionality analysis, [4] applied the Haberman interaction model using the R package dexter, Mokken scaling was employed by [5], and, finally, [6] presented Bayesian item response modeling using the R package brms. ...
... In the following, we propose an extension of RLCM for polytomous item responses. It has been shown that using information from item distractors [1,51] could increase the reliability compared to using only dichotomous item responses that only distinguish between correct and incorrect item responses. Assume that 0 denotes the category that refers to a correct response and 1, . . . ...
Preprint
The last series of Raven's standard progressive matrices (SPM-LS) test was studied with respect to its psychometric properties in a series of recent papers. In this paper, the SPM-LS dataset is analyzed with regularized latent class models (RLCMs). For dichotomous item response data, an alternative estimation approach for RLCMs is proposed. For polytomous item responses, different alternatives for performing regularized latent class analysis are proposed. The usefulness of the proposed methods is demonstrated in a simulated data illustration and for the SPM-LS dataset. For the SPM-LS dataset, it turned out that the regularized latent class model resulted in five partially ordered latent classes.
... That fact is taken into account in both traditional and contemporary distractor analysis (Gierl et al. 2017). An approach that falls in the category of contemporary distractor analysis is Myszkowski and Storme's (2018) application of nested logit models to the latest short form of Raven's Progressive Matrices. The nested logit model family (Suh and Bolt 2010) concurrently uses accuracy and distractor choice information from each item to improve ability estimation. ...
... Then, given the item has not been solved, distractor choices are modeled by Bock's nominal response model (NRM) (Bock 1972). Hence, nested logit models, as used by Myszkowski and Storme (2018), include varying discrimination parameters for each distractor. Traditional distractor analysis, as part of a thorough item analysis, does not necessarily focus on this aspect of distractor choices. ...
... In addition, analogous indicators for the usage of distractor elimination strategies (e.g., the proportion of overall time spent on the response alternatives or back-and-forth eye movements between the item content and the response alternatives) were found to be negatively correlated with test performance (Bethell-Fox et al. 1984; Schiano et al. 1989; Vigneau et al. 2006; Hayes et al. 2011; Jarosz and Wiley 2012; Arendasy and Sommer 2013; Gonthier and Thomassin 2015; Gonthier and Roulin 2019). In line with these findings, Myszkowski and Storme (2018) pointed out that nested logit models, which take into account solution behavior as well as distractor choice, are perhaps best suited to model solution processes starting with constructive matching and, given that a solution cannot be reached, shifting towards distractor elimination strategies at a later stage. Indeed, Gonthier and Roulin (2019) reported results in line with the idea that both constructive matching and response elimination might be used on the same item. ...
Article
Full-text available
Distractors might display discriminatory power with respect to the construct of interest (e.g., intelligence), which was shown in recent applications of nested logit models to the short form of Raven's progressive matrices and other reasoning tests. In this vein, a simulation study was carried out to examine two effect size measures (i.e., a variant of Cohen's ω and the canonical correlation R_CC) for their potential to detect distractors with ability-related discriminatory power. The simulation design was adapted to item selection scenarios relying on rather small sample sizes (e.g., N = 100 or N = 200). Both suggested effect size measures (Cohen's ω only when based on two ability groups) yielded acceptable to conservative type-I-error rates, whereas the canonical correlation outperformed Cohen's ω in terms of empirical power. The simulation results further suggest that an effect size threshold of 0.30 is more appropriate as compared to more lenient (0.10) or stricter thresholds (0.50). The suggested item-analysis procedure is illustrated with an analysis of twelve Raven's progressive matrices items in a sample of N = 499 participants. Finally, strategies for item selection for cognitive ability tests with the goal of scaling by means of nested logit models are discussed.
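For reference, the standard chi-square-based definition of Cohen's ω (the variant simulated in the paper may differ in detail) is:

    \omega = \sqrt{ \sum_{k} \frac{ (p_{1k} - p_{0k})^{2} }{ p_{0k} } }

where p_{0k} and p_{1k} are, respectively, the proportions expected under independence and the observed proportions across the distractor-by-ability-group cells; the 0.30 threshold above is read on this scale.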
... Raven's Standard Progressive Matrices (SPM) test and related matrix-based tests are widely applied measures of cognitive ability. Using Bayesian Item Response Theory (IRT) models, I reanalyzed data of an SPM short form proposed by Myszkowski and Storme (2018) and, at the same time, illustrate the application of these models. Results indicate that a three-parameter logistic (3PL) model is sufficient to describe participants' dichotomous responses (correct vs. incorrect) while persons' ability parameters are quite robust across IRT models of varying complexity. These conclusions are in line with the original results of Myszkowski and Storme (2018). Using Bayesian as opposed to frequentist IRT models offered advantages in the estimation of more complex (i.e., 3-4PL) IRT models and provided more sensible and robust uncertainty estimates. ...
... Raven's Standard Progressive Matrices (SPM) test ([1]) and related matrix-based tests are widely applied measures of cognitive ability (e.g., [2,3]). Due to their non-verbal content, which reduces biases due to language and cultural differences, they are considered one of the purest measures of fluid intelligence ([4]). However, a disadvantage of the original SPM is that its administration takes considerable time, as 60 items have to be answered and time limits are either very loose or not imposed at all (e.g., [3]). ...
Article
Full-text available
Raven’s Standard Progressive Matrices (SPM) test and related matrix-based tests are widely applied measures of cognitive ability. Using Bayesian Item Response Theory (IRT) models, I reanalyzed data of an SPM short form proposed by Myszkowski and Storme (2018) and, at the same time, illustrate the application of these models. Results indicate that a three-parameter logistic (3PL) model is sufficient to describe participants’ dichotomous responses (correct vs. incorrect) while persons’ ability parameters are quite robust across IRT models of varying complexity. These conclusions are in line with the original results of Myszkowski and Storme (2018). Using Bayesian as opposed to frequentist IRT models offered advantages in the estimation of more complex (i.e., 3–4PL) IRT models and provided more sensible and robust uncertainty estimates.
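A minimal sketch of the kind of non-linear brms specification used for a 2PL, loosely following Bürkner's published tutorial; the long-format data frame spm_long with columns person, item, and response is assumed for illustration:

    library(brms)

    # 2PL as a non-linear multilevel model: the discrimination exp(logalpha)
    # multiplies the latent linear predictor eta, with item and person effects
    # modeled as random effects.
    formula_2pl <- bf(
      response ~ exp(logalpha) * eta,
      eta ~ 1 + (1 | item) + (1 | person),
      logalpha ~ 1 + (1 | item),
      nl = TRUE
    )
    fit_2pl <- brm(
      formula_2pl,
      data   = spm_long,  # hypothetical long-format SPM-LS responses
      family = brmsfamily("bernoulli", link = "logit")
    )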
... Assessing job applicants' general mental ability online poses psychometric challenges due to the necessity of having brief but accurate tests. Recent research (Myszkowski & Storme, 2018) suggests that recovering distractor information through Nested Logit Models (NLM; Suh & Bolt, 2010) increases the reliability of ability estimates in reasoning matrix-type tests. In the present research, we extended this result to a different context (online intelligence testing for recruitment) and in a larger sample (N = 2949 job applicants). ...
... This approach is based on the premise that when a test taker selects a wrong response option out of a set of wrong response options, the choice of the wrong response option can carry information about the ability of the test taker. Further, recent developments [8] applied to progressive matrices have suggested recovering additional information from distractor responses through Nested Logit Models (NLM) [9], and have indicated that such models may be more appropriate than Bock's [7] Nominal Response Model in logical reasoning tests, but also than traditional binary IRT models [8]. In this research, recovering information from the choice of distractors has provided significant gains in reliability in comparison with not recovering such information and using traditional binary logistic models. ...
Article
Full-text available
Assessing job applicants' general mental ability online poses psychometric challenges due to the necessity of having brief but accurate tests. Recent research (Myszkowski & Storme, 2018) suggests that recovering distractor information through Nested Logit Models (NLM; Suh & Bolt, 2010) increases the reliability of ability estimates in reasoning matrix-type tests. In the present research, we extended this result to a different context (online intelligence testing for recruitment) and a larger sample (N = 2949 job applicants). We found that the NLMs outperformed the Nominal Response Model (Bock, 1972) and provided significant reliability gains compared with their binary logistic counterparts. In line with previous research, the gain in reliability was especially pronounced at low ability levels. Implications and practical recommendations are discussed.
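In R, nested logit models of this kind can be fitted with the mirt package; a sketch under the assumption that resp holds the polytomous option responses and key the correct option for each item:

    library(mirt)

    # Nested logit model: a 3PL for accuracy plus a nominal model for
    # distractor choice among the incorrect options (Suh & Bolt, 2010).
    fit_nlm <- mirt(resp, model = 1, itemtype = "3PLNRM", key = key)
    theta   <- fscores(fit_nlm)  # ability estimates using distractor information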
... Due to its considerable length (60 items), there has been a growing interest in developing short versions of this test. Unfortunately, the available short versions, such as the Advanced Progressive Matrices tests (i.e., APM), present substantial shortcomings [2]. Consequently, [2] proposed the SPM-LS, a new short version of the SPM test based on its last, most difficult 12 matrices. These items consist of non-verbal stimuli where each item presents a single correct answer and seven distractors. ...
... Additionally, a three-parameter nested logistic model was applied to recover relevant information from responses to the different distractors. Remarkably, the original authors concluded that the SPM-LS was a superior alternative to the APM test ([2], p. 113), and encouraged other researchers to re-analyse this dataset by making it publicly available and by opening a call for papers on the matter in the Journal of Intelligence. ...
Article
Full-text available
There has been increased interest in assessing the quality and usefulness of short versions of the Raven’s Progressive Matrices. A recent proposal, composed of the last twelve matrices of the Standard Progressive Matrices (SPM-LS), has been depicted as a valid measure of g. Nonetheless, the results provided in the initial validation questioned the assumption of essential unidimensionality for SPM-LS scores. We tested this hypothesis through two different statistical techniques. Firstly, we applied exploratory graph analysis to assess SPM-LS dimensionality. Secondly, exploratory bi-factor modelling was employed to understand the extent to which potential specific factors represent significant sources of variance after a general factor has been considered. Results evidenced that, if modelled appropriately, SPM-LS scores are essentially unidimensional and constitute a reliable measure of g. However, an additional specific factor was systematically identified for the last six items of the test. The implications of such findings for future work on the SPM-LS are discussed.
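Exploratory graph analysis of this kind is available in the R package EGAnet; a minimal sketch, assuming resp is the matrix of SPM-LS item responses:

    library(EGAnet)

    # Estimate the number and composition of dimensions from a network model
    ega <- EGA(resp)
    plot(ega)  # community structure of the items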
... Thus, we expect our approach to be conservative (i.e., confidence intervals are wider as compared to an approach that takes the dependence into account) and any inferences based on the intervals should be treated with caution. However, approaches that explicitly model the potential dependence here, such as case resampling combined with model refitting and fully non-parametric bootstrapping (Myszkowski & Storme, 2018; Storme et al., 2019), would be computationally too demanding for the CMPCM, which involves approximation of an infinite series. This assertion was tested on a Dell Precision 3551 laptop with the Windows 11 operating system and an x86-64 processor. ...
... Furthermore, we focused on reliability as a main outcome in this work, and for the quantification of uncertainty in reliability estimates we had to rely on a rather pragmatic, yet conservative, bootstrap approach. This approach did not take into account the potential dependence between estimates of SE² and σ̂², because approaches such as non-parametric bootstrap combined with a case resampling procedure (Myszkowski & Storme, 2018; Storme et al., 2019) cannot easily be implemented for the CMPCM. The CMPCM involves computation of an infinite sum, and refitting the models for only 1000 bootstrap samples is expected to take weeks even on a high-performance computer cluster. ...
Article
Full-text available
Are latent variables of researcher performance capacity merely elaborate proxies of productivity? To investigate this research question, we propose extensions of recently used item-response theory models for the estimation of researcher performance capacity. We argue that productivity should be considered as a potential explanatory variable of reliable individual differences between researchers. Thus, we extend the Conway-Maxwell Poisson counts model and a negative binomial counts model by incorporating productivity as a person-covariate. We estimated six different models: a model without productivity as item and person-covariate, a model with raw productivity as person-covariate, a model with log-productivity as person-covariate, a model that treats log-productivity as a known offset, a model with item-specific influences of productivity, and a model with item-specific influences of productivity as well as academic age as person-covariate. We found that the model with item-specific influences of productivity fitted two samples of social science researchers best. In the first dataset, reliable individual differences decreased substantially from excellent reliability when productivity is not modeled at all to unacceptable levels of reliability when productivity is controlled as a person-covariate, while in the second dataset reliability decreased only negligibly. This all emphasizes the critical role of productivity in researcher performance capacity estimation.
... Our results indicate that the short version serves as a valid alternative to the original version for adolescents, as evidenced by the relation between the two versions. This study adds in two ways to previous studies investigating the validity of short versions of the (advanced) RPM (e.g., Arthur et al. 1999; Myszkowski and Storme 2018). First, participants in our study completed both the short and original version, which allowed us to test the relation between performance on the two versions. ...
... Second, our adolescent sample comes from average educational backgrounds. This is important because, as noted by others, the RSPM is less appropriate for individuals with high expected levels of intelligence, as evidenced by the regularly observed ceiling effects (in adults) (e.g., Myszkowski and Storme 2018). In our sample, only three percent of participants achieved a perfect score on the short version, supporting the notion that the short RSPM serves as a valid alternative to the original version for the average adolescent. ...
Article
Full-text available
Cognitive ability of adolescents is often measured using the Raven’s Standard Progressive Matrices (RSPM). However, the RSPM has a long administration time, which may be suboptimal, as time-on-task effects are known to increase fatigue, lower motivation, and worsen performance on cognitive tasks. Therefore, a shortened version for adolescents was developed recently. In the current preregistered study we investigated this shortened version in a sample of adolescents (N = 99) of average educational backgrounds. We tested whether the shortened RSPM is a valid alternative to the original RSPM, which proved to be the case, as we observed a moderate to high correlation between the two versions. Moreover, we tested version effects on fatigue, motivation and performance. Fatigue was lower and motivation was higher after completing the short compared to the original version, and performance was better in the short compared to the original version. However, additional analyses suggested that beneficial version effects on performance were not due to reduced time-on-task, but due to the short version containing less difficult items than the original version. Moreover, version-related differences in performance were not related to version-related differences in fatigue and motivation. We conclude that the shortened version of the RSPM is a valid alternative to the original version, and that the shortened version is beneficial in terms of fatigue and motivation, but that these beneficial effects on fatigue and motivation do not carry over to performance.
... These measures show a higher saturation of the g factor (Gignac, 2015; Myszkowski & Storme, 2018), which is why brief versions were developed to facilitate administration in less time and to report more rigorously the elements most representative of the unidimensionality of the general factor (Myszkowski & Storme, 2018), drawing on different robust instrumental statistical models (Flores-Mendoza et al., 2018; García-Garzón, Abad & Garrido, 2019; Myszkowski, 2020; Partchev, 2020), thus providing further evidence for the measurement of the g factor. ...
Article
Full-text available
Facultad de Psicología UNMSM © The authors. This article is published by the Revista de Investigación en Psicología of the Facultad de Psicología, Universidad Nacional Mayor de San Marcos. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (CC BY 4.0) [https://creativecommons.org/licenses/by/4.0/deed.es], which permits use, distribution, and reproduction in any medium, provided the original work is properly cited.
... Myszkowski and Storme (2018) have applied a number of binary and polytomous item-response theory (IRT; Lord 1980) models to the last series of Raven's Standard Progressive Matrices (SPM) test (Raven 1941), further referred to as the SPM-LS test. They have made their dataset publicly available, and the Journal of Intelligence has proposed a special issue where other researchers are encouraged to present their own analyses. ...
... The data are as supplied with the original study by Myszkowski and Storme (2018): the responses of 499 French undergraduate students aged between 19 and 24 to the twelve items of the SPM-LS. ...
Article
Full-text available
We analyze a 12-item version of Raven’s Standard Progressive Matrices test, traditionally scored with the sum score. We discuss some important differences between assessment in practice and psychometric modelling. We demonstrate some advanced diagnostic tools in the freely available R package, dexter. We find that the first item in the test functions badly—at a guess, because the subjects were not given exercise items before the live test.
... In Figure 3, information and reliability functions are presented, but with estimates of information and reliability that are marginalized across the entire set of four judges. IRT-based marginal reliability estimates are often interpreted with rules of thumb for acceptability similar to those used for CTT-based estimates (e.g., Myszkowski & Storme, 2018), even though decisions on acceptability should remain context dependent. ...
... Of course, both may be reported (e.g., Myszkowski & Storme, 2017, 2018). In addition, bootstrapping strategies may be used to draw inferences on these reliability estimates (e.g., Myszkowski & Storme, 2018). JRT allows one to explore dimensionality and test structural validity. ...
Article
Full-text available
The Consensual Assessment Technique (CAT), more generally the use of product creativity judgments, is a central and actively debated method to assess product and individual creativity. Despite a constant interest in strategies to improve its robustness, we argue that most psychometric investigations and scoring strategies for CAT data remain constrained by a flawed psychometrical framework. We first describe how our traditional statistical account of multiple judgments, which largely revolves around Cronbach's α and sum/average scores, poses conceptual and practical problems, such as misestimating the construct of interest, misestimating reliability and structural validity, underusing latent variable models, and reducing judge characteristics to a source of error, problems that are largely imputable to the influence of classical test theory. Then, we propose that the item-response theory framework, traditionally used for multi-item situations, be transposed to multiple-judge CAT situations in Judge Response Theory (JRT). After defining JRT, we present its multiple advantages, such as accounting for differences in individual judgment as a psychological process, rather than as random error, giving a more accurate account of the reliability and structural validity of CAT data, and allowing the selection of complementary, not redundant, judges. The comparison of models and their availability in statistical packages are notably discussed as further directions.
... Dataset 4 contains N = 499 subjects and I = 12 items, and it stems from the last series of the standard progressive matrices (SPM-LS; [55,56]). It has been analyzed in numerous publications (e.g., [70][71][72]). ...
Article
Full-text available
The two-parameter logistic (2PL) item response model is typically estimated using an unbounded distribution for the trait θ. In this article, alternative specifications of the 2PL models are investigated that consider a bounded or a positively valued θ distribution. It is highlighted that these 2PL specifications correspond to the partial membership mastery model and the Ramsay quotient model, respectively. A simulation study revealed that model selection regarding alternative ranges of the θ distribution can be successfully applied. Different 2PL specifications were additionally compared for six publicly available datasets.
... Much like an invitation to revisit a story with various styles or with various points of view, this Special Issue was opened to contributions that offered extensions or reanalyses of a single (and somewhat simple) dataset, which had been recently published. The dataset was from a recent paper (Myszkowski and Storme 2018), and contained responses from 499 adults to a non-verbal logical reasoning multiple-choice test, the SPM-LS, which consists of the Last Series of Raven's Standard Progressive Matrices (Raven 1941). The SPM-LS is further discussed in the original paper (as well as through the investigations presented in this Special Issue), and most researchers in the field are likely familiar with the Standard Progressive Matrices. ...
Article
Full-text available
It is perhaps popular belief (at least among non-psychometricians) that there is a unique or standard way to investigate the psychometric qualities of tests. If anything, the present Special Issue demonstrates that this is not the case. On the contrary, this Special Issue on the "analysis of an intelligence dataset" is, in my opinion, a window into the present vividness of the field of psychometrics. Much like an invitation to revisit a story with various styles or with various points of view, this Special Issue was opened to contributions that offered extensions or reanalyses of a single (and somewhat simple) dataset, which had been recently published. The dataset was from a recent paper (Myszkowski and Storme 2018), and contained responses from 499 adults to a non-verbal logical reasoning multiple-choice test, the SPM-LS, which consists of the Last Series of Raven's Standard Progressive Matrices (Raven 1941). The SPM-LS is further discussed in the original paper (as well as through the investigations presented in this Special Issue), and most researchers in the field are likely familiar with the Standard Progressive Matrices. The SPM-LS is simply a proposition to use the last series of the test as a standalone test. A minimal description of the SPM-LS would probably characterize it as a theoretically unidimensional measure (in the sense that one ability is tentatively measured) comprised of 12 pass-fail non-verbal items of (tentatively) increasing difficulty. Here, I refer to the pass-fail responses as the binary responses, and the full responses (including which distractor was selected) as the polytomous responses. In the original paper, a number of analyses had been used, including exploratory factor analysis with parallel analysis, confirmatory factor analyses using a structural equation modeling framework, binary logistic item response theory models (1-, 2-, 3- and 4-parameter models), and polytomous (unordered) item response theory models, including the nominal response model (Bock 1972) and nested logit models (Suh and Bolt 2010). In spite of how extensive the original analysis may have seemed, the contributions of this Special Issue present several extensions to our analyses. I will now briefly introduce the different contributions of the Special Issue, in chronological order of publication. In their paper, Garcia-Garzon et al. (2019) propose an extensive reanalysis of the dimensionality of the SPM-LS, using a large variety of techniques, including bifactor models and exploratory graph analysis. Storme et al. (2019) later find that the reliability boosting strategy proposed in the original paper (which consisted of using nested logit models (Suh and Bolt 2010) to recover information from distractor responses) is useful in other contexts, using the example of a logical reasoning test applied in a personnel selection context. Moreover, Bürkner (2020) later presents how to use his R Bayesian multilevel modeling package brms (Bürkner 2017) in order to estimate various binary item response theory models, and compares the results with the frequentist approach used in the original paper with the item response theory package mirt (Chalmers 2012). Furthermore, Forthmann et al. (2020) later proposed a new procedure that can be used to detect (or select) items that could present discriminating distractors (i.e., items for which distractor responses could be used to extract additional information).
In addition, Partchev (2020) then discusses issues that relate to the use of distractor information to extract information on ability in multiple choice tests, in particular in the context of cognitive assessment, and presents how to use the R package dexter (Maris et al. 2020) to study the binary responses and distractors of the SPM-LS.
... Finally, we computed McDonald's omega hierarchical (ωh) for each rating session. McDonald's ωh (McDonald, 1999) is a unidimensionality index, which estimates the proportion of variance in ratings accounted for by a general factor (Myszkowski & Storme, 2018; Zinbarg, Yovel, Revelle, & McDonald, 2006). It has been recommended as an alternative to Cronbach's coefficient alpha, which cannot provide information about a measurement instrument's homogeneity (Myszkowski & Storme, 2017). ...
Article
Full-text available
The consensual assessment technique (CAT) is a reliable and valid method to measure (product) creativity and is often considered the gold standard of creativity assessment. The reliability measure traditionally applied in CAT studies—inter‐rater reliability—cannot capture time‐sampling error, which is a particularly relevant source of error for specific applications of the CAT. Therefore, the present study intended to investigate the test–retest reliability of CAT ratings. We asked raters (N = 61) for their creativity assessment of the same set of 90 fashion outfits at an initial rating session and a follow‐up session either 2 or 4 weeks later. We found that mean product ratings—the actual focus of interest in the CAT—were highly stable over time, as evidenced by consistency and agreement ICCs clearly exceeding levels of .90. However, individual raters (partially) lacked temporal stability, indicating a drift in rater tendencies over time. Our findings support the CAT’s reputation as a highly reliable measurement method, but question the temporal rating stability of the CAT’s actual “measurement instrument,” namely individual judges.
... Marginal reliability (which for clarity we call "expected reliability" in "jrt") is similarly computed, but instead of averaging across sample estimates, it is based on integration using a prior distribution of person estimates (Green, Bock, Humphreys, Linn, & Reckase, 1984), here a standard normal distribution. We anticipate that alternate computations of IRT-based reliability (e.g., Brown & Maydeu-Olivares, 2011; Culpepper, 2013; Raju et al., 2007), as well as confidence interval computations (e.g., Myszkowski & Storme, 2018), will be added in the future. ...
Article
Full-text available
Although the Consensual Assessment Technique (CAT; Amabile, 1982) is considered a gold standard in the measurement of product attributes—including creativity (Baer & McKool, 2009)—considerations on how to improve its scoring and psychometric modeling are rare. Recently, it was advanced (Myszkowski & Storme, 2019) that the framework of Item-Response Theory (IRT) is appropriate for CAT data, and would provide several practical and conceptual benefits to both the psychometric investigation of the CAT and the scoring of creativity. However, the packages recommended for IRT modeling of ordinal data are hardly accessible for researchers unfamiliar with IRT, or offer minimal possibility for adapting outputs to judgment data. Thus, the package "jrt" was developed for the open-source programming language R and made available on the Comprehensive R Archive Network (CRAN). Its main aim is to make IRT analyses easily applicable to CAT data, by automating model selection, by diagnosing and dealing with issues related to model-data incompatibilities, by providing quick, customizable and publication-ready outputs for communication, and by guiding researchers new to IRT as to the different methods available. We provide brief tutorials and examples for the main functions, which are further detailed in the online vignette and documentation on CRAN. We finally discuss the current limitations and anticipated extensions of the "jrt" package, and invite researchers to take advantage of its practicality.
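A minimal usage sketch, under the assumption that ratings is a persons-by-judges data frame of ordinal judgments and following the package's documented jrt()/jcc.plot() workflow:

    library(jrt)

    # Fit a judge response model; jrt() handles model selection and
    # returns IRT-based factor scores for the rated products.
    fit <- jrt(ratings)
    jcc.plot(fit)  # judge category curves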
Article
In a large (N = 300), pre-registered experiment and data analysis model, we find that individual variation in overall performance on Raven’s Progressive Matrices is substantially driven by differential strategizing in the face of difficulty. Some participants choose to spend more time on hard problems while others choose to spend less and these differences explain about 42% of the variance in overall performance. In a data analysis jointly predicting participants’ reaction times and accuracy on each item, we find that the Raven’s task captures at most half of participants’ variation in time-controlled ability (48%) down to almost none (3%), depending on which notion of ability is assumed. Our results highlight the role that confounding factors such as motivation play in explaining individuals’ differential performance in IQ testing.
Article
Item-response theory (IRT) represents a key advance in measurement theory. Yet, it is largely absent from curricula, textbooks and popular statistical software, and often introduced through a subset of models. This Element, intended for creativity and innovation researchers, researchers-in-training, and anyone interested in how individual creativity might be measured, aims to provide 1) an overview of classical test theory (CTT) and its shortcomings in creativity measurement situations (e.g., fluency scores, consensual assessment technique, etc.); 2) an introduction to IRT and its core concepts, using a broad view of IRT that notably sees CTT models as particular cases of IRT; 3) a practical strategic approach to IRT modeling; 4) example applications of this strategy from creativity research and the associated advantages; and 5) ideas for future work that could advance how IRT could better benefit creativity research, as well as connections with other popular frameworks.
Article
Full-text available
Validation is an ongoing process for any scale, and thus also for the Vienna Art Interest Art Knowledge Scale (VAIAK). In this paper, we add to this process by assessing the validity of the VAIAK by using an item-response theory (IRT) approach combined with a qualitative approach to further understand the underlying process as to how participants answer. Our results show that both the art interest and the art knowledge scale can capture a range of ability in their respective domains and that the individual items have adequate discriminability and a range of difficulties. In addition, as expected, experts consistently showed higher levels of ability (in both interest and knowledge) and tend to find the items easier. Some items, however, exhibited differential item functioning, the implications of which (especially for the scoring of the scale) are discussed in detail. In combination with the qualitative analyses, one item (B6) was identified as an item that should be altered for which we propose the new “VAIAK-R.” Furthermore, the qualitative analysis for the open items indicated that participants seem to use whatever knowledge is available to them to try to identify the correct answer, whether this is done by simply naming an artist or style that they know in either a relatively random or an educated guess strategy. Overall, our findings provide further insight into the working of the scale and indicate that the VAIAK has good psychometric properties and is valid as long as it is used in the intended way.
Article
Restricted latent class models (RLCMs) provide an important framework for supporting diagnostic research in education and psychology. Recent research proposed fully exploratory methods for inferring the latent structure. However, prior research is limited by the use of a restrictive monotonicity condition or prior formulations that are unable to incorporate prior information about the latent structure to validate expert knowledge. We develop new methods that relax existing monotonicity restrictions and provide greater insight about the latent structure. Furthermore, existing Bayesian methods only use a probit link function, and we provide a new formulation for using the exploratory RLCM with a logit link function that has the additional advantage of being computationally more efficient for larger sample sizes. We present four new Bayesian formulations that employ different link functions (i.e., the logit using the Pólya–gamma data augmentation versus the probit) and priors for inducing sparsity in the latent structure. We report Monte Carlo simulation studies to demonstrate accurate parameter recovery. Furthermore, we report results from an application to the Last Series of the Standard Progressive Matrices to illustrate our new methods.
Article
Fluency tasks are among the most common item formats for the assessment of certain cognitive abilities, such as verbal fluency or divergent thinking. A typical approach to the psychometric modeling of such tasks (e.g., Intelligence, 2016, 57, 25) is the Rasch Poisson Counts Model (RPCM; Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research, 1960), in which, similarly to the assumption of (essential) τ-equivalence in Classical Test Theory, tasks have equal discriminations—meaning that, beyond varying in difficulty, they do not vary in how strongly they are related to the latent variable. In this research, we question this assumption in the case of divergent thinking tasks, and propose instead to use a more flexible 2-Parameter Poisson Counts Model (2PPCM), which allows characterizing tasks by both difficulty and discrimination. We further propose a Bifactor 2PPCM (B2PPCM) to account for local dependencies (i.e., specific/nuisance factors) emerging from tasks sharing similarities (e.g., similar prompts and domains). We reanalyze a divergent thinking dataset (Psychology of Aesthetics, Creativity, and the Arts, 2008, 2, 68) and find the B2PPCM to significantly outperform the 2PPCM, both outperforming the RPCM. Further extensions and applications of these models are discussed.
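In assumed notation, the contrast between these models can be sketched as:

    \text{RPCM:} \quad X_{ij} \sim \mathrm{Poisson}(\lambda_{ij}), \quad \log \lambda_{ij} = \theta_i + \beta_j
    \text{2PPCM:} \quad \log \lambda_{ij} = \alpha_j \theta_i + \beta_j

so the 2PPCM adds task-specific discriminations α_j, and the B2PPCM further adds specific factors for clusters of similar tasks.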
Article
Full-text available
Raven’s Standard Progressive Matrices (Raven 1941) is a widely used 60-item measure of general mental ability. It was recently suggested that, for situations where taking this test is too time consuming, a shorter version, comprised of only the last series of the Standard Progressive Matrices (the SPM-LS; Myszkowski and Storme 2018), could be used while preserving satisfactory psychometric properties (Garcia-Garzon et al. 2019; Myszkowski and Storme 2018). In this study, I argue, however, that some psychometric properties have been left aside by previous investigations. As part of this special issue on the reinvestigation of Myszkowski and Storme’s dataset, I propose to use the non-parametric Item Response Theory framework of Mokken Scale Analysis (Mokken 1971, 1997) and its current developments (Sijtsma and van der Ark 2017) to shed new light on the SPM-LS. Extending previous findings, this investigation indicated that the SPM-LS had satisfactory scalability (H = 0.469), local independence and reliability (MS = 0.841, LCRC = 0.874). Further, all item response functions were monotonically increasing, and there was overall evidence for invariant item ordering (H^T = 0.475), supporting the Double Monotonicity Model (Mokken 1997). Item 1, however, appeared problematic in most analyses. I discuss the implications of these results, notably regarding whether to discard item 1, whether the SPM-LS sum scores can confidently be used to order persons, and whether the invariant item ordering of the SPM-LS allows the use of a stopping rule to further shorten test administration.
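The reported analyses map directly onto functions of the R package mokken; a sketch assuming resp is the 499 x 12 binary SPM-LS response matrix:

    library(mokken)

    coefH(resp)                        # scalability (H coefficients)
    summary(check.monotonicity(resp))  # monotone item response functions
    summary(check.iio(resp))           # invariant item ordering (H^T)
    check.reliability(resp)            # includes the MS and LCRC estimates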
Article
Full-text available
The article provides an overview of goodness-of-fit assessment methods for item response theory (IRT) models. It is now possible to obtain accurate p-values of the overall fit of the model if bivariate information statistics are used. Several alternative approaches are described. As the validity of inferences drawn on the fitted model depends on the magnitude of the misfit, if the model is rejected it is necessary to assess the goodness of approximation. With this aim in mind, a class of root mean squared error of approximation (RMSEA) indices is described, which makes it possible to test whether the model misfit is below a specific cutoff value. Also, regardless of the outcome of the overall goodness-of-fit assessment, a piece-wise assessment of fit should be performed to detect parts of the model whose fit can be improved. A number of statistics for this purpose are described, including a z statistic for residual means, a mean-and-variance correction to Pearson's X² statistic applied to each bivariate subtable separately, and the use of z statistics for residual cross-products. Item response theory (IRT) modeling involves fitting a latent variable model to discrete responses obtained from questionnaire/test items intended to measure educational achievement, personality, attitudes, and so on. As in any other modeling endeavor, after an IRT model has been fitted, it is necessary to quantify the discrepancy between the model and the data (i.e., the absolute goodness-of-fit of the model). A goodness-of-fit (GOF) index summarizes the discrepancy between the values observed in the data and the values expected under a statistical model. A goodness-of-fit statistic is a GOF index with a known sampling distribution. As such, a GOF statistic may be used to test the hypothesis of whether the fitted model could be the data-generating model.
Article
Full-text available
Structural equation modeling (SEM) is a vast field and widely used by many applied researchers in the social and behavioral sciences. Over the years, many software packages for structural equation modeling have been developed, both free and commercial. However, perhaps the best state-of-the-art software packages in this field are still closed-source and/or commercial. The R package lavaan has been developed to provide applied researchers, teachers, and statisticians, a free, fully open-source, but commercial-quality package for latent variable modeling. This paper explains the aims behind the development of the package, gives an overview of its most important features, and provides some examples to illustrate how lavaan works in practice.
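For example, a single-factor model for twelve binary items might be specified as follows (the data frame spm and variable names i1 to i12 are assumed):

    library(lavaan)

    model <- 'g =~ i1 + i2 + i3 + i4 + i5 + i6 +
                   i7 + i8 + i9 + i10 + i11 + i12'
    # Declaring the items as ordered triggers categorical estimation
    fit <- cfa(model, data = spm, ordered = paste0("i", 1:12))
    summary(fit, fit.measures = TRUE, standardized = TRUE)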
Article
Full-text available
Five hundred and six first-year university students completed Raven's Advanced Progressive Matrices. Scores on Set II ranged from 6 to 35 (M = 22.17, SD = 5.60). The first 12 items of Set II were found to add little to the discriminative power of the test. Exploratory and confirmatory factor analyses failed to confirm Dillon et al.'s two-factor solution and suggested that a single factor best represented performance on Set II. A short form of Set II, consisting of 12 items extracted from the original 36, was developed and found to possess acceptable psychometric properties. Although this short form differed considerably in content from the short form previously devised by Arthur and Day, the two short forms did not differ with respect to concurrent validity and predictive power.
Article
Full-text available
The extent to which a scale score generalizes to a latent variable common to all of the scale's indicators is indexed by the scale's general factor saturation. Seven techniques for estimating this parameter—omega hierarchical (ωh)—are compared in a series of simulated data sets. Primary comparisons were based on 160 artificial data sets simulating perfectly simple and symmetric structures that contained four group factors, and an additional 200 artificial data sets confirmed large standard deviations for two methods in these simulations when a general factor was absent. Major findings were replicated in a series of 40 additional artificial data sets based on the structure of a real scale widely believed to contain three group factors of unequal size and less than perfectly simple structure. The results suggest that alpha and methods based on either the first unrotated principal factor or component should be rejected as estimates of ωh.
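In practice, ωh is commonly estimated with the psych package's Schmid-Leiman-based omega() function; a sketch assuming an item matrix item_data and four group factors, as in the simulations:

    library(psych)

    # Reports omega-hierarchical alongside alpha and omega-total
    omega(item_data, nfactors = 4)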
Article
Full-text available
The decision of how many factors to retain is a critical component of exploratory factor analysis. Evidence is presented that parallel analysis is one of the most accurate factor retention methods while also being one of the most underutilized in management and organizational research. Therefore, a step-by-step guide to performing parallel analysis is described, and an example is provided using data from the Minnesota Satisfaction Questionnaire. Recommendations for making factor retention decisions are discussed.
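A parallel analysis along these lines is a one-liner in R's psych package (the item matrix msq_items is an assumed placeholder):

    library(psych)

    # Compare observed eigenvalues with eigenvalues of random/resampled data
    fa.parallel(msq_items, fa = "fa")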
Article
Full-text available
An examinee-level (or conditional) reliability is proposed for use in both classical test theory (CTT) and item response theory (IRT). The well-known group-level reliability is shown to be the average of conditional reliabilities of examinees in a group or a population. This relationship is similar to the known relationship between the square of the conditional standard error of measurement (SEM) and the square of the group-level SEM. The proposed conditional reliability is illustrated with an empirical data set in the CTT and IRT frameworks.
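One common IRT formulation of this idea (a sketch assuming a unit-variance latent trait, not necessarily the paper's exact expressions) is:

    \rho(\theta) = \frac{I(\theta)}{I(\theta) + 1}, \qquad \rho = E_{\theta}\left[ \rho(\theta) \right]

where I(θ) is the test information function, so the group-level reliability is the average of the examinee-level (conditional) reliabilities.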
Article
Full-text available
Item response theory (IRT) is widely used in assessment and evaluation research to explain how participants respond to item level stimuli. Several R packages can be used to estimate the parameters in various IRT models, the most flexible being the ltm (Rizopoulos 2006), eRm (Mair and Hatzinger 2007), and MCMCpack (Martin, Quinn, and Park 2011) packages. However these packages have limitations in that ltm and eRm can only analyze unidimensional IRT models effectively and the exploratory multidimensional extensions available in MCMCpack require prior understanding of Bayesian estimation convergence diagnostics and are computationally intensive. Most importantly, multidimensional confirmatory item factor analysis methods have not been implemented in any R package. The mirt package was created for estimating multidimensional item response theory parameters for exploratory and confirmatory models by using maximum-likelihood methods. The Gauss-Hermite quadrature method used in traditional EM estimation (e.g., Bock and Aitkin 1981) is presented for exploratory item response models as well as for confirmatory bifactor models (Gibbons and Hedeker 1992). Exploratory and confirmatory models are estimated by a stochastic algorithm described by Cai (2010a,b). Various program comparisons are presented and future directions for the package are discussed.
Article
Full-text available
There are three fundamental problems in Sijtsma (Psychometrika, 2008): (1) contrary to the name, the glb is not the greatest lower bound of reliability but rather is systematically less than ωt (McDonald, Test theory: A unified treatment, Erlbaum, Hillsdale, 1999), (2) we agree with Sijtsma that when considering how well a test measures one concept, α is not appropriate, but recommend ωt rather than the glb, and (3) the end user needs procedures that are readily available in open source software. Keywords: reliability, internal consistency, homogeneity, test theory, coefficient alpha, coefficient omega, coefficient beta
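Such procedures are indeed available in open-source software. A minimal sketch with the psych package, assuming `items` is a hypothetical data frame of item responses:

```r
# Minimal sketch: comparing alpha, omega, and glb estimates with the psych package.
# `items` is a hypothetical data frame of item responses.
library(psych)

alpha(items)    # coefficient alpha
omega(items)    # reports omega-hierarchical and omega-total
glb.fa(items)   # a greatest-lower-bound estimate based on factor analysis
```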
Article
Full-text available
This paper analyzes the theoretical, pragmatic, and substantive factors that have hampered the integration between psychology and psychometrics. Theoretical factors include the operationalist mode of thinking which is common throughout psychology, the dominance of classical test theory, and the use of "construct validity" as a catch-all category for a range of challenging psychometric problems. Pragmatic factors include the lack of interest in mathematically precise thinking in psychology, inadequate representation of psychometric modeling in major statistics programs, and insufficient mathematical training in the psychological curriculum. Substantive factors relate to the absence of psychological theories that are sufficiently strong to motivate the structure of psychometric models. Following the identification of these problems, a number of promising recent developments are discussed, and suggestions are made to further the integration of psychology and psychometrics.
Article
Full-text available
The cognitive processes in a widely used, nonverbal test of analytic intelligence, the Raven Progressive Matrices Test (Raven, 1962), are analyzed in terms of which processes distinguish between higher scoring and lower scoring subjects and which processes are common to all subjects and all items on the test. The analysis is based on detailed performance characteristics, such as verbal protocols, eye-fixation patterns, and errors. The theory is expressed as a pair of computer simulation models that perform like the median or best college students in the sample. The processing characteristic common to all subjects is an incremental, reiterative strategy for encoding and inducing the regularities in each problem. The processes that distinguish among individuals are primarily the ability to induce abstract relations and the ability to dynamically manage a large set of problem-solving goals in working memory.
Article
The Raven Advanced Progressive Matrices Test (APM) is a popular measure of higher-order general mental ability (g). Its use in both basic research and applied settings is partially attributable to its apparent low level of culture-loading. However, one disadvantage curtailing its more widespread use may be its protracted administration time. Arthur and Day addressed this issue by creating an APM short form. The present study provides college-sample normative data on this 12-item short form of the APM that demonstrates psychometric properties similar to those of the long form, but with a substantially shorter administration time. These data will facilitate the use and interpretation of short form test scores in research settings where administration time is a major concern.
Article
Visual aesthetic sensitivity has been conceived as an intelligence-independent and personality-independent disposition (Frois & Eysenck, 1995). However, recent research suggests that aesthetic experience and its outcomes can be predicted by personality traits (Furnham & Chamorro-Premuzic, 2004; Furnham & Walker, 2001; McCrae, 2007; Rawlings, Barrantes-Vidal, & Furnham, 2000) and is cognitively facilitated (Leder, Belke, Oeberst, & Augustin, 2004; Reber, Schwarz, & Winkielman, 2004; Silvia, 2005, 2006; Smith & Smith, 2006). Following these new findings, three studies (the first of their kind in France) examined the Visual Aesthetic Sensitivity Test (Götz, Borisy, Lynn, & Eysenck, 1979; Götz, 1985) in young adult samples (total N = 345). It was hypothesized that visual aesthetic sensitivity is related to general intelligence (Study 1), specific personality traits (Study 2), and figural creativity (Study 3). The Visual Aesthetic Sensitivity Test was found to be predicted by intelligence (r = .27; p < .01), openness to aesthetics (r = .27; p < .01), and figural divergent thinking (r = .40; p < .001). Implications for further research are discussed.
Article
The study presents a factor analysis of the 1962 revision of the Advanced Progressive Matrices (APM). The analysis was conducted such that substantive factor structure interpretations were freed of the effects of differences in item difficulty. The APM was given to 237 examinees, 16–18 years old. The data were subjected to a Guttman scale analysis to determine whether the APM could be interpreted as a one-factor instrument. Then the phi/phi-max inter-item correlation matrix was factored: a principal components analysis, followed by a series of varimax rotations of the principal components, was performed. The Guttman coefficients of scalability were too small to support a one-factor theory of the APM. The two-factor solution provided the most interpretable factor structure. Factor I was composed of items in which the solution was obtained by adding or subtracting patterns. Factor II was composed of items in which the solution was based on the ability to perceive the progression of a pattern. Results are discussed in terms of representative cognitive tests and tasks believed to embody the logical operations responsible for successful performance on items loading on each factor. The possibility of forming subtests of items to enhance the predictive validity of the Matrices is also discussed.
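The factoring steps described here translate directly into R. A minimal sketch with the psych package, where `R_phi` is a hypothetical inter-item correlation matrix (e.g., phi/phi-max coefficients) and the two-factor solution is illustrative:

```r
# Minimal sketch: principal components with varimax rotation, as in the analysis described.
# `R_phi` is a hypothetical inter-item correlation matrix.
library(psych)

pc2 <- principal(R_phi, nfactors = 2, rotate = "varimax", n.obs = 237)
print(pc2$loadings, cutoff = 0.30)   # suppress small loadings for readability
```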
Article
Despite the well-known theoretical advantages of item response theory (IRT) over classical test theory (CTT), research examining their empirical properties has failed to reveal consistent, demonstrable differences. Using Monte Carlo techniques with simulated test data, this study examined the behavior of item and person statistics obtained from these two measurement frameworks. The findings suggest IRT- and CTT-based item difficulty and person ability estimates were highly comparable, invariant, and accurate in the test conditions simulated. However, whereas item discrimination estimates based on IRT were accurate across most of the experimental conditions, CTT-based item discrimination estimates proved accurate under some conditions only. Implications of the results of this study for psychometric item analysis and item selection are discussed.
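A comparison of this kind is straightforward to sketch in R with simulated data. The following assumes the mirt package, with all parameter values illustrative:

```r
# Minimal sketch: CTT versus IRT item statistics on simulated 2PL data.
library(mirt)

set.seed(1)
a    <- matrix(rlnorm(20, 0.2, 0.3))               # illustrative discriminations
d    <- matrix(rnorm(20))                          # illustrative intercepts
resp <- simdata(a, d, N = 1000, itemtype = "dich")

# CTT statistics: proportion correct and corrected item-total correlations
p_correct <- colMeans(resp)
total     <- rowSums(resp)
r_it      <- sapply(seq_len(ncol(resp)),
                    function(i) cor(resp[, i], total - resp[, i]))

# IRT estimates for the same items
fit <- mirt(resp, 1, itemtype = "2PL")
coef(fit, simplify = TRUE)$items
```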
Article
The Raven Advanced Progressive Matrices Test (APM) is a popular measure of higher-order general cognitive ability (g). Its use in both basic research and applied settings is partially attributable to its apparent low level of culture loading. However, a major drawback curtailing more widespread use is its length; the APM is a 36-item power test with an administration time of 40–60 minutes. The present study reports on the development of a 12-item short form of the APM that demonstrates psychometric properties similar to the long form, but with a substantially shorter administration time. The ultimate goal is to provide researchers and practitioners with a version of the APM that can better meet their needs by providing a sound assessment of general intelligence in a shorter time frame than is available with the present form.
Article
The Raven's Advanced Progressive Matrices (APM) has been recommended as a useful measure for identifying academic potential. Several abridged versions of the test, including Set I, have been developed as shorter screening instruments, but they await systematic evaluation. Two hundred twenty-one academically talented students (62% males), who ranged from fifth to ninth grades, completed the APM Set I and Set II. In addition, two short forms of the APM were derived using a technique described by Arthur and Day (1994). Both of these short forms had psychometric properties that were superior to those of Set I of the APM and were correlated more strongly with the full APM than Set I. The psychometric properties of the derived short forms were examined with an independent sample of students (n = 247) and found to be comparable. In addition, scores from these short forms were correlated significantly (but moderately) with independent reasoning assessments used to identify academic talent. With appropriate caution, short forms of the APM may be a reasonable alternative to the full test as a quick screening measure for identifying potentially talented students.
Article
Raw scores on the Standard and Advanced forms of the Raven Progressive Matrices were rescaled in a college sample by means of equipercentile equating to yield a common scale that accommodates a wider range of talent than do the raw scores of either form. The common scale is expressed as IQ, with mean and standard deviation equated to the national normative sample for the Otis-Lennon Mental Ability Test.
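Equipercentile equating maps a score on one form to the score on the other form that has the same percentile rank. A minimal base-R sketch, where `spm_scores` and `apm_scores` are hypothetical vectors of raw scores from the two forms:

```r
# Minimal sketch: equipercentile equating of two score distributions.
# `x` and `y` are vectors of raw scores on the two forms (hypothetical data).
equipercentile <- function(score, x, y) {
  p <- ecdf(x)(score)                  # percentile rank of `score` on form X
  as.numeric(quantile(y, probs = p))   # form Y score at the same percentile rank
}

# Illustrative call: re-express an SPM raw score of 45 on the APM scale
# equipercentile(45, spm_scores, apm_scores)
```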
Article
Progressive matrices provide a nonverbal series of tests designed for measuring intelligence. The individual test was standardized on 660 children from Ipswich, sampled from those born between 1924 and 1932. Subsequently, 1407 children from the same schools were given group tests. Score values are presented in the form of separate curves for the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentile points at half-yearly ages from 6 to 14 for the individual test and from 8 to 14 for the group test. The group test also includes percentile values for 3665 male adults. Results are compared with those of the revised Stanford-Binet, but no correlations are stated. Case notes show that verbal fluency sometimes influences Binet IQs while not influencing matrix test scores. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
This article examines the adequacy of the “rules of thumb” conventional cutoff criteria and several new alternatives for various fit indexes used to evaluate model fit in practice. Using a 2‐index presentation strategy, which includes using the maximum likelihood (ML)‐based standardized root mean squared residual (SRMR) and supplementing it with either Tucker‐Lewis Index (TLI), Bollen's (1989) Fit Index (BL89), Relative Noncentrality Index (RNI), Comparative Fit Index (CFI), Gamma Hat, McDonald's Centrality Index (Mc), or root mean squared error of approximation (RMSEA), various combinations of cutoff values from selected ranges of cutoff criteria for the ML‐based SRMR and a given supplemental fit index were used to calculate rejection rates for various types of true‐population and misspecified models; that is, models with misspecified factor covariance(s) and models with misspecified factor loading(s). The results suggest that, for the ML method, a cutoff value close to .95 for TLI, BL89, CFI, RNI, and Gamma Hat; a cutoff value close to .90 for Mc; a cutoff value close to .08 for SRMR; and a cutoff value close to .06 for RMSEA are needed before we can conclude that there is a relatively good fit between the hypothesized model and the observed data. Furthermore, the 2‐index presentation strategy is required to reject reasonable proportions of various types of true‐population and misspecified models. Finally, using the proposed cutoff criteria, the ML‐based TLI, Mc, and RMSEA tend to overreject true‐population models at small sample size and thus are less preferable when sample size is small.
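These cutoffs are easy to apply in practice. A minimal sketch with the lavaan package, where the one-factor model specification and the `items` data frame are hypothetical:

```r
# Minimal sketch: checking the two-index strategy (SRMR plus a supplemental index).
# The model specification and the `items` data frame are hypothetical.
library(lavaan)

model <- "g =~ item1 + item2 + item3 + item4"
fit   <- cfa(model, data = items)
fitMeasures(fit, c("srmr", "cfi", "rmsea"))
# Rough screens suggested by the article: SRMR near .08, CFI near .95, RMSEA near .06
```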
Article
Nested logit item response models for multiple-choice data are presented. Relative to previous models, the new models are suggested to provide a better approximation to multiple-choice items where the application of a solution strategy precedes consideration of response options. In practice, the models also accommodate collapsibility across all distractor categories, making it easier to allow decisions about including distractor information to occur on an item-by-item or application-by-application basis without altering the statistical form of the correct response curves. Marginal maximum likelihood estimation algorithms for the models are presented along with simulation and real data analyses. Keywords: multiple-choice items, multiple-choice models, nested logit models, nominal response model, marginal maximum likelihood estimation, item information, distractor selection information, distractor category collapsibility
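The mirt package implements nested logit item types of this kind. A minimal sketch, where `mc_resp` is a hypothetical data frame of raw option choices and `key` an illustrative answer key:

```r
# Minimal sketch: a 2PL nested logit model for multiple-choice responses with mirt.
# `mc_resp` (raw option choices) and `key` (correct options) are hypothetical.
library(mirt)

key <- c(2, 4, 1, 3, 2)                          # illustrative answer key for five items
nl  <- mirt(mc_resp, 1, itemtype = "2PLNRM", key = key)
coef(nl, simplify = TRUE)
```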
Article
The present paper reports norms for Raven's Standard Progressive Matrices (SPM) for Icelandic children of school age, 6–16 years. A total of 665 children were tested, the sample having been chosen to reflect the school-age population in different parts of the country. The standardisation sample consisted of 550 of the children tested. Norms for this group are comparable to recent norms from other countries, though showing somewhat higher scores than current norms from the UK. In the upper grades a ceiling effect becomes quite noticeable. This paper also reports a study of the SPM's validity. In Iceland, children in grades four, seven, and ten are required to take national exams in Icelandic and mathematics; additionally, children in the tenth grade take national exams in two foreign languages, English and Danish. This makes it possible to examine the criterion-related validity of the Matrices with respect to scholastic achievement. As expected, the Matrices correlate most highly with mathematics, with lower correlations for the language subjects; these correlations range from 0.38 to 0.75. While the SPM shows impressive correlations with the national examinations, testifying to the usefulness of the test as a measure of intelligence, the ceiling effect seen in the norms indicates that the test is appropriately used only with children in the first seven grades.
Article
Raven's Standard Progressive Matrices (SPM) was administered to a sample of 2735 12- to 18-year-olds in Estonia. Both a scree test and the consistent Akaike information criterion (CAIC) indicated the presence of three significant factors. Exploratory and confirmatory factor analysis showed the loadings of the items on the three factors, which were identified as the gestalt continuation factor found by van der Ven and Ellis [Pers. Individ. Differ. 29 (2000) 45], verbal–analytic reasoning, and visuospatial ability. Further analysis of the three factors showed a higher-order factor identifiable as g. Examination of age-by-sex differences showed that girls performed better than boys at age 12 on all four factors, there was no sex difference at age 14, and boys performed better than girls at age 17, although not significantly on visuospatial ability.
Article
Two studies were performed concerning error analyses of matrix analogy problems. In the first study, information concealed in the incorrect response alternatives of the Standard Progressive Matrices (SPM) was used to find out what kinds of errors are committed when children (n = 1655, age range 8.5–12.5 years) make incorrect response choices. The error analysis of the SPM showed that omitting solution rules is a major cause of incorrect responses and that post-hoc error classification of the alternatives was problematic. In the second study, Experimental Progressive Matrices (EPM) were constructed, based on five solution rules, with an a priori notion regarding variation in rule complexity. Response alternatives were constructed so that the number and kinds of rules omitted could be deduced from incorrect choices. Children (n = 200, age range 8.5–12.5 years) completed the paper-and-pencil versions of the SPM and EPM. Both tests yielded the same test scores. Errors were most often due to omitting one rule. Lower-scoring children were particularly apt to omit complex rules. Further development of the EPM is needed to obtain a Raven-like test which can give insight into how a child acquires a particular test score.
Article
A multivariate logistic latent trait model for items scored in two or more nominal categories is proposed. Statistical methods based on the model provide 1) estimation of two item parameters for each response alternative of each multiple choice item and 2) recovery of information from wrong responses when estimating latent ability. An application to a large sample of data for twenty vocabulary items shows excellent fit of the model according to a chi-square criterion. Item and test information curves are compared for estimation of ability assuming multiple category and dichotomous scoring of these items. Multiple scoring proves substantially more precise for subjects of less than median ability, and about equally precise for subjects above the median.
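A model of this kind (the nominal response model) is also available in mirt. A minimal sketch, where `mc_resp` is a hypothetical data frame of unscored option choices:

```r
# Minimal sketch: a nominal (multiple-category) response model with mirt.
# `mc_resp` is a hypothetical data frame of unscored option choices.
library(mirt)

nrm <- mirt(mc_resp, 1, itemtype = "nominal")
coef(nrm, simplify = TRUE)   # slope and intercept parameters for each response category
```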
Article
Self-regulation is a complex process that involves consumers' persistence, strength, motivation, and commitment in order to be able to override short-term impulses. To be able to pursue their long-term goals, consumers typically need to forgo immediate pleasurable experiences that are detrimental to reaching their overarching goals. Although this sometimes involves resisting simple and small temptations, it is not always easy, since the lure of momentary temptations is pervasive. In addition, consumers' beliefs play an important role in determining the strategies and behaviors that consumers consider acceptable to engage in, affecting how they act and plan actions to attain their goals. This dissertation investigates the adequacy of some beliefs typically shared by consumers about the appropriate behaviors for exerting self-regulation, analyzing to what extent these indeed contribute to the enhancement of consumers' ability to exert self-regulation.
semTools: Useful tools for structural equation modeling
  • Contributors
Contributors (2016). semTools: Useful tools for structural equation modeling. Retrieved from https://CRAN.R-project.org/package=semTools.
JASP (Version 0.8.5) [Computer software]
JASP Team (2018). JASP (Version 0.8.5) [Computer software]. Retrieved from https://jasp-stats.org/.
Raven's Standard Progressive Matrices: new school age norms and a study of the test's validity
  • J Pind
  • E K Gunnarsdóttir
  • H S Jóhannesson
Pind, J., Gunnarsdóttir, E. K., & Jóhannesson, H. S. (2003). Raven's Standard Progressive Matrices: new school age norms and a study of the test's validity. Personality and Individual Differences, 34(3), 375-386. https://doi.org/10.1016/S0191-8869(02)00058-2