Figure - available via license: Creative Commons Attribution 4.0 International
Source publication
The Mega and Titan Tests were designed by Ronald K. Hoeflin to make fine distinctions in the intellectual stratosphere. The Mega Test purported to measure above-average adult IQ up to and including scores with a rarity of one in a million of the general population. The Titan Test was billed as being even more difficult than the Mega Test. In this a...
Citations
... Upon trying out ChatGPT on other types of tests, it was observed that ChatGPT is good at verbal analogies, as are often used in IQ tests. For example, ChatGPT correctly answered between 7 and 9 of 24 verbal analogy questions on the highly challenging Titan test (Hoeflin, 1990), which indicates that ChatGPT has a high verbal IQ (for norming, see Redvaldsen, 2020). Similarly, the blogger Pumpkin Person found that ChatGPT performed well on a self-created verbal intelligence test (Pumpkin Person, 2022). ...
Launched in late November 2022, ChatGPT, a large language model chatbot, has garnered considerable attention. However, questions remain regarding its capabilities. In this study, ChatGPT was used to complete national high school exams in the Netherlands on the topic of English reading comprehension. In late December 2022, we submitted the exam questions through the ChatGPT web interface (GPT-3.5). According to official norms, ChatGPT achieved a mean grade of 7.3 on the Dutch scale of 1 to 10, comparable to the mean grade of 6.99 for all students who took the exam in the Netherlands. However, ChatGPT occasionally required re-prompting to arrive at an explicit answer; without these nudges, the overall grade was 6.5. In March 2023, API access was made available, and a new version of ChatGPT, GPT-4, was released. We submitted the same exams to the API, and GPT-4 achieved a score of 8.3 without a need for re-prompting. Additionally, a bootstrapping method that introduced randomness through ChatGPT’s ‘temperature’ parameter proved effective in self-identifying potentially incorrect answers. Finally, a re-assessment conducted with the GPT-4 model updated as of June 2023 showed no substantial change in the overall score. The present findings highlight significant opportunities but also raise concerns about the impact of ChatGPT and similar large language models on educational assessment.
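The abstract does not spell out how the temperature-based check was implemented. One plausible reading is to ask the same question several times at a non-zero temperature and flag questions where the sampled answers disagree. The sketch below illustrates that idea only: `flag_uncertain_answers`, `ask_model`, `n_samples`, and `min_agreement` are hypothetical names and thresholds, not taken from the study, and `ask_model` stands in for whatever call returns one sampled answer.

```python
from collections import Counter
from typing import Callable, Dict, List


def flag_uncertain_answers(
    questions: List[str],
    ask_model: Callable[[str], str],
    n_samples: int = 10,
    min_agreement: float = 0.8,
) -> Dict[str, dict]:
    """Sample each question repeatedly and flag low-agreement answers.

    ask_model is a hypothetical callable that returns one answer per call,
    sampled with randomness (for example, a temperature above zero).
    """
    report = {}
    for question in questions:
        # Draw several independent answers to the same question.
        answers = [ask_model(question) for _ in range(n_samples)]
        # The most frequent answer serves as the consensus answer.
        top_answer, top_count = Counter(answers).most_common(1)[0]
        agreement = top_count / n_samples
        report[question] = {
            "answer": top_answer,
            "agreement": agreement,
            # Low agreement marks the answer as potentially incorrect.
            "flagged": agreement < min_agreement,
        }
    return report
```

Under this assumed scheme, a question whose samples all agree is left unflagged, while one that splits across options is marked for review. The threshold of 0.8 is arbitrary and would need tuning against known answers.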
Intelligence is the most studied construct in psychology and cognitive neuroscience. In Brazil, the administration of intelligence tests is required for a number of social rights, including driving privileges. Such requirements have led to a large testing industry, but the vast majority of intelligence tests require extended administration times and language skills. In this study, we sought to investigate the psychometric properties and normative results of a new non-verbal intelligence test, the General Matrix of Intelligence (GMI). The GMI comprises 28 matrix-based items and can be administered in as little as six minutes. In this initial pilot test, the GMI was administered to 1,326 participants aged 15 to 64 years (M = 25.65 years, SD = 9.6 years) from all regions of Brazil. These data were analyzed using a 2PL Item Response Theory model, regression analyses were conducted to determine the role of sociodemographic factors, and preliminary norms were computed. Results indicated a unidimensional solution consistent with g factor theory, invariance across genders, evidence that cognitively demanding items involving movement or three-dimensional shapes were more difficult than items with less cognitive load, a normal distribution of results, and an interaction between education level and age group in predicting performance. Implications of these findings for research and practice are discussed, and all data and code are provided at https://osf.io/kvu42/
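For readers unfamiliar with the 2PL model named above: in its standard form, the probability of a correct response depends on the person's ability θ, the item's discrimination a, and the item's difficulty b, as P(θ) = 1 / (1 + exp(−a(θ − b))). The sketch below shows only this textbook formula. The function names and any item parameters passed to them are illustrative assumptions, not the estimates reported for the GMI.

```python
import math
from typing import Iterable, Tuple


def prob_correct_2pl(theta: float, a: float, b: float) -> float:
    """Standard 2PL item response function.

    theta: person ability, a: item discrimination, b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def expected_score(theta: float, items: Iterable[Tuple[float, float]]) -> float:
    """Expected number of correct answers over (a, b) item parameter pairs."""
    return sum(prob_correct_2pl(theta, a, b) for a, b in items)
```

When ability equals an item's difficulty, the probability of a correct response is 0.5 regardless of discrimination. A larger a makes that probability change more steeply as ability moves away from b.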