Frank Goldhammer

DIPF | Leibniz Institute for Research and Information in Education, Centre for International Student Assessment (ZIB)

Prof. Dr.

About

129
Publications
68,108
Reads
2,298
Citations
Citations since 2017
73 Research Items
1853 Citations

Publications

Article
Full-text available
In large-scale assessments, disengaged participants might rapidly guess on items or skip items, which can affect the score interpretation’s validity. This study analyzes data from a linear computer-based assessment to evaluate a micro-intervention that blocked the possibility to respond for 2 s. The blocked response was implemented to prevent parti...
Article
Full-text available
Background: In the context of large‐scale educational assessments, the effort required to code open‐ended text responses is considerably more expensive and time‐consuming than the evaluation of multiple‐choice responses because it requires trained personnel and long manual coding sessions. Aim: Our semi‐supervised coding method eco (exploring coding...
Article
Full-text available
As researchers in the social sciences, we are often interested in studying not directly observable constructs through assessments and questionnaires. But even in a well-designed and well-implemented study, rapid-guessing behavior may occur. Under rapid-guessing behavior, a task is skimmed briefly but not read and engaged with in depth. Hence, a res...
Article
Full-text available
Background: Computer‐based assessment allows for the monitoring of reader behaviour. The identification of patterns in this behaviour can provide insights that may be useful in informing educational interventions. Objectives: Our study aims to explore what different patterns of reading activity exist, and investigates their interpretation and consis...
Article
Objectives. Evaluate the block-adaptive number series task of reasoning, as a time-efficient proxy of general cognitive ability in the Level-2 sample of the German National Cohort (NAKO), a population-based mega cohort. Methods. The number series task consisted of two blocks of three items each, administered as part of the touchscreen-based assessm...
Article
Multiple document comprehension (MDC) refers to the ability to integrate information from multiple sources into a coherent representation, which requires specific cognitive processes. Assuming that epistemic beliefs are domain-related, this study investigates exploratively how epistemic beliefs in the domains of science and history affect the maste...
Article
This study aims to investigate how test scores from PIAAC (Programme for the International Assessment of Adult Competencies) can be interpreted by comparing the PIAAC competencies of literacy and numeracy with reasoning and perceptual speed. Dimensionality analyses supported the view that the PIAAC competencies can be separated into a common factor overlapping...
Article
Full-text available
International large-scale assessments such as PISA or PIAAC have started to provide public or scientific use files for log data; that is, events, event-related attributes and timestamps of test-takers’ interactions with the assessment system. Log data and the process indicators derived from it can be used for many purposes. However, the intended us...
Article
Learning using the Internet has become a vital factor for academic success in higher education. Students increasingly rely on the Internet as their main information source. However, related research is still an emerging and highly fragmented field. Therefore, this study aims to provide a comprehensive and integrative review of research literature o...
Article
Full-text available
The study investigates automated and controlled cognitive processes that occur when university students read multiple documents (MDs). We examined data of 401 students dealing with two MD sets in a digital environment. Performance was assessed through several comprehension questions. Recorded log data gave indications about students’ time allocatio...
Article
In high-stakes testing, often multiple test forms are used and a common time limit is enforced. Test fairness requires that ability estimates must not depend on the administration of a specific test form. Such a requirement may be violated if speededness differs between test forms. The impact of not taking speed sensitivity into account on the comp...
Article
Full-text available
The increased availability of time‐related information as a result of computer‐based assessment has enabled new ways to measure test‐taking engagement. One of these ways is to distinguish between solution and rapid guessing behavior. Prior research has recommended response‐level filtering to deal with rapid guessing. Response‐level filtering can le...
Article
Full-text available
This paper addresses the development of performance-based assessment items for ICT skills, skills in dealing with information and communication technologies, a construct which is rather broadly and only operationally defined. Item development followed a construct-driven approach to ensure that test scores could be interpreted as intended. Specifica...
Article
Recent research suggests that readers' subjective task understanding influences reading processes and outcomes. Therefore, the present study's aim was to investigate whether the task demands that readers retrospectively report relate to multiple document comprehension strategies and outcome. A total of 310 university students completed three units...
Article
Full-text available
As Internet sources provide information of varying quality, it is an indispensable prerequisite skill to evaluate the relevance and credibility of online information. Based on the assumption that competent individuals can use different properties of information to assess its relevance and credibility, we developed the EVON (evaluation of online inf...
Article
Full-text available
As a relevant cognitive-motivational aspect of ICT literacy, a new construct, ICT Engagement, is theoretically based on self-determination theory and involves the factors ICT interest, Perceived ICT competence, Perceived autonomy related to ICT use, and ICT as a topic in social interaction. In this manuscript, we present different sources of...
Article
Full-text available
In this explorative study, we investigate how sequences of behaviour are related to success or failure in complex problem‐solving (CPS). To this end, we analysed log data from two different tasks of the problem‐solving assessment of the Programme for International Student Assessment 2012 study (n = 30,098 students). We first coded every interaction...
Article
Full-text available
The digital revolution has made a multitude of text documents from highly diverse perspectives on almost any topic easily available. Accordingly, the ability to integrate and evaluate information from different sources, known as multiple document comprehension, has become increasingly important. Because multiple document comprehension requires the...
Chapter
The chapter gives an overview of how tests and questionnaires can be implemented with the help of computers in a broader sense, thereby extending or clearly exceeding the possibilities of classical paper-and-pencil procedures. This concerns, for example, the development of computer-based items with innovative response formats and multi...
Chapter
This chapter describes various ways in which a test result or test score can be interpreted in terms of descriptive statistics. In norm-referenced interpretation, the test score is transformed into a norm score that allows a comparison with the test scores of other persons in a reference group (the "test norms"). The tes...
Book
Full-text available
This review is the result of comprehensive research on the topic of digitalization in schools conducted in 2019. The research covered the current legal situation in the German federal states against the background of the federal government's adopted Digitalpakt, including the available resources and identified...
Chapter
Full-text available
The OECD Programme for the International Assessment of Adult Competencies (PIAAC) was the first computer-based large-scale assessment to provide anonymised log file data from the cognitive assessment together with extensive online documentation and a data analysis support tool. The goal of the chapter is to familiarise researchers with how to acces...
Article
Full-text available
Rapid guessing can threaten measurement invariance and the validity of large-scale assessments, which are often conducted under low-stakes conditions. Comparing measures collected under different administration modes or in different test settings necessitates that rapid guessing rates also be comparable. Response time thresholds can be used to iden...
Chapter
From a psychometric point of view, assessment means to infer what a learner knows and can do in the real world from limited evidence observed in a standardized testing situation. From a learning analytics perspective assessment means to observe real behavior in digital learning environments to conclude the learner status with the intent to positive...
Article
Full-text available
International large-scale assessments, such as the Program for International Student Assessment (PISA), are conducted to provide information on the effectiveness of education systems. In PISA, the target population of 15-year-old students is assessed every 3 years. Trends show whether competencies have changed in the countries between PISA cycles....
Article
Multiple document comprehension (MDC) is understood as the ability to construct an integrated representation of a subject area from different information sources. As such, both for successfully completing university studies and for societal participation...
Article
Full-text available
Educational large-scale assessments risk their temporal comparability when shifting from paper- to computer-based assessment. A recent study showed how text responses have altered alongside PISA’s mode change, indicating mode effects. Uncertainty remained, however, because it compared students from 2012 and 2015. We aimed at reproducing the findings in...
Chapter
Multiple document comprehension is the ability to construct an integrated representation of a specific topic based on several sources. It is an important competence for university students; however, there has been so far no established instrument to assess multiple document comprehension in a standardized way. Therefore, we developed a test coverin...
Chapter
Through their recurring surveys, studies such as PISA provide more than snapshots of the performance of education systems. They also generate data on how these systems develop and, in particular, indicate whether subsequent generations of fifteen-year-olds, compared with earlier generations, have a higher or lower competence...
Article
Full-text available
In large scale assessments, performance differences across different groups are regularly found. These group differences (e.g. gender differences) are often relevant for educational policy decisions and measures. However, the formation of these group differences usually remains unclear. We propose an approach for investigating this formation by con...
Presentation
Full-text available
In this study, we make first steps to combine two recently developed approaches for automatically scoring short text responses in the Programme for International Student Assessment (PISA). While PISA’s Machine-Supported Coding System (MSCS) identifies a response’s code very accurately, it is only applicable to a very limited range of new responses....
Article
The study investigates the cognitive load of students working on tasks that require the comprehension of multiple documents (Multiple Document Comprehension, MDC). In a sample of 310 students, perceived task difficulty (PD) and mental effort (ME) were examined in terms of task characteristics, individual characteristics, and students' processing be...
Article
The transition from paper-based assessment (PBA) to computer-based assessment (CBA) requires mode effect studies to investigate the comparability of scores across modes. In the National Educational Panel Study experimental studies were conducted to investigate psychometric differences between modes. In the present study, the cross-mode equivalence...
Article
For many years, reading comprehension in the Programme for International Student Assessment (PISA) was measured via paper‐based assessment (PBA). In the 2015 cycle, computer‐based assessment (CBA) was introduced, raising the question of whether central equivalence criteria required for a valid interpretation of the results are fulfilled. As an exte...
Article
Full-text available
A validity approach is proposed that uses processing times to collect validity evidence for the construct interpretation of test scores. The rationale of the approach is based on current research of processing times and on classical validity approaches, providing validity evidence based on relationships with other variables. Within the new approach...
Article
Full-text available
Background: With digital technologies, competence assessments can provide process data, such as mouse clicks with corresponding timestamps, as additional information about the skills and strategies of test takers. However, in order to use variables generated from process data sensibly for educational purposes, their interpretation needs to be valid...
Article
The goal of this study was to investigate sources of evidence of convergent validity supporting the construct interpretation of scores on a simulation-based ICT skills test. The construct definition understands ICT skills as reliant on ICT-specific knowledge as well as comprehension and problem-solving skills. On the basis of this, a validity argum...
Article
Full-text available
Journal: Education Inquiry. In 2015, the Programme for International Student Assessment (PISA) introduced multiple changes in its study design, the most extensive being the transition from paper- to computer-based assessment. We investigated the differences between German students’ text responses to eight reading items from the paper-based study...
Article
Full-text available
In this paper, we developed a method to extract item-level response times from log data that are available in computer-based assessments (CBA) and paper-based assessments (PBA) with digital pens. Based on response times that were extracted using only time differences between responses, we used the bivariate generalized linear IRT model framework (B...
Article
Full-text available
Complex problem solving (CPS) is a highly transversal competence needed in educational and vocational settings as well as everyday life. The assessment of CPS is often computer-based, and therefore provides data regarding not only the outcome but also the process of CPS. However, research addressing this issue is scarce. In this article we investig...
Chapter
Full-text available
Many large-scale competence assessments such as the National Educational Panel Study (NEPS) have introduced novel test designs to improve response rates and measurement precision. In particular, unstandardized online assessments (UOA) offer an economic approach to reach heterogeneous populations that otherwise would not participate in face-to-face...
Article
A new response time-based method for coding omitted item responses in computer-based testing is introduced and illustrated with empirical data. The new method is derived from the theory of missing data problems of Rubin and colleagues and embedded in an item response theory framework. Its basic idea is using item response times to statistically tes...
Presentation
Full-text available
In 2015, the Programme for International Student Assessment (PISA) made several changes to its study design, the most extensive being the switch from paper-based to computer-based assessment. This study examines differences in students' open-ended text responses to eight reading items between the paper-based study...
Presentation
Full-text available
A popular definition describes learning analytics as measuring, collecting, analyzing and reporting of data about learners. The main purpose thereof is to understand and support learning processes. Thus, the main research goals of learning analytics remarkably overlap with those of educational assessment and psychometrics. To demonstrate how these...
Article
Full-text available
Log data from educational assessments attract more and more attention and large-scale assessment programs have started providing log data as scientific use files. Such data generated as a by-product of computer-assisted data collection has been known as paradata in survey research. In this paper, we integrate log data from educational assessments i...
Article
Full-text available
Background: The gender gap in reading literacy is repeatedly found in large-scale assessments. This study compared girls’ and boys’ text responses in a reading test applying natural language processing. For this, a theoretical framework was compiled that allows mapping of response features to the preceding cognitive components such as micro- and ma...
Article
Full-text available
A critical evaluation of results to find useful information is essential when doing a web search. In this study, we investigated the evaluation skills of secondary school students, based on their behavior in selecting links from a search engine result page (SERP). To clarify the role of reading when evaluating online information, we assessed studen...
Article
Full-text available
Excerpt: In cognitive ability testing, process data can be defined as empirical information about the cognitive (as well as meta-cognitive, motivational, and affective) states and related behavior that mediate the effect of the measured construct(s) on the task product (i.e., item score). Thus, operationally, process data can be regarded as t...
Article
Full-text available
Receiving and using web-based information has become part of everyday life, but the non-linear presentation of information can make considerable demands on cognitive resources, affecting text comprehension. This study examined whether memory updating predicts students' comprehension of digital hypertext over and above skills in reading linearly str...
Article
Full-text available
Background: A potential problem of low-stakes large-scale assessments such as the Programme for the International Assessment of Adult Competencies (PIAAC) is low test-taking engagement. The present study pursued two goals in order to better understand conditioning factors of test-taking disengagement: First, a model-based approach was used...
Article
The combination of different item formats is found quite often in large scale assessments, and analyses on the dimensionality often indicate multi-dimensionality of tests regarding the task format. In ICILS 2013, three different item types (information-based response tasks, simulation tasks, and authoring tasks) were used to measure computer and in...
Presentation
Full-text available
This study analyzed student text responses to a reading test using natural language processing techniques. Focusing on semantic response features, it investigated (i) the reading gender gap and (ii) trends in the Programme for International Student Assessment (PISA) from 2012 to 2015 alongside the change from paper-based to computer-based assessmen...
Presentation
Full-text available
The automatic coding of open-ended text responses overcomes some classical problems associated with human coders. While different research groups (e.g., Leacock & Chodorow, 2003; Smarter Balanced Assessment Consortium, 2014) have shown that computers can achieve performance levels similar to humans in scoring short-text responses, the next step is...
Article
Full-text available
Computer-based assessments open up new possibilities to measure constructs in authentic settings. They are especially promising for measuring 21st-century skills, such as information and communication technologies (ICT) skills. Items tapping such constructs may be diverse regarding design principles and content and thus form a heterogeneous it...
Article
Completing test items under multiple speed conditions prevents the performance measure from being confounded with individual differences in the speed-accuracy compromise and offers insights into the response process, that is, how response time relates to the probability of a correct response. This relation is traditionally represented by two conceptual...
Article
With the aim to better understand the nature of complex problem solving (CPS), we investigated the link between confidence judgments, which represent a major constituent of metacognitive self-monitoring, and CPS by regressing the two facets of CPS (i.e., knowledge acquisition and knowledge application) on confidence in CPS. To ensure that the link...
Article
This article examined the development of reading competence in the final phase of lower secondary education (grades 9 to 10). In addition to the change in test performance in the overall population, the associations of selected institutional (school type), family-related (immigration background and family socioeconomic status...
Article
The study examines relationships between reading comprehension and basic reading processes at the word and sentence level, as well as working memory, in 15-year-old adolescents. It addressed the questions of whether differences in the efficiency of the components considered relate, on the one hand, to reading competence itself and, on the other hand, to changes...
Article
Full-text available
The effects of aging on response time were examined in a paper-based lexical-decision experiment with younger (age 18-36) and older (age 64-75) adults, applying Ratcliff's diffusion model. Using digital pens allowed the paper-based assessment of response times for single items. Age differences previously reported by Ratcliff and colleagues in compu...
Chapter
Full-text available
Competency measurement typically focuses on task outcomes. Taking process data into account (i.e., processing time and steps) can provide new insights into construct-related solution behavior, or confirm assumptions that govern task design. This chapter summarizes four studies to illustrate the potential of behavioral process data for explaining ta...
Conference Paper
The ability to solve complex problems is a fundamental competence in education and everyday life, and it enables active participation in society. For example, advancing globalization and technological change confront people with an increasingly complex environment (Fischer et al., 2012). At the same time, problem solving is...
Presentation
Full-text available
Using natural language processing technologies, the study examines German students' text responses to reading tasks in the Programme for International Student Assessment (PISA) to shed light on differences between (a) genders and (b) assessment cycles. Because the administration mode for reading literacy in PISA...
Article
Full-text available
Time-on-task effects on response accuracy in digital reading tasks were examined using PISA 2009 data (N=34,062, 19 countries/economies). As a baseline, task responses were explained by time on task, tasks’ easiness, and persons’ digital reading skill (Model 1). Model 2 added a quadratic time-on-task effect, persons’ comprehension skill and tasks’...
Chapter
Learning throughout the life span relies more and more on using information and communication technology (ICT) to acquire new knowledge and skills in both formal and informal learning environments. Thus, learning to use ICT and using ICT to learn have become major premises for successful participation in educational, professional, social, cultural,...
Article
Full-text available
Abstract. International student achievement studies such as the Programme for International Student Assessment (PISA) allow participating countries to determine the performance of their school systems. In PISA, the target population (15-year-old students) is tested every 3 years. Of particular importance here is the trend inf...
Article
Full-text available
This paper provides an overview and recommendations on how to conduct a mode effect study in large-scale assessments by addressing criteria of equivalence between paper-based and computer-based tests. These criteria are selected according to the intended use of test scores and test score interpretations. A mode effect study can be implemented using...