About
66 Publications
63,530 Reads
1,161 Citations
Publications (66)
Background
Mode effects, the variations in item and scale properties attributed to the mode of test administration (paper vs. computer), have stimulated research around test equivalence and trend estimation in PISA. The PISA assessment framework provides the backbone to the interpretation of the results of the PISA test scores. However, an identifi...
In large-scale assessments, disengaged participants might rapidly guess on items or skip items, which can affect the score interpretation’s validity. This study analyzes data from a linear computer-based assessment to evaluate a micro-intervention that blocked the possibility to respond for 2 s. The blocked response was implemented to prevent parti...
As researchers in the social sciences, we are often interested in studying constructs that are not directly observable through assessments and questionnaires. But even in a well-designed and well-implemented study, rapid-guessing behavior may occur. Under rapid-guessing behavior, a task is skimmed briefly but not read or engaged with in depth. Hence, a res...
Background
Computer‐based assessment allows for the monitoring of reader behaviour. The identification of patterns in this behaviour can provide insights that may be useful in informing educational interventions.
Objectives
Our study aims to explore what different patterns of reading activity exist, and investigates their interpretation and consis...
Objectives. Evaluate the block-adaptive number series task of reasoning, as a time-efficient proxy of general cognitive ability in the Level-2 sample of the German National Cohort (NAKO), a population-based mega cohort.
Methods. The number series task consisted of two blocks of three items each, administered as part of the touchscreen-based assessm...
In this article the affiliation details for Author A were incorrectly given as ‘EDUCATIONAL MEASUREMENT’ but should have been ‘IPN–Leibniz Institute for Science and Mathematics Education’.
Multiple document comprehension (MDC) refers to the ability to integrate information from multiple sources into a coherent representation, which requires specific cognitive processes. Assuming that epistemic beliefs are domain-related, this study investigates exploratively how epistemic beliefs in the domains of science and history affect the maste...
Careless and insufficient effort responding (C/IER) can pose a major threat to data quality and, as such, to validity of inferences drawn from questionnaire data. A rich body of methods aiming at its detection has been developed. Most of these methods can detect only specific types of C/IER patterns. However, typically different types of C/IER patt...
International large-scale assessments such as PISA or PIAAC have started to provide public or scientific use files for log data; that is, events, event-related attributes and timestamps of test-takers’ interactions with the assessment system. Log data and the process indicators derived from it can be used for many purposes. However, the intended us...
The study investigates automated and controlled cognitive processes that occur when university students read multiple documents (MDs). We examined data of 401 students dealing with two MD sets in a digital environment. Performance was assessed through several comprehension questions. Recorded log data gave indications about students’ time allocatio...
The increased availability of time‐related information as a result of computer‐based assessment has enabled new ways to measure test‐taking engagement. One of these ways is to distinguish between solution and rapid guessing behavior. Prior research has recommended response‐level filtering to deal with rapid guessing. Response‐level filtering can le...
Recent research suggests that readers' subjective task understanding influences reading processes and outcomes. Therefore, the present study's aim was to investigate whether the task demands that readers retrospectively report relate to multiple document comprehension strategies and outcome. A total of 310 university students completed three units...
The digital revolution has made a multitude of text documents from highly diverse perspectives on almost any topic easily available. Accordingly, the ability to integrate and evaluate information from different sources, known as multiple document comprehension, has become increasingly important. Because multiple document comprehension requires the...
By tailoring test forms to the test‐taker's proficiency, Computerized Adaptive Testing (CAT) enables substantial increases in testing efficiency over fixed forms testing. When used for formative assessment, the alignment of task difficulty with proficiency increases the chance that teachers can derive useful feedback from assessment data. The appli...
The chapter gives an overview of how tests and questionnaires can be implemented with the help of computers in the broader sense, extending or clearly exceeding the possibilities of classical paper-and-pencil procedures. This concerns, for example, the development of computer-based items with innovative response formats and multi...
The OECD Programme for the International Assessment of Adult Competencies (PIAAC) was the first computer-based large-scale assessment to provide anonymised log file data from the cognitive assessment together with extensive online documentation and a data analysis support tool. The goal of the chapter is to familiarise researchers with how to acces...
Rapid guessing can threaten measurement invariance and the validity of large-scale assessments, which are often conducted under low-stakes conditions. Comparing measures collected under different administration modes or in different test settings necessitates that rapid guessing rates also be comparable. Response time thresholds can be used to iden...
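One common way to apply a response time threshold is to flag every response that is faster than a cutoff and then compare the share of flagged responses across modes or settings. The following is a minimal sketch of that idea, not the procedure used in the study: the function names, the single fixed 3-second threshold, and the example data are all hypothetical.

```python
from typing import Sequence


def flag_rapid_guesses(response_times: Sequence[float], threshold: float) -> list[bool]:
    """Mark each response as a rapid guess if it is faster than the threshold (seconds)."""
    return [rt < threshold for rt in response_times]


def rapid_guessing_rate(response_times: Sequence[float], threshold: float) -> float:
    """Share of responses flagged as rapid guesses; 0.0 for an empty input."""
    flags = flag_rapid_guesses(response_times, threshold)
    return sum(flags) / len(flags) if flags else 0.0


# Hypothetical response times (seconds) for one test-taker on six items.
times = [1.2, 14.8, 0.9, 22.5, 3.1, 30.0]
flags = flag_rapid_guesses(times, threshold=3.0)
rate = rapid_guessing_rate(times, threshold=3.0)
print(flags, round(rate, 3))
```

In practice, thresholds are often set per item rather than globally; the fixed cutoff here only keeps the illustration short.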
International large-scale assessments, such as the Program for International Student Assessment (PISA), are conducted to provide information on the effectiveness of education systems. In PISA, the target population of 15-year-old students is assessed every 3 years. Trends show whether competencies have changed in the countries between PISA cycles....
The comprehension of multiple documents (Multiple Document Comprehension, MDC) is understood as the ability to construct an integrated representation of a subject area from different sources of information. As such, it is, both for successfully completing university studies and for societal participation...
Educational large-scale assessments risk their temporal comparability when shifting from paper- to computer-based assessment. A recent study showed how text responses have altered alongside PISA’s mode change, indicating mode effects. Uncertainty remained, however, because it compared students from 2012 and 2015. We aimed at reproducing the findings in...
Multiple document comprehension is the ability to construct an integrated representation of a specific topic based on several sources. It is an important competence for university students; however, there has been so far no established instrument to assess multiple document comprehension in a standardized way. Therefore, we developed a test coverin...
Through their recurring surveys, studies such as PISA show more than snapshots of the performance of education systems. Rather, they also generate data on the development of these systems and, in particular, indicate whether subsequent generations of fifteen-year-olds, compared with earlier generations, a higher or lower competence...
The study investigates the cognitive load of students working on tasks that require the comprehension of multiple documents (Multiple Document Comprehension, MDC). In a sample of 310 students, perceived task difficulty (PD) and mental effort (ME) were examined in terms of task characteristics, individual characteristics, and students' processing be...
The transition from paper-based assessment (PBA) to computer-based assessment (CBA) requires mode effect studies to investigate the comparability of scores across modes. In the National Educational Panel Study experimental studies were conducted to investigate psychometric differences between modes. In the present study, the cross-mode equivalence...
For many years, reading comprehension in the Programme for International Student Assessment (PISA) was measured via paper‐based assessment (PBA). In the 2015 cycle, computer‐based assessment (CBA) was introduced, raising the question of whether central equivalence criteria required for a valid interpretation of the results are fulfilled. As an exte...
Background:
With digital technologies, competence assessments can provide process data, such as mouse clicks with corresponding timestamps, as additional information about the skills and strategies of test takers. However, in order to use variables generated from process data sensibly for educational purposes, their interpretation needs to be valid...
In this paper, we developed a method to extract item-level response times from log data that are available in computer-based assessments (CBA) and paper-based assessments (PBA) with digital pens. Based on response times that were extracted using only time differences between responses, we used the bivariate generalized linear IRT model framework (B...
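The idea of deriving response times from time differences between responses can be illustrated with a short sketch. It assumes a simplified log format of `(item_id, timestamp)` response events plus a known start time; the function name, the log format, and the data are hypothetical and do not reproduce the paper's method.

```python
from typing import Iterable


def item_times_from_responses(
    start: float, responses: Iterable[tuple[str, float]]
) -> dict[str, float]:
    """Attribute to each item the time elapsed since the previous response event.

    `start` is the timestamp at which the assessment began; `responses` holds
    (item_id, timestamp) pairs. Repeated responses to one item are summed.
    """
    times: dict[str, float] = {}
    previous = start
    for item_id, timestamp in sorted(responses, key=lambda r: r[1]):
        times[item_id] = times.get(item_id, 0.0) + (timestamp - previous)
        previous = timestamp
    return times


# Hypothetical log: assessment starts at t = 0 s, three responses follow.
log = [("Q1", 12.5), ("Q2", 20.0), ("Q3", 41.0)]
item_times = item_times_from_responses(0.0, log)
print(item_times)
```

Because only response events are used, any time spent navigating or pausing between items is folded into the following item's time, which is a limitation of this simplification.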
Many large-scale competence assessments such as the National Educational Panel Study (NEPS) have introduced novel test designs to improve response rates and measurement precision. In particular, unstandardized online assessments (UOA) offer an economic approach to reach heterogeneous populations that otherwise would not participate in face-to-face...
A popular definition describes learning analytics as measuring, collecting, analyzing and reporting of data about learners. The main purpose thereof is to understand and support learning processes. Thus, the main research goals of learning analytics remarkably overlap with those of educational assessment and psychometrics. To demonstrate how these...
Log data from educational assessments attract more and more attention and large-scale assessment programs have started providing log data as scientific use files. Such data generated as a by-product of computer-assisted data collection has been known as paradata in survey research. In this paper, we integrate log data from educational assessments i...
The shadow testing approach (STA; van der Linden & Reese, 1998) is considered the state of the art in constrained item selection for computerized adaptive tests. The present paper shows that certain types of constraints (e.g., bounds on categorical item attributes) induce a matroid on the item bank. This observation is used to devise item selection...
The behavioral sciences, including most of psychology, seek to explain and predict behavior with the help of theories and models that involve concepts (e.g., attitudes) that are subsequently translated into measures. Currently, some subdisciplines such as social psychology focus almost exclusively on measures that demand reflection or even introspe...
A critical evaluation of results to find useful information is essential when doing a web search. In this study, we investigated the evaluation skills of secondary school students, based on their behavior in selecting links from a search engine result page (SERP). To clarify the role of reading when evaluating online information, we assessed studen...
Receiving and using web-based information has become part of everyday life, but the non-linear presentation of information can make considerable demands on cognitive resources, affecting text comprehension. This study examined whether memory updating predicts students' comprehension of digital hypertext over and above skills in reading linearly str...
Completing test items under multiple speed conditions prevents the performance measure from being confounded with individual differences in the speed-accuracy compromise and offers insights into the response process, that is, how response time relates to the probability of a correct response. This relation is traditionally represented by two conceptual...
The study examines relationships between reading comprehension and basic reading processes at the word and sentence level, as well as working memory, in 15-year-old adolescents. It pursued the questions of whether differences in the efficiency of the subcomponents under consideration, on the one hand reading competence itself, on the other hand changes i...
The effects of aging on response time were examined in a paper-based lexical-decision experiment with younger (age 18-36) and older (age 64-75) adults, applying Ratcliff's diffusion model. Using digital pens allowed the paper-based assessment of response times for single items. Age differences previously reported by Ratcliff and colleagues in compu...
In mathematics education, the student’s ability to translate between different representations of functions is regarded as a key competence for mastering situations that can be described by mathematical functions. Students are supposed to interpret common representations like numerical tables (N), function graphs (G), verbally or pictorially repres...
Even though multidimensional adaptive testing (MAT) is advantageous in the measurement of complex competences, operational applications are still rare. In an attempt to change this situation, this chapter presents four recent developments that foster the applicability of MAT. First, in a simulation study, we show that multiple constraints can be ac...
Abstract. International student achievement studies such as the Programme for International Student Assessment (PISA) help participating countries determine the performance of their school systems. In PISA, the target population (15-year-old students) is tested every 3 years. Of particular importance here are the trend inf...
This paper provides an overview and recommendations on how to conduct a mode effect study in large-scale assessments by addressing criteria of equivalence between paper-based and computer-based tests. These criteria are selected according to the intended use of test scores and test score interpretations. A mode effect study can be implemented using...
We present data-driven log file analyses of an electronic textbook for history called the mBook to support teachers in preparing lessons for their students. We represent user sessions as contextualised Markov processes and propose a probabilistic clustering using expectation maximisation to detect groups of similar (i) sessions an...
Reading and understanding digital text that is organized in a non-linear hypertext format can be challenging for students as it requires a more self-directed selection of text pieces compared to reading linear texts. This study aims at investigating how individual differences in students' skills in comprehending digital text can be explained by the...
The use of Information and Communication Technology (ICT) is of immense importance in today’s digital knowledge society. As a basis for private and vocational participation in society, ICT literacy has been widely discussed in recent decades. Although motivational and metacognitive facets play an important role in developing ICT literacy and compet...
Multidimensional adaptive testing (MAT) can improve the efficiency of measuring traits that are known to be highly correlated. Content balancing techniques can ensure that tests fulfill requirements with respect to content areas, such as the number of items from various dimensions (target rates). However, content balancing does not restrict the ord...
The Rasch-based, computerized adaptive assessment procedure RehaCAT allows the ICF-oriented dimensions "activities in daily living", "functionality upper extremities" and "functionality lower extremities" as well as "depression" to be assessed economically and reliably in orthopaedic rehabilitation patients. This validation study aimed at analyzing the...
The speed-ability trade-off becomes a measurement problem if there is between-subject variation in the speed-ability compromise, as this may affect the comparability of ability estimates. To control individual speed differences, the response-signal (RS) paradigm was applied requiring an immediate response as soon as an acoustic signal is presented....
ICT literacy suggests a performance-based assessment, that is, one using test items that present interactive (simulated) computer environments and require a response via mouse and/or keyboard. Nevertheless, methods such as self-assessments or paper-based achievement tests are frequently used. The aim of the present study was...
This study aimed at confirmatory testing of the factorial structure of the established assessment instruments ODI, SF-12 and HADS-D by means of structural equation modeling in a sample of n=184 rehabilitation patients with musculoskeletal diseases. According to local and global fit indices for each instrument an acceptable to good fit to the underlyi...
This study conducted a simulation study for computer-adaptive testing based on the Aachen Depression Item Bank (ADIB), which was developed for the assessment of depression in persons with somatic diseases. Prior to computer-adaptive test simulation, the ADIB was newly calibrated.
Recalibration was performed in a sample of 161 patients treated for a...
In the field of competence diagnostics, adaptive testing is considered an optimal approach for highly efficient and economic measurement. As a prerequisite, items have to be part of a calibrated item pool and need to be administered in a computerized testing format. Thus, if one wants to benefit from using a computerized-adaptive testing (CAT)...
To develop and evaluate a computer-adaptive test for the assessment of anxiety in cardiovascular rehabilitation patients (ACAT-cardio) that tailors an optimal test for each patient and enables precise and time-effective measurement.
Simulation study, validation study (against the anxiety subscale of the Hospital Anxiety and Depression Scale (HADS-A...
In competence diagnostics, adaptive test procedures are regarded as an optimal method both for meeting psychometric quality standards and for taking aspects of practicability adequately into account. An important prerequisite for adaptive testing is that the items come from a calibrated item bank and in a computerized t...
For diagnostics and outcome measurement in clinical rehabilitation, a multitude of questionnaires is used. To make diagnostic findings comparable, the same information is generally gathered from all patients, regardless of their state of health or severity of illness, by using identical groups of items. In this kind of assess...
Computerized competence tests promise a variety of advantages compared to paper-and-pencil tests, for instance, increased test security, more information about test takers and the test-taking process, instant scoring, and immediate feedback. Moreover, innovative item types can be administered to broaden the test content. Three benefits sh...
During the last two decades, Structural Equation Modeling (SEM) has evolved from a statistical technique for insiders to an established valuable tool for a broad scientific public. This class of analyses has much to offer, but at what price? This paper provides an overview of SEM, its underlying ideas, potential applications and current software....
Projects (7)
The Programme for International Student Assessment (PISA) records student achievement worldwide and compares it internationally.
The three competence domains examined, science, reading, and mathematics, are a central component of lifelong learning. PISA determines the performance level of young people, provides information on the outcomes of teaching and learning in schools, and shows developments in the education system. What matters is less the correspondence of the test items with the curricula of the participating countries than the assessment of basic competencies in various application situations. The concept of basic literacy on which PISA is based is thus functionalist: 15-year-old students are expected to apply the competencies they acquired at school in tasks that are as authentic as possible. In PISA 2018, the reading competence of 15-year-old students is the focus of testing for the third time, after 2000 and 2009.
Implementation of software that can be used to analyze log file data from educational large-scale assessments using the method described in:
Kroehne, U., & Goldhammer, F. (2018). How to conceptualize, represent, and analyze log data from technology-based assessments? A generic framework and an application to questionnaire items. *Behaviormetrika*, 45(2), 527–563. https://doi.org/10.1007/s41237-018-0063-y