
Fabian Zehner
- PhD
- Psychometrician at DIPF | Leibniz Institute for Research and Information in Education
About
Publications: 49
Reads: 14,859
Citations: 364
Introduction
Fabian Zehner currently works at the Centre for Technology Based Assessment (TBA), DIPF | Leibniz Institute for Research and Information in Education.
Fabian does research in Psychometrics and Technologies in Educational Assessment. His main research interest lies in automatically processing open-ended text responses.
Current institution
DIPF | Leibniz Institute for Research and Information in Education
Current position
- Psychometrician
Publications (49)
Automatic coding of short text responses opens new doors in assessment. We implemented and integrated baseline methods of natural language processing and statistical modelling by means of software components that are available under open licenses. The accuracy of automatic text coding is demonstrated by using data collected in the Programme for Int...
Excerpt:
In cognitive ability testing, process data can be defined as empirical information about the cognitive (as well as meta-cognitive, motivational, and affective) states and related behavior that mediate the effect of the measured construct(s) on the task product (i.e., item score). Thus, operationally, process data can be regarded as t...
Contributions in the Special Issue
The special issue assembles papers centring around log data analysis, natural language processing, and machine learning used to advance educational assessment. They demonstrate how semi‐ and unstructured data such as log and text data can, despite their challenging nature, be handled appropriately to benefit educa...
In this article, we systematize the factors influencing performance and feasibility of automatic content scoring methods for short text responses. We argue that performance (i.e., how well an automatic system agrees with human judgments) mainly depends on the linguistic variance seen in the responses and that this variance is indirectly influenced...
Background
In the context of large‐scale educational assessments, the effort required to code open‐ended text responses is considerably more expensive and time‐consuming than the evaluation of multiple‐choice responses because it requires trained personnel and long manual coding sessions.
Aim
Our semi‐supervised coding method eco (exploring coding...
The cover image is based on the Editorial "Artificial Intelligence on the Advance to Enhance Educational Assessment: Scientific Clickbait or Genuine Gamechanger?" by Fabian Zehner and Carolin Hahnel, https://doi.org/10.1111/jcal.12810
International large-scale assessments such as PISA or PIAAC have started to provide public or scientific use files for log data; that is, events, event-related attributes and timestamps of test-takers’ interactions with the assessment system. Log data and the process indicators derived from it can be used for many purposes. However, the intended us...
The NAEP EDM Competition required participants to predict efficient test-taking behavior based on log data. This paper describes our top-down approach for engineering features by means of psychometric modeling, aiming at machine learning for the predictive classification task. For feature engineering, we employed, among others, the Log-Normal Respo...
In this paper, we introduce shinyReCoR: a new app that utilizes a cluster-based method for automatically coding open-ended text responses. Reliable coding of text responses from educational or psychological assessments requires substantial organizational and human effort. The coding of natural language in responses to tests depends on the texts’ co...
Background
With the onset of the COVID-19 pandemic at the beginning of 2020, the crucial role of hygiene in healthcare settings has once again become very clear. For diagnostic and for didactic purposes, standardized and reliable tests suitable to assess the competencies involved in “working hygienically” are required. However, existing tests usual...
Especially in university hospitals, many physicians have to fulfil multiple roles as they treat patients, conduct research and act as clinical teachers. The present study focuses upon the latter role and analyses which attitudes and motivational patterns guide physicians in their teaching activities. With regard to motivation, we draw on self-deter...
The 2nd Annual WPI-UMASS-UPENN EDM Data Mining Challenge required contestants to predict efficient test-taking based on log data. In this paper, we describe our theory-driven and psychometric modeling approach. For feature engineering, we employed the Log-Normal Response Time Model for estimating latent person speed, and the Generalized Partial Cre...
Background: Since the onset of the Corona pandemic at the beginning of 2020, the extreme importance of hygiene has once again become very clear. In the medical context, it is not easy to find suitable test formats to assess the competencies involved in “working hygienically”. Pre-existing test formats usually use self-report questionnaires, which a...
Educational large-scale assessments risk their temporal comparability when shifting from paper- to computer-based assessment. A recent study showed how text responses changed alongside PISA’s mode change, indicating mode effects. Uncertainty remained, however, because it compared students from 2012 and 2015. We aimed at reproducing the findings in...
In this study, we make first steps to combine two recently developed approaches for automatically scoring short text responses in the Programme for International Student Assessment (PISA). While PISA’s Machine-Supported Coding System (MSCS) identifies a response’s code very accurately, it is only applicable to a very limited range of new responses....
Focusing on Germany, this article presents results from the international comparison of fifteen-year-olds in collaborative problem solving and a cross-validation of the scaling in the Programme for International Student Assessment (PISA) 2015. A new computer-based test was used that asked students to solve a problem jointly with simulated group mem...
What can artificial intelligence really do? And how can we put it to productive use in the education sector? Should we be afraid that our grandchildren's class teacher might be called eduBot™ in a few decades? Drawing on various application examples, this contribution examines what potential actually lies behind artificial inte...
Journal: Education Inquiry
In 2015, the Programme for International Student Assessment (PISA) introduced multiple changes in its study design, the most extensive being the transition from paper- to computer-based assessment. We investigated the differences between German students’ text responses to eight reading items from the paper-based study...
In this talk, we report results from the Collaborative Problem Solving (CPS) domain in the Programme for International Student Assessment (PISA) 2015 on the international level with a focus on Germany. We present additional national data from the German PISA sample, cross-validate the IRT scaling using an independent sample of ninth graders in Germ...
Collaborative problem solving (CPS) as the innovative domain in PISA 2015 is defined as a critical skill in education and the workforce where individuals solve problems together by combining their understanding, effort and work (OECD, 2017a; Assessment Framework). The CPS test is based on simulated conversations with computer-based agents. Students...
In 2015, the Programme for International Student Assessment (PISA) made several changes to its study design, the most extensive being the switch from paper- to computer-based assessment. This study examines differences in students' open-ended text responses to eight reading items between the paper-based study...
Background: The gender gap in reading literacy is repeatedly found in large-scale assessments. This study compared girls’ and boys’ text responses in a reading test applying natural language processing. For this, a theoretical framework was compiled that allows mapping of response features to the preceding cognitive components such as micro- and ma...
This study analyzed student text responses to a reading test using natural language processing techniques. Focusing on semantic response features, it investigated (i) the reading gender gap and (ii) trends in the Programme for International Student Assessment (PISA) from 2012 to 2015 alongside the change from paper-based to computer-based assessmen...
The automatic coding of open-ended text responses overcomes some classical problems associated with human coders. While different research groups (e.g., Leacock & Chodorow, 2003; Smarter Balanced Assessment Consortium, 2014) have shown that computers can achieve performance levels similar to humans in scoring short-text responses, the next step is...
Using natural language processing technologies, the study examines German students' text responses to reading tasks in the Programme for International Student Assessment (PISA) in order to shed light on differences between (a) the genders and (b) the assessment cycles. Since the administration mode for reading literacy in PISA...
The so-called background questionnaires are of growing importance in the PISA assessments. Information from learners, teachers, school principals, and parents is needed to reconstruct social background, immigration history, and educational trajectories, as well as teaching and learning processes, school-level conditions, and the governance of the sc...
The results of the PISA 2015 study show that the reading literacy of adolescents in Germany is significantly higher than the average reading literacy of adolescents across all OECD countries. Overall, Germany ranks in the upper third in comparison with the other OECD countries. The group of particularly...
Gender differences in reading literacy in favor of girls are among the remarkably robust findings in large-scale assessments. The present study analyzes features in test responses by means of automatic natural language processing. In addition, it compiles a theoretical framework of psychological models in order to...
The gender gap in reading literacy, favoring girls, is a robust phenomenon repeatedly found in large-scale assessments. The present study analyzed features in student responses by applying natural language processing techniques. A theoretical framework was compiled that allows features in the responses to be mapped to the underlying cognitive components s...
In automatic coding of short text responses, a computer categorizes responses. In the thesis, free software was developed that is capable of (i) grouping text responses into semantically homogeneous types, (ii) coding the types, and (iii) extracting features. Results showed agreement ranging from fair-to-good up to excellent between the software’s and humans’ cod...
To evaluate short text responses in assessments objectively and consistently, coding guides are usually employed. Human raters rely on them when assigning a response to a category (e.g., correct or incorrect). Coding guides contain prototypical responses for the respective code, the so-called anchor examples...
We propose and empirically evaluate a theoretical framework of how to use coding guides for automatic coding (scoring) and how, in turn, automatic coding can enhance the use of coding guides. We adopted a recently described baseline approach to automatically classify responses. Well-established coding guides from PISA, comprising reference response...
The mission of German special schools is to enhance the education of students with Special Educational Needs in the area of Learning (SEN-L). However, recent studies indicate that graduate students with SEN-L from special schools show difficulties in basic arithmetical operations, and the development of basic mathematical skills during secondary sp...
With regard to construct validity, free text responses are often desirable when assessing psychological constructs, yet they are frequently avoided because human coders are currently needed to evaluate the responses. In the project presented, computer technologies for natural language processing are applied...
Examinations are the decisive steering factor for learning in higher education and are thus crucial for students' orientation toward competencies. To date, there are hardly any findings on how examinations are actually conducted at German universities and how students adapt to existing examination practice. The present contribution differenti...
This presentation introduces the newly developed test Synchronous Testing of Verbal, Spatial and Figural Memory (SynToM). Compared to existing retentivity instruments, the test’s advantages range from thoroughly quantified item generation rules and a free response format to simultaneous load on the components of working memory. The test construction was le...