Steffi Pohl

Freie Universität Berlin | FUB · Department of Education and Psychology

Dr. phil.

About

Publications: 56
Reads: 7,027
Citations: 904
Introduction
Steffi Pohl currently works at the Department of Education and Psychology, Freie Universität Berlin. Steffi does research in Quantitative Psychology, Psychometrics and Causal Inference.
Additional affiliations
October 2013 - present
Freie Universität Berlin
Position
  • Professor (Assistant)
January 2009 - September 2013
Otto-Friedrich-Universität Bamberg
Position
  • Research Associate
Education
January 2005 - January 2010
Friedrich Schiller University Jena
Field of study
  • Modeling Method Effects

Publications (56)
Article
Full-text available
In this article the affiliation details for Author A were incorrectly given as ‘EDUCATIONAL MEASUREMENT’ but should have been ‘IPN–Leibniz Institute for Science and Mathematics Education’.
Article
Full-text available
Careless and insufficient effort responding (C/IER) can pose a major threat to data quality and, as such, to validity of inferences drawn from questionnaire data. A rich body of methods aiming at its detection has been developed. Most of these methods can detect only specific types of C/IER patterns. However, typically different types of C/IER patt...
Article
Full-text available
When measurement invariance does not hold, researchers aim for partial measurement invariance by identifying anchor items that are assumed to be measurement invariant. In this paper, we build on Bechger and Maris’s approach for identification of anchor items. Instead of identifying differential item functioning (DIF)-free items, they propose to ide...
Article
Full-text available
Interactive tasks designed to elicit real-life problem-solving behavior are rapidly becoming more widely used in educational assessment. Incorrect responses to such tasks can occur for a variety of different reasons such as low proficiency levels, low metacognitive strategies, or motivational issues. We demonstrate how behavioral patterns associate...
Article
Ability and test-taking behavior should be disentangled and jointly reported to improve interpretation and fairness
Article
The term speed-accuracy tradeoff is used when an increase in response speed comes at the expense of response accuracy. Although originally a concept from experimental psychology, the speed-accuracy tradeoff has been a topic in psychological assessment, too. In the first part of the manuscript, we discuss motivational factors that may be responsible...
Article
Full-text available
Complex interactive test items are becoming more widely used in assessments. Being computer-administered, assessments using interactive items allow logging time-stamped action sequences. These sequences pose a rich source of information that may facilitate investigating how examinees approach an item and arrive at their given response. There is a r...
Article
Identifying and considering test-taking effort is of utmost importance for drawing valid inferences on examinee competency in low-stakes tests. Different approaches exist for doing so. The speed-accuracy+engagement model aims at identifying non-effortful test-taking behavior in terms of nonresponse and rapid guessing based on responses and response...
Article
Full-text available
The Bracken School Readiness Assessment (BSRA) has been used in large studies such as the Millennium Cohort Study (MCS). Such large-scale evaluations make it possible to draw important conclusions about its reliability for predicting children’s school readiness. Although the BSRA has been widely used, few studies at the item-le...
Article
Full-text available
Measurement invariance (MI) cannot always be achieved for the full length of a test or questionnaire, leading researchers to seek measurement-invariant item subsets. Previous approaches either 1) can only analyze group covariates or 2) make implicit assumptions about the nature of MI and then yield a single item set. We provide a general approach...
Article
Full-text available
Approaches for dealing with item omission include scoring omitted items as incorrect, ignoring missing values, and approaches for nonignorable missing values; these have been evaluated only for certain forms of nonignorability. In this paper we investigate the performance of these approaches for various conditions of nonignorability, that is, when the missing response...
Article
Full-text available
So far, modeling approaches for not-reached items have considered one single underlying process. However, missing values at the end of a test can occur for a variety of reasons. On the one hand, examinees may not reach the end of a test due to time limits and lack of working speed. On the other hand, examinees may not attempt all items and quit res...
Article
Full-text available
In low‐stakes assessments, test performance has few or no consequences for examinees themselves, so that examinees may not be fully engaged when answering the items. Instead of engaging in solution behaviour, disengaged examinees might randomly guess or generate no response at all. When ignored, examinee disengagement poses a severe threat to the v...
Article
Full-text available
The often-used A(C)E model that decomposes phenotypic variance into parts due to additive genetic and environmental influences can be extended to a longitudinal model when the trait has been assessed at multiple occasions. This enables inference about the nature (e.g., genetic or environmental) of the covariance among the different measurement poin...
Article
Full-text available
For adequate modeling of missing responses, a thorough understanding of the nonresponse mechanisms is vital. As a large number of major testing programs are moving, or have already moved, to computer-based assessment, a rich body of additional data on examinee behavior becomes easily accessible. These additional data may contain valuabl...
Article
Missing values at the end of a test typically are the result of test takers running out of time and can as such be understood by studying test takers’ working speed. As testing moves to computer-based assessment, response times become available, making it possible to simultaneously model speed and ability. Integrating research on response time modeling with r...
Article
Covariate-adjusted treatment effects are commonly estimated in non-randomized studies. It has been shown that measurement error in covariates can bias treatment effect estimates when not appropriately accounted for. So far, these delineations primarily assumed a true data generating model that included just one single covariate. It is, however, mor...
Article
Mechanisms causing item nonresponses in large-scale assessments are often said to be nonignorable. Parameter estimates can be biased if nonignorable missing data mechanisms are not adequately modeled. In trend analyses, it is plausible for the missing data mechanism and the percentage of missing values to change over time. In this article, we inves...
Article
The average causal treatment effect (ATE) can be estimated from observational data based on covariate adjustment. Even if all confounding covariates are observed, they are not necessarily measured reliably, and adjusting for them may then fail to yield an unbiased ATE estimate. Instead of fallible covariates, the respective latent covariates can be used for covariate...
Article
Competence data from low-stakes educational large-scale assessment studies allow for evaluating relationships between competencies and other variables. The impact of item-level nonresponse has not been investigated with regard to statistics that determine the size of these relationships (e.g., correlations, regression coefficients). Classical appro...
Chapter
This chapter discusses adjustment when covariates are not perfectly reliable. It starts with reviewing a theoretical framework that applies to fallible and latent covariates. This framework allows for deriving conditions under which adjustment has to be based on the latent covariate and conditions under which adjustment has to be based on the falli...
Chapter
In order to precisely assess the cognitive achievement and abilities of students, different types of items are often used in competence tests. In the National Educational Panel Study (NEPS), test instruments also consist of items with different response formats, mainly simple multiple choice (MC) items in which one answer out of four is correct and...
Chapter
The National Educational Panel Study (NEPS) provides data on the development of competencies across the whole life span. Plausible values as a measure of individual competence are provided by explicitly including background variables that capture individual characteristics in the corresponding Item Response Theory model. Despite tremendous efforts...
Chapter
Including students with special educational needs in learning (SEN-L) is one of the National Educational Panel Study’s (NEPS) challenges. In this study, we address the question of whether the reading competence of students with SEN-L may be assessed reliably with the reading test designed for general-education students. In addition, we ask whether...
Article
Full-text available
Assessing competencies of students with special educational needs in learning (SEN-L) poses a challenge for large-scale assessments (LSAs). For students with SEN-L, the available competence tests may fail to yield test scores of high psychometric quality, which are—at the same time—measurement invariant to test scores of general education students....
Article
When competence tests are administered, subjects frequently omit items. These missing responses pose a threat to correctly estimating the proficiency level. Newer model-based approaches aim to take nonignorable missing data processes into account by incorporating a latent missing propensity into the measurement model. Two assumptions are typically...
Chapter
Measuring domain-specific competencies of students with special educational needs (SEN) represents a challenge for the National Educational Panel Study (NEPS). In order to enable the assessment of students with SEN, the NEPS has set up feasibility studies in which the possibility of obtaining reliable test scores that are comparable to test scores...
Article
Full-text available
Including students with special educational needs in learning (SEN-L) is a challenge for large-scale assessments. In order to draw inferences with respect to students with SEN-L and to compare their scores to students in general education, one needs to ensure that the measurement model is reliable and that the same construct is measured for differen...
Chapter
The National Educational Panel Study (NEPS) aims at investigating the development of competencies across the whole lifespan. Competencies are assessed via tests and competence scores are estimated based on models of Item Response Theory (IRT). IRT allows a comparison of test scores—and, thus, the investigation of change across time and differences...
Article
Full-text available
Data from competence tests usually show a number of missing responses on test items due to both omitted and not-reached items. Different approaches for dealing with missing responses exist, and there are no clear guidelines on which of those to use. While classical approaches rely on an ignorable missing data mechanism, the most recently developed...
Article
In metacognition research, many studies have focused on the metacognitive knowledge of preschoolers or of children at the end of elementary or secondary school, but investigations of children starting elementary school are quite limited. The present study thus took a closer look at children’s knowledge about mental processes and strategies in early el...
Technical Report
Full-text available
The National Educational Panel Study (NEPS) provides data on the development of competencies across the whole life span. Plausible values as a measure of individual competence are provided by explicitly including background variables that capture individual characteristics in the corresponding Item Response Theory models. Despite tremendous efforts...
Article
This article introduces longitudinal multistage testing (lMST), a special form of multistage testing (MST), as a method for adaptive testing in longitudinal large-scale studies. In lMST designs, test forms of different difficulty levels are used, whereas the values on a pretest determine the routing to these test forms. Since lMST allows for testin...
Article
A procedure for examining essential unidimensionality in multicomponent measuring instruments is discussed. The method is based on an application of latent variable modeling and is concerned with the extent to which a common factor for all components of a given scale accounts for their correlations. The approach provides point and interval estimate...
Article
A method for examining common factor variance in multiple-component measuring instruments is outlined. The procedure is based on an application of the latent variable modeling methodology and is concerned with evaluating observed variance explained by a global factor and by one or more additional component-specific factors. The approach furnishes p...
Chapter
Campbell and Fiske (1959) proposed multitrait-multimethod (MTMM) designs for the validation of measurement instruments. In these designs each of several constructs (traits) is measured with the same set of methods. According to Campbell (1959), discriminant validity is supported if the trait under investigation can be distinguished from other trait...
Article
In evaluation studies it is often not possible to conduct randomized experiments for estimating causal effects of a program. Instead, quasi-experimental designs are frequently used. The causal interpretation of a statistically adjusted estimate is only warranted if the assumptions of the underlying quasi-experiment are met and the statistical analy...
Article
Full-text available
Method effects often occur when constructs are measured by different methods. In traditional multitrait-multimethod (MTMM) models method effects are regarded as residuals, which implies a mean method effect of zero and no correlation between trait and method effects. Furthermore, in some recent MTMM models, traits are modeled to be specific to a ce...
Article
Full-text available
Adjustment methods such as propensity scores and analysis of covariance are often used for estimating treatment effects in nonexperimental data. Shadish, Clark, and Steiner used a within-study comparison to test how well these adjustments work in practice. They randomly assigned participating students to a randomized or nonrandomized experiment. Tr...
Article
This study uses within-study comparisons to assess the relative importance of covariate choice, unreliability in the measurement of these covariates, and whether regression or various forms of propensity score analysis are used to analyze the outcome data. Two of the within-study comparisons are of the four-arm type, and many more are of the three-...
Article
Balanced scales, that is, scales based on items whose content is either negatively or positively polarized, are often used in the hope of measuring a bipolar construct. Research has shown that usually balanced scales do not yield 1-dimensional measurements. This threatens their construct validity. The authors show how to test bipolarity while accou...
Article
Method effects often occur when different methods are used for measuring the same construct. We present a new approach for modelling this kind of phenomenon, consisting of a definition of method effects and a first model, the "method effect model", that can be used for data analysis. This model may be applied to multitrait-multimethod data or to lo...
