Steffi Pohl

Freie Universität Berlin | FUB · Department of Education and Psychology

Dr. phil.

About

Publications: 44
Reads: 3,142
Citations: 592
Introduction
Steffi Pohl currently works at the Department of Education and Psychology, Freie Universität Berlin. Steffi does research in Quantitative Psychology, Psychometrics and Causal Inference.
Research Experience
October 2013 - present
Freie Universität Berlin
Position
  • Professor (Assistant)
January 2009 - September 2013
Otto-Friedrich-Universität Bamberg
Position
  • Research Associate
Education
January 2005 - January 2010
Friedrich Schiller University Jena
Field of study
  • Modeling Method Effects


Publications (44)
Article
Approaches for dealing with item omission include incorrect scoring, ignoring missing values, and approaches for nonignorable missing values. These approaches have only been evaluated for certain forms of nonignorability. In this paper we investigate the performance of these approaches for various conditions of nonignorability, that is, when the missing response...
Article
So far, modeling approaches for not-reached items have considered one single underlying process. However, missing values at the end of a test can occur for a variety of reasons. On the one hand, examinees may not reach the end of a test due to time limits and lack of working speed. On the other hand, examinees may not attempt all items and quit res...
Article
In low‐stakes assessments, test performance has few or no consequences for examinees themselves, so that examinees may not be fully engaged when answering the items. Instead of engaging in solution behaviour, disengaged examinees might randomly guess or generate no response at all. When ignored, examinee disengagement poses a severe threat to the v...
Article
The often-used A(C)E model that decomposes phenotypic variance into parts due to additive genetic and environmental influences can be extended to a longitudinal model when the trait has been assessed at multiple occasions. This enables inference about the nature (e.g., genetic or environmental) of the covariance among the different measurement poin...
Article
For adequate modeling of missing responses, a thorough understanding of the nonresponse mechanisms is vital. As a large number of major testing programs are moving, or have already moved, to computer-based assessment, a rich body of additional data on examinee behavior becomes easily accessible. These additional data may contain valuabl...
Article
Missing values at the end of a test typically are the result of test takers running out of time and can as such be understood by studying test takers’ working speed. As testing moves to computer-based assessment, response times become available, making it possible to simultaneously model speed and ability. Integrating research on response time modeling with r...
Article
Covariate-adjusted treatment effects are commonly estimated in non-randomized studies. It has been shown that measurement error in covariates can bias treatment effect estimates when not appropriately accounted for. So far, these delineations primarily assumed a true data generating model that included just one single covariate. It is, however, mor...
Article
Mechanisms causing item nonresponses in large-scale assessments are often said to be nonignorable. Parameter estimates can be biased if nonignorable missing data mechanisms are not adequately modeled. In trend analyses, it is plausible for the missing data mechanism and the percentage of missing values to change over time. In this article, we inves...
Article
The average causal treatment effect (ATE) can be estimated from observational data based on covariate adjustment. Even if all confounding covariates are observed, they might not be reliably measured, and adjusting for them may then fail to yield an unbiased ATE estimate. Instead of fallible covariates, the respective latent covariates can be used for covariate...
Article
Competence data from low-stakes educational large-scale assessment studies allow for evaluating relationships between competencies and other variables. The impact of item-level nonresponse has not been investigated with regard to statistics that determine the size of these relationships (e.g., correlations, regression coefficients). Classical appro...
Chapter
This chapter discusses adjustment when covariates are not perfectly reliable. It starts with reviewing a theoretical framework that applies to fallible and latent covariates. This framework allows for deriving conditions under which adjustment has to be based on the latent covariate and conditions under which adjustment has to be based on the falli...
Chapter
In order to precisely assess the cognitive achievement and abilities of students, different types of items are often used in competence tests. In the National Educational Panel Study (NEPS), test instruments also consist of items with different response formats, mainly simple multiple choice (MC) items in which one answer out of four is correct and...
Chapter
The National Educational Panel Study (NEPS) provides data on the development of competencies across the whole life span. Plausible values as a measure of individual competence are provided by explicitly including background variables that capture individual characteristics in the corresponding Item Response Theory model. Despite tremendous efforts...
Chapter
Including students with special educational needs in learning (SEN-L) is one of the National Educational Panel Study’s (NEPS) challenges. In this study, we address the question of whether the reading competence of students with SEN-L may be assessed reliably with the reading test designed for general-education students. In addition, we ask whether...
Article
Assessing competencies of students with special educational needs in learning (SEN-L) poses a challenge for large-scale assessments (LSAs). For students with SEN-L, the available competence tests may fail to yield test scores of high psychometric quality, which are—at the same time—measurement invariant to test scores of general education students....
Article
When competence tests are administered, subjects frequently omit items. These missing responses pose a threat to correctly estimating the proficiency level. Newer model-based approaches aim to take nonignorable missing data processes into account by incorporating a latent missing propensity into the measurement model. Two assumptions are typically...
Chapter
Measuring domain-specific competencies of students with special educational needs (SEN) represents a challenge for the National Educational Panel Study (NEPS). In order to enable the assessment of students with SEN, the NEPS has set up feasibility studies in which the possibility of obtaining reliable test scores that are comparable to test scores...
Article
Including students with special educational needs in learning (SEN-L) is a challenge for large-scale assessments. In order to draw inferences with respect to students with SEN-L and to compare their scores to students in general education, one needs to ensure that the measurement model is reliable and that the same construct is measured for differen...
Chapter
The National Educational Panel Study (NEPS) aims at investigating the development of competencies across the whole lifespan. Competencies are assessed via tests and competence scores are estimated based on models of Item Response Theory (IRT). IRT allows a comparison of test scores—and, thus, the investigation of change across time and differences...
Article
In metacognition research, many studies focused on metacognitive knowledge of preschoolers or children at the end of elementary school or secondary school, but investigations of children starting elementary school are quite limited. The present study, thus, took a closer look at children’s knowledge about mental processes and strategies in early el...
Article
Data from competence tests usually show a number of missing responses on test items due to both omitted and not-reached items. Different approaches for dealing with missing responses exist, and there are no clear guidelines on which of those to use. While classical approaches rely on an ignorable missing data mechanism, the most recently developed...
Technical Report
The National Educational Panel Study (NEPS) provides data on the development of competencies across the whole life span. Plausible values as a measure of individual competence are provided by explicitly including background variables that capture individual characteristics in the corresponding Item Response Theory models. Despite tremendous efforts...
Article
This article introduces longitudinal multistage testing (lMST), a special form of multistage testing (MST), as a method for adaptive testing in longitudinal large-scale studies. In lMST designs, test forms of different difficulty levels are used, and the values on a pretest determine the routing to these test forms. Since lMST allows for testin...
Article
A procedure for examining essential unidimensionality in multicomponent measuring instruments is discussed. The method is based on an application of latent variable modeling and is concerned with the extent to which a common factor for all components of a given scale accounts for their correlations. The approach provides point and interval estimate...
Article
A method for examining common factor variance in multiple-component measuring instruments is outlined. The procedure is based on an application of the latent variable modeling methodology and is concerned with evaluating observed variance explained by a global factor and by one or more additional component-specific factors. The approach furnishes p...
Chapter
Campbell and Fiske (1959) proposed multitrait-multimethod (MTMM) designs for the validation of measurement instruments. In these designs each of several constructs (traits) is measured with the same set of methods. According to Campbell (1959), discriminant validity is supported if the trait under investigation can be distinguished from other trait...
Article
In evaluation studies it is often not possible to conduct randomized experiments for estimating causal effects of a program. Instead, quasi-experimental designs are frequently used. The causal interpretation of a statistically adjusted estimate is only warranted if the assumptions of the underlying quasi-experiment are met and the statistical analy...
Article
Method effects often occur when constructs are measured by different methods. In traditional multitrait-multimethod (MTMM) models method effects are regarded as residuals, which implies a mean method effect of zero and no correlation between trait and method effects. Furthermore, in some recent MTMM models, traits are modeled to be specific to a ce...
Article
Adjustment methods such as propensity scores and analysis of covariance are often used for estimating treatment effects in nonexperimental data. Shadish, Clark, and Steiner used a within-study comparison to test how well these adjustments work in practice. They randomly assigned participating students to a randomized or nonrandomized experiment. Tr...
Article
This study uses within-study comparisons to assess the relative importance of covariate choice, unreliability in the measurement of these covariates, and whether regression or various forms of propensity score analysis are used to analyze the outcome data. Two of the within-study comparisons are of the four-arm type, and many more are of the three-...
Article
Balanced scales, that is, scales based on items whose content is either negatively or positively polarized, are often used in the hope of measuring a bipolar construct. Research has shown that usually balanced scales do not yield 1-dimensional measurements. This threatens their construct validity. The authors show how to test bipolarity while accou...
Article
Method effects often occur when different methods are used for measuring the same construct. We present a new approach for modelling this kind of phenomenon, consisting of a definition of method effects and a first model, the "method effect model", that can be used for data analysis. This model may be applied to multitrait-multimethod data or to lo...