
Maria Elena Oliveri
University of Nebraska at Lincoln | NU · Department of Educational Psychology
PhD, Measurement, Evaluation & Research Methodology
About
73 Publications
26,612 Reads
733 Citations
Introduction
Maria Elena Oliveri holds a PhD and a Master's degree in Measurement, Evaluation, and Research Methodology and a Master's degree in Clinical Counselling Psychology from the University of British Columbia. Her research focuses on digital, competency-based, personalized, and adaptive assessments in K-12, higher education, and career and technical education; validity; fairness; and English language learners.
Additional affiliations
January 2017 - June 2019
International Journal of Testing
Position: Editor
June 2015 - present
ETS
Position: Researcher
February 2012 - present
Publications (73)
In this document, we present a framework that outlines the key considerations relevant to the fair development and use of exported assessments. Exported assessments are developed in one country and used in other countries whose populations differ from the one for which the assessment was developed. Examples of these assessments include the Grad...
The literature and employee and workforce surveys rank collaborative problem solving (CPS) among the top five most critical skills necessary for success in college and the workforce. This paper provides a review of the literature on CPS and related terms, including a discussion of their definitions, importance to higher education and workforce succ...
This report explores the ways in which human resource (HR) managers use TOEIC® scores to inform hiring, promotion, and training decisions in an international workplace. Two data sources were used: (a) previously collected test users' testimonials that described managers' use of TOEIC scores to inform HR decisions and (b) test-use examples collected...
Collaborative problem solving (CPS) ranks among the top five most critical skills necessary for college graduates to meet workforce demands (Hart Research Associates, 2015). It is also deemed a critical skill for educational success (Beaver, 2013). It thus deserves more prominence in the suite of courses and subjects assessed in K-16. Such inclusio...
For native and nonnative English speakers, employment increasingly requires proficiency in communication, given its critical role in employees’ ability to successfully carry out work-related activities. Although communicating competently is important for employability, survey findings have suggested that employers believe that colleges are not teac...
Corpora and Rhetorically Informed Text Analysis explores applications of rhetorically informed approaches to corpus research. Bringing together contributions from scholars in a variety of fields, it takes up questions of how theories and traditions in rhetorical analysis can be integrated with corpus techniques in order to enrich our understanding...
Fairness is a concept that has both technical and social meaning. In the specific case of quantitative measurement in education, it is important that both restricted and broad perspectives of fairness are acknowledged—and resonances between them identified—so that current and future research can be understood in terms of evidence-based principles....
In this article, we propose a justice-oriented, antiracist validity framework designed to disrupt assessment practices that continue to (re)produce racism through the uncritical promotion of white supremacist hegemonic practices. Using anti-Blackness as illustration, we highlight the ways in which racism is introduced, or ignored, in current assessme...
This article introduces the first special issue of The Journal of Writing Analytics. Documenting a program of research in nine papers and an afterword, colleagues deliberate on a single, pervasive theme: “Meeting the Challenges of Workplace English Communication in the 21st Century.”
Background: This study advances a sociocognitive approach to modeling complex communication tasks. Using an integrative perspective of linguistic, cultural, and substantive (LCS) patterns, we provide a framework for understanding the nature and acquisition of people's adaptive capabilities in social/cognitive complex adaptive systems. We also illus...
Structured Abstract • Background: In today's rapidly evolving world, technological pressures coupled with changes in the nature of work increasingly require individuals to use advanced technologies to communicate and collaborate in geographically distributed multidisciplinary teams. These shifts present the need to teach and assess an expanded set...
Structured Abstract • Background: An expanded skillset is needed to meet today's shifting workplace demands, which involve collaboration with geographically distributed multidisciplinary teams. As the nature of work changes due to increases in automation and the elevated need to work in multidisciplinary teams, enhanced visions of Workplace English...
In this afterword, we look across this special issue to draw out the lessons learned from researchers and scholars involved in designing, using, and interpreting evidence from assessments of complex WEC tasks.
This report discusses frameworks and assessment development approaches to consider fairness, opportunity to learn, and consequences of test use in the design and use of assessments administered to diverse populations. Examples include the integrated design and appraisal framework and the sociocognitively based evidence-centered design approach. The...
Higher Education Admissions Practices - edited by María Elena Oliveri January 2020
In this digital ITEMS module, Dr. Robert [Bob] Mislevy and Dr. Maria Elena Oliveri introduce and illustrate a sociocognitive perspective on educational measurement, which focuses on a variety of design and implementation considerations for creating fair and valid assessments for learners from diverse populations with diverse sociocultural experienc...
The module's purpose is to introduce and illustrate a sociocognitive perspective on educational measurement, which focuses on a variety of design and implementation considerations for creating fair and valid assessments for learners from diverse populations with diverse sociocultural experiences. The first part of the module, narrated by Dr. Mislev...
One of the most critical steps in the test development process is defining the construct, or the knowledge, skills, or abilities, to be assessed. This foundational step provides the basis for initial assumptions about the meaning of test scores and serves as a reference for subsequent validity research. In this paper, we describe the purpose of the...
As the diversity of the test-taker population increases, assessment development practices should evolve to consider the various needs of the multiple populations taking the assessments. One need is the ability to understand the language used in test items and tasks so that they do not present unnecessary challenges for the test-takers, which may be mi...
This chapter focuses on expanding the skill sets and constructs measured as part of college admissions and approaches to developing assessments that are sensitive to sociocognitive and sociocultural differences of the populations taking them. It describes efforts to reduce sources of construct-irrelevant variance in assessments administered to stud...
One consequence of globalization is the growing need for a common language across international businesses. Increasingly, English proficiency is becoming a requirement for successfully carrying out work activities in such establishments, leading to an elevated global use of English. As a result, growing numbers of prospective employees require Engl...
Scores from noncognitive measures are increasingly valued for their utility in helping to inform postsecondary admissions decisions. However, their use has presented challenges because of faking, response biases, or subjectivity, which standardized third‐party evaluations (TPEs) can help minimize. Analysts and researchers using TPEs, however, need...
In this paper, we first examine the challenges of score comparability associated with the use of assessments that are exported. By exported assessments, we mean assessments that are developed for domestic use and are then administered in other countries in either the same or a different language. Second, we provide suggestions to better support the...
In this paper, we first examine the challenges to score comparability associated with the use of assessments that are exported. By exported assessments, we mean assessments that are developed for domestic use and are then administered in other countries in either the same or a different language. Second, we provide suggestions to better support the...
Fifty years after the first international large-scale assessment (ILSA), participation in these studies continues to grow, with more than 50% of the world’s countries participating. Concomitant with growth in ILSAs is an expansion in the diversity of participant countries with respect to languages, cultures, and educational perspectives and goals....
The assessment of noncognitive traits is challenging due to possible response biases, “subjectivity” and “faking.” Standardized third-party evaluations where an external evaluator rates an applicant on their strengths and weaknesses on various noncognitive traits are a promising alternative. However, accurate score-based inferences from third-party...
In this study, we examined differential item functioning (DIF) of the Deep Approaches to Learning scale on the National Survey of Student Engagement (NSSE) for Asian international and Canadian domestic first-year university students. We also examined its potential sources using focus-group interview results. Only 1 of the 12 items functioned diffe...
We analyzed a pool of items from an admissions test for differential item functioning (DIF) for groups based on age, socioeconomic status, citizenship, or English language status using Mantel-Haenszel (Holland & Thayer, 1988) and item response theory (Hambleton, Swaminathan, & Rogers, 1991). DIF items were systematically examined to identify its po...
Fifty years after the first international large-scale assessment (ILSA), participation in these studies continues to grow, with more than 50% of the world’s countries taking part (Kamens & McNeely, 2010). Concomitant with growth in ILSAs is an expansion in the diversity of participating countries, which introduces increased diversity in the represe...
From 2006 to 2008, Educational Testing Service (ETS) produced a series of reports titled A Culture of Evidence, designed to capture a changing climate in higher education assessment. A decade later, colleges and universities face a new set of challenges resulting from societal, technological, and other influences, leading to a need to augment the s...
These guidelines outline considerations relevant to the assessment of test takers in countries or regions that may be linguistically diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the fair and valid assessment...
Assessing complex constructs such as those discussed under the umbrella of 21st century constructs highlights the need for a principled assessment design and validation approach. In our discussion, we made a case for three considerations: (a) taking construct complexity into account across various stages of assessment development such as the design...
Automated essay scoring is a developing technology that can provide efficient scoring of large numbers of written responses. Its use in higher education admissions testing provides an opportunity to collect validity and fairness evidence to support current uses and inform its emergence in other areas such as K–12 large-scale assessment. In this stu...
Differential item functioning (DIF) analyses have been used as the primary method in large-scale assessments to examine fairness for subgroups. Currently, DIF analyses are conducted utilizing manifest methods using observed characteristics (gender and race/ethnicity) for grouping examinees. Homogeneity of item responses is assumed denoting that all...
In this study, we propose that the unique needs and characteristics of linguistic minorities should be considered throughout the test development process. Unlike most measurement invariance investigations in the assessment of linguistic minorities, which typically are conducted after test administration, we propose strategies that focus on the earl...
The assessment of linguistic minorities often involves using multiple language versions of assessments. In these assessments, comparability of scores across language groups is central to valid comparative interpretations. Various frameworks and guidelines describe factors that need to be considered when developing comparable assessments. These fram...
The computer-based Graduate Record Examinations® (GRE®) revised General Test includes interactive item types and testing environment tools (e.g., test navigation, on-screen calculator, and help). How well do test takers understand these innovations? If test takers do not understand the new item types, these innovations may introduce construct-irrel...
Our objectives in this study were twofold. First, we examined whether there were items functioning differentially across multiple subgroups taking the GRE® revised General Test (rGRE) based on: (a) age, (b) socioeconomic status, (c) gender, (d) citizenship, (e) English best language, and (f) intended major of study (STEM versus non-STEM). This ques...
Heterogeneity within English language learners (ELLs) groups has been documented. Previous research on differential item functioning (DIF) analyses suggests that accurate DIF detection rates are reduced greatly when groups are heterogeneous. In this simulation study, we investigated the effects of heterogeneity within linguistic (ELL) groups on the...
In this study, we contrast results from two differential item functioning (DIF) approaches (manifest and latent class) by the number of items and sources of items identified as DIF using data from an international reading assessment. The latter approach yielded three latent classes, presenting evidence of heterogeneity in examinee response patterns...
In this article, we review translation, adaptation policies, and practices in providing test accommodation for English language learners (ELLs) in the United States. We collected documents and conducted interviews with officials in the 12 states that provide translation accommodations to ELLs on content assessments. We then summarized challenges to...
In this article, we investigate the creation of comparable score scales across countries in international assessments. We examine potential improvements to current score scale calibration procedures used in international large-scale assessments. Our approach seeks to improve fairness in scoring international large-scale assessments, which often ign...
Abstract: In this paper, we discuss the importance of including noncognitive measures in admission practices at both the graduate and undergraduate levels. The use of these measures can be helpful in reducing the achievement gap that is: a) consistently documented in the literature and b) strongly associated with admissions of ethnic minorities in...
In this study, we investigated differential item functioning (DIF) and its sources using a latent class (LC) modeling approach. Potential sources of LC DIF related to instruction and teacher-related variables were investigated using substantive and three statistical approaches: descriptive discriminant function, multinomial logistic regression, and...
International large-scale assessments of achievement often have a large degree of differential item functioning (DIF) between countries, which can threaten score equivalence and reduce the validity of inferences based on comparisons of group performances. It is important to understand potential sources of DIF to improve the validity of future asses...
Even if national and international assessments are designed to be comparable, subsequent psychometric analyses often reveal differential item functioning (DIF). Central to achieving comparability is examining the presence of DIF and, if DIF is found, investigating its sources to ensure that differentially functioning items do not lead to bias. In...
In this study, the Canadian English and French versions of the Problem-Solving Measure of the Programme for International Student Assessment 2003 were examined to investigate their degree of measurement comparability at the item- and test-levels. Three methods of differential item functioning (DIF) were compared: parametric and nonparametric item r...
In this study, we examine the degree of construct comparability and possible sources of incomparability of the English and French versions of the Programme for International Student Assessment (PISA) 2003 problem-solving measure administered in Canada. Several approaches were used to examine construct comparability at the test- (examination of test...
This study used item response data from 30 countries that participated in the Programme for International Student Assessment (PISA). It compared the reduction in the proportion of item misfit associated with alternative item response theory (IRT; multidimensional and multi-parameter Rasch and 2 parameter logistic; 2PL) models and linking (mean-mean IRT vs....
The purpose of this literature review commissioned by the Canadian Education Statistics Council (CESC) is to summarize the research evidence on key factors and practices supporting literacy success for school-aged students. The review will focus exclusively on reading, or the ability to get meaning from print, because it is fundamental to the...
The views and opinions expressed in this document are those of the authors and do not necessarily represent the views of UNESCO or IOS, which contracted the study and ensured adherence to evaluation standards. The designations employed and the presentation of material throughout this document do not imply the expression of any opinion whatsoever o...