
Bruno D Zumbo
University of British Columbia | UBC · Department of Educational and Counselling Psychology, and Special Education
PhD
Canada Research Chair in Psychometrics & Measurement, Tier 1
About
406 Publications
281,832 Reads
21,456 Citations
Introduction
Professor Zumbo is a mathematical scientist whose research explores the properties and applications of measurement error models, test theory, survey design, test validation, and assessment.
He is currently Professor and Distinguished University Scholar, the Canada Research Chair in Psychometrics and Measurement (Tier 1), and the Paragon UBC Professor of Psychometrics & Measurement at the University of British Columbia.
bruno.zumbo@ubc.ca | @BD_Zumbo | https://ecps.educ.ubc.ca/bruno-d-zumbo/
Publications (406)
Chalmers recently published a critique of the use of ordinal α proposed in Zumbo et al. as a measure of test reliability in certain research settings. In this response, we take up the task of refuting Chalmers' critique. We identify three broad misconceptions that characterize Chalmers' criticisms: (1) confusing assumptions with consequences of...
The purpose of this paper is to formally outline a sequence of propositions that describe the connections between five linearly additive measurement error models commonly used in disciplines from psychometrics and test theory to economics to epidemiology, and one new model formerly proposed in Kroc & Zumbo (2018). We show that although these models...
There is no consensus among assessment researchers about many of the central problems of response process data, including what it is and what it comprises. The Standards for Educational and Psychological Testing (American Educational Research Association et al., 2014) locate process data within their five sources of validity evidence. However...
ABSTRACT: In line with the journal volume’s theme, this essay considers lessons from the past and visions for the future of test validity. In the first part of the essay, a description of historical trends in test validity since the early 1900s leads to the natural question of whether the discipline has progressed in its definition and description...
The model of tests and measurements outlined in this article identifies test scores with Hilbert space vectors and true and error components of scores with linear operators. The collection of all observed scores associated with a test procedure is represented by the function space L²(Ω, A, P); the collection of all true scores is the Hilbert subspa...
Objective
Examine the feasibility and acceptability of a social identity-informed, online delivered, running and walking group program to support low-active post-secondary students’ exercise behavior and well-being during the COVID-19 pandemic.
Methods
A two-arm, non-blinded, parallel pilot randomized controlled trial was conducted whereby low-act...
Cognitive fusion occurs when people experience their thoughts as literally true and allow them to dictate behavior. Fusion has been shown to be associated with increased symptoms of post-traumatic stress disorder (PTSD) and depression; however, the association between change in cognitive fusion, PTSD, and depression symptoms has been relatively uni...
Producing Data This area involves survey design, measurement theory, survey sampling, probability sampling, randomizing comparative experiments (i.e., studies with randomization and comparison), quasi-experiments, and other such field studies. Analyzing Data Data analysis provides tools and strategies for extracting information from data, not only...
ABSTRACT: This monograph describes a framework for test validation that synthesizes construct theories and argument-based approaches bringing washback (also described as consequences or impact) into the foreground of the validation practices. This framework is well suited for tests immersed mainly in a construct theory and whose validation practice...
This monograph describes a framework for test validation that synthesizes construct theories and argument-based approaches bringing washback (also described as consequences or impact) into the foreground of the validation practices. This framework is well suited for tests immersed mainly in a construct theory and whose validation practices have all...
IMPORTANCE During the COVID-19 pandemic, health care workers (HCWs) reported a significant decline in their mental health. One potential health behavior intervention that has been shown to be effective for improving mental health is exercise, which may be facilitated by taking advantage of mobile application (app) technologies.
OBJECTIVE To determ...
A popular approach to the simulation of multivariate, non-normal data in the social sciences is to define a multivariate normal distribution first, and then alter its lower-dimensional marginals to achieve the shape of the distribution intended by the researchers. A consequence of this process is that the correlation structure is altered, so furthe...
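The change in correlation that this abstract describes can be illustrated with a closed-form case. The sketch below is not the authors' method. It assumes, purely for illustration, two normal variables with correlation rho and common standard deviation sigma that are each exponentiated to give lognormal marginals, a case where the resulting Pearson correlation is known exactly. The function name is hypothetical.

```python
import math


def lognormal_correlation(rho: float, sigma: float = 1.0) -> float:
    """Pearson correlation of (exp(X), exp(Y)) when (X, Y) is bivariate
    normal with mean 0, common standard deviation sigma, correlation rho.

    Illustrative assumption: the marginals are altered by exponentiation.
    """
    s2 = sigma ** 2
    # Cov(e^X, e^Y) / Var(e^X) simplifies to the ratio below.
    return (math.exp(rho * s2) - 1.0) / (math.exp(s2) - 1.0)


if __name__ == "__main__":
    # Compare the correlation specified on the normal scale with the
    # correlation that survives the marginal transformation.
    for rho in (-1.0, -0.5, 0.0, 0.5, 1.0):
        altered = lognormal_correlation(rho)
        print(f"normal rho = {rho:+.1f} -> lognormal rho = {altered:+.4f}")
```

With sigma = 1, a normal-scale correlation of 0.5 shrinks after the transformation, and the most negative correlation attainable is -1/e (about -0.368) rather than -1, so a target correlation cannot simply be set on the normal scale and assumed to carry over.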
Background:
A methodological review of 78 empirical articles focusing on the neurodevelopmental outcomes of at-risk infants was conducted.
Aims:
To examine ways language and terminology are used to describe methods, present results, and/or state conclusions in studies published during 1994-2005, a decade reflecting major advances in neurodevelop...
The measurement and evaluation of Turkish as a native language continues with an approach focused on the item itself and on participants' responses to the item. In the 21st century, the measurement of Turkish takes place through tests led by the Ministry of National Education (MEB) and the Measurement, Selection and Placement Center (ÖSYM), and is carried out by stakeholders with the same approach. However, around the world, the item response process...
Objectives:
Multiple health behaviour change is a viable strategy to promote health outcomes. An example is the use of running behaviour to support smoking cessation in the group-mediated Run to Quit program. On the basis that changes in running and smoking identity were related to changes in running and smoking behaviour among individuals in the...
Objective
Psychological need satisfaction, from a self-determination theory (SDT) perspective, has been applied extensively to understand predictors of exercise behaviour. Dweck proposed a psychological needs framework that includes basic needs (optimal predictability, competence, acceptance), compound needs derived from combinations of basic needs...
Background
Previous research suggests that there is a bidirectional relationship between incidental affect (i.e., how people feel in day-to-day life) and physical activity behavior. However, many inconsistencies exist in the body of work due to the lag interval between affect and physical activity measurements.
Purpose
Using a novel continuous-tim...
What follows is a compilation of invited memories and tributes from just a few of Ron’s close colleagues and friends. Dr. April Zenisky, guest editor of this section of the International Test Commission magazine, "Testing International".
Zumbo, B.D. (2022). The World Mourns the Loss of a Giant in the Field of Psychometrics, But I Also Mourn the L...
Subjective evaluations of athletic performance are an important part of decision making across sporting organisations. Based on their expertise and intuition, coaches select their starting line-ups, scouts recommend or discourage teams from signing new potential players, and academy directors decide which players should move up or out of a team’s a...
Purpose
Mixture item response theory (MixIRT) models can be used to uncover heterogeneity in responses to items that comprise patient-reported outcome measures (PROMs). This is accomplished by identifying relatively homogenous latent subgroups in heterogeneous populations. Misspecification of the number of latent subgroups may affect model accuracy...
The central limit theorem (CLT) is one of the most important theorems in statistics, and it is often introduced to social sciences researchers in an introductory statistics course. However, the recent replication crisis in the social sciences prompts us to investigate just how common certain misconceptions of statistical concepts are. The main purp...
Maladaptive schemas have been linked with increased posttraumatic stress disorder (PTSD) symptoms. Posttraumatic negative self-appraisals (i.e., posttraumatic shame and self-blame) have also been empirically supported as contributors to PTSD symptom severity following traumatic events. These associations are well known; however, the pathways betwee...
Physical activity behaviour displays temporal variability, and is influenced by a range of dynamic psychological processes (e.g., affect) and shaped by various co-occurring events (e.g., social/environmental factors, interpersonal dynamics). Yet, most physical activity research tends not to examine the dynamic psychological processes implicated in...
This study examined differential item functioning (DIF) in the Center for Epidemiologic Studies Depression Scale (CES-D) between Chinese and White adolescents (aged 13 to 17 years) living in Canada. A series of ordinal logistic regressions were used to test for uniform and non-uniform DIF on items in the CES-D. The DIF analyses identified non-unifo...
A novel multimethod research methodology and accompanying statistical methods for operational and validity research is described in response to the emergence of remote online proctored test administration. The multimethod strategy was designed to allow for a robust comparison of the test centre and online test performance that far exceeds conventio...
This two-part study examined Dweck’s psychological needs model in relation to exercise-related well-being and particularly focused on the basic need for optimal predictability and compound needs for identity and meaning. In Part 1 (N = 559), using exploratory factor analysis, scores derived from items assessing optimal predictability (prediction...
Veenhoven (Happiness in nations, subjective appreciation of life in 56 nations, 1946–1992 (Studies in social-cultural transformation, 2), Erasmus University Rotterdam, Risbo, 2003, Quality of life in the new millennium: ‘Advances in quality-of-life studies, theory and research’, Part 2: Refining concepts and measurement to assess cross-cultural qua...
Drawing on Kane’s argument-based approach to validity and Toulmin’s later work on cosmopolitanism and diversity, this paper asks whose validity arguments and evidence are being presented in International Large-Scale Assessments (ILSAs), where and when. With a case study of the OECD’s PISA for Development, we demonstrate that validity arguments are...
To ensure quality of education, a language framework should be the foundation on which second language curricula are developed. In 2010, the Council of Ministers of Education, Canada (CMEC), as suggested by Vandergrift (2006a, 2006b), recommended the use of the Common European Framework of Reference (CEFR) in the K-12 Canadian school context and pr...
The current technological age has created exponential growth in the availability of technology and data in every industry, including sport. It is tempting to get caught up in the excitement of purchasing and implementing technology, but technology has a potential dark side that warrants consideration. Before investing in technology, it is imperativ...
Background:
To examine the extent to which group-based exercise programs, informed by self-categorisation theory, result in improvements in psychological flourishing and reductions in age- and gender-related stigma consciousness among older adults.
Methods:
In the study, older adults (N = 485, ≥ 65 years) were randomised to similar age same gend...
Simulations concerning the distributional assumptions of coefficient alpha are contradictory. To provide a more principled theoretical framework, this article relies on the Fréchet–Hoeffding bounds to show that the distribution of the items plays a role in the estimation of correlations and covariances. More specifically, these bounds...
The concept of validity is one of the most influential concepts in psychometrics and survey research because considerations about its scope and nature influence every step in the survey life cycle, from survey and questionnaire design to implementation as well as the use and reporting of results. Curiously, quite independent lines of thought and pr...
This study introduces a novel differential item functioning (DIF) method based on propensity score matching that tackles two challenges in analyzing performance assessment data, that is, continuous task scores and lack of a reliable internal variable as a proxy for ability or aptitude. The proposed DIF method consists of two main stages. First, pro...
Methods to generate random correlation matrices have been proposed in the literature, but very few instances exist where these correlation matrices are structured or where the statistical properties of the algorithms are known. By relying on the tetrad relation discovered by Spearman and the properties of the beta distribution, an algorithm is prop...
This paper investigates measurement invariance as it relates to migration background using the Program for International Student Assessment measure of social belonging. We explore how the use of two measurement invariance techniques provides insights into differential item functioning using the alignment method in conjunction with logistic regressio...
Background:
The 3rd Regional Comparative and Explanatory Study reports, analyses and compares academic results in mathematics, sciences, and reading for 15 Latin American countries. Validity is the foundation of a testing procedure, and the process of validation is important to the overall success of educational assessment as a whole. This methodo...
Traditional notions of measurement error typically rely on a strong mean-zero assumption on the expectation of the errors conditional on an unobservable “true score” (classical measurement error) or on the data themselves (Berkson measurement error). Weakly calibrated measurements for an unobservable true quantity are defined based on a weaker mean...
For a reprint: https://digitalcommons.wayne.edu/jmasm/vol18/iss1/18/v
To evaluate the performance of propensity score approaches for differential item functioning analysis, this simulation study was conducted to assess bias, mean square error, Type I error, and power under different levels of effect size and a variety of model misspecification con...
Goal attainment scaling (GAS) is an internationally recognized measure that is widely used in educational, counseling, and clinical settings to identify and evaluate relevant goals for an individual. The GAS is an unusual measure because its content, which consists of goals, is formed by the respondent and/or users in the process of completing the...
Incorporating individual differences in causal attributions has been successful in self perception but there has been little attention to attributional styles in person perception. A key domain in person perception is attributions affecting helping behavior. Attributing a negative outcome to causes personally controllable by the victim elicits ange...
Within psychology and the social sciences, Ordinary Least Squares (OLS) regression is one of the most popular techniques for data analysis. In order to ensure the inferences from the use of this method are appropriate, several assumptions must be satisfied, including the one of constant error variance (i.e. homoskedasticity). Most of the training r...
Differential item functioning (DIF) and response shift (RS) can obscure the meaning of scores obtained from patient-reported outcome measures (PROMs). Although modern statistical methods are increasingly being developed to identify and accommodate for DIF and RS, there is limited awareness of these methods, and even of DIF and RS themselves, across...
The Vale and Maurelli algorithm is a widely used method that allows researchers to generate multivariate, nonnormal data with user-specified levels of skewness, excess kurtosis, and a correlation structure. Before obtaining the desired correlation structure, a transitional step requires the user to calculate the roots of a cubic polynomial referred...
The Fleishman third-order polynomial algorithm is one of the most-often used non-normal data-generating methods in Monte Carlo simulations. At the crux of the Fleishman method is the solution of a non-linear system of equations needed to obtain the constants to transform data from normality to non-normality. A rarely acknowledged fact in the litera...
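The non-linear system mentioned in this abstract can be written down compactly. The sketch below uses the commonly cited form of the Fleishman equations for Y = a + bZ + cZ² + dZ³ with Z standard normal and a = -c. It only evaluates the system; it does not solve it, and it makes no claim about the point the truncated abstract goes on to make. The function names are hypothetical.

```python
def fleishman_residuals(b, c, d, skew, excess_kurtosis):
    """Residuals of the Fleishman system for the constants (b, c, d).

    All three residuals are zero when Y = -c + b*Z + c*Z**2 + d*Z**3
    has unit variance, the target skew, and the target excess kurtosis.
    The skew and kurtosis equations assume the variance equation holds.
    """
    variance_eq = b**2 + 6*b*d + 2*c**2 + 15*d**2 - 1.0
    skew_eq = 2*c*(b**2 + 24*b*d + 105*d**2 + 2) - skew
    kurtosis_eq = 24*(
        b*d
        + c**2 * (1 + b**2 + 28*b*d)
        + d**2 * (12 + 48*b*d + 141*c**2 + 225*d**2)
    ) - excess_kurtosis
    return variance_eq, skew_eq, kurtosis_eq


def fleishman_transform(z, b, c, d):
    """Apply the cubic transformation to a standard normal draw z."""
    return -c + b*z + c*z**2 + d*z**3


if __name__ == "__main__":
    # Sanity check: the identity transform reproduces the normal case.
    print(fleishman_residuals(1.0, 0.0, 0.0, skew=0.0, excess_kurtosis=0.0))
    # Sanity check: Y = (Z**2 - 1) / sqrt(2) is a standardized chi-square(1),
    # whose skew is sqrt(8) and whose excess kurtosis is 12.
    print(fleishman_residuals(0.0, 2**-0.5, 0.0, skew=8**0.5, excess_kurtosis=12.0))
```

In practice the constants are obtained by passing these residuals to a numerical root finder. Because the system is non-linear, the result can depend on the starting values, so checking the residuals of any returned solution is a reasonable precaution.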
Objectives
To systematically identify and qualitatively review the statistical approaches used in prospective cohort studies of team sports that reported intensive longitudinal data (ILD) (>20 observations per athlete) and examined the relationship between athletic workloads and injuries. Since longitudinal research can be improved by aligning the...
The investigation of differential item functioning (DIF) is important for any group comparison because the validity of the inferences made from scale scores could be compromised if DIF is present. DIF occurs when individuals from different groups show different probabilities of selecting a response option to an item after being matched on the under...
Single case design research on family centered positive behavior support (PBS) over the past 20 years has provided evidence of the approach's acceptability, effectiveness and durability when implemented with families of children with developmental disabilities and problem behavior. Although quality of life is a key tenet of PBS, only a few studies...
Sense of Community Scale in Online University Courses
The scoring rules are specified in the file named...
Purpose:
Patient-reported outcome measures (PROMs) are frequently used in heterogeneous patient populations. PROM scores may lead to biased inferences when sources of heterogeneity (e.g., gender, ethnicity, and social factors) are ignored. Latent variable mixture models (LVMMs) can be used to examine measurement invariance (MI) when sources of het...
Creating a sense of community in online classes contributes to student retention and to their overall satisfaction with the course itself. This study aimed to develop a scale of sense of community of students attending online university courses. A series of ordinal exploratory factor analyses were conducted on data obtained from 839 students enroll...
The purpose of this research was to develop a questionnaire to assess the multidimensional construct of teamwork in sport and to examine various aspects of validity related to that instrument. A preliminary questionnaire was first created, and feedback on this instrument was then obtained from a sample of team-sport athletes (n = 30) and experts in...
Background:
Response shift (RS) has been defined as a change in the meaning of an individual's self-evaluation of his/her health status and quality of life. Several statistical model- and design-based methods have been developed to test for RS in longitudinal data. We reviewed the uptake of these methods in patient-reported outcomes (PRO) literatu...
In a recent paper Gromping provided a wide‐ranging review of metrics for assessing variable importance in regression analysis. There are, however, several flaws in Gromping's criticism of the well‐known metric attributed to Pratt. Among the metrics she reviewed, Pratt's metric stands out because it is the only one that provides both a theoretically...
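For readers unfamiliar with the metric discussed in this abstract, the sketch below computes it in the simplest setting. It assumes two standardized predictors, so the standardized regression coefficients have a closed form, and it uses the usual definition of Pratt's measure as the product of a predictor's standardized coefficient and its correlation with the outcome. The function name and example correlations are hypothetical, and some authors additionally divide each product by R².

```python
def pratt_two_predictors(r1y, r2y, r12):
    """Pratt importance for two standardized predictors.

    r1y, r2y: correlations of each predictor with the outcome.
    r12: correlation between the two predictors.
    Returns (d1, d2, r_squared), where d_j = beta_j * r_jy.
    """
    if abs(r12) >= 1.0:
        raise ValueError("predictors must not be perfectly collinear")
    denom = 1.0 - r12**2
    # Standardized OLS coefficients for the two-predictor model.
    beta1 = (r1y - r12 * r2y) / denom
    beta2 = (r2y - r12 * r1y) / denom
    d1 = beta1 * r1y
    d2 = beta2 * r2y
    # The products sum to the model R-squared, so they partition it.
    return d1, d2, d1 + d2


if __name__ == "__main__":
    d1, d2, r_squared = pratt_two_predictors(r1y=0.6, r2y=0.4, r12=0.5)
    print(f"d1 = {d1:.4f}, d2 = {d2:.4f}, R^2 = {r_squared:.4f}")
```

The additive property is the appeal of the metric: the two values always sum to R². An individual value can be negative, for example under suppression, which is one of the points debated in the literature on variable importance.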
The purpose of this research was to develop a questionnaire to assess the multidimensional construct of teamwork in sport and examine various aspects of validity related to that instrument. A preliminary questionnaire was first created, and feedback on this instrument was then obtained from a sample of team sport athletes (n = 30) and experts in sp...
This computer simulation study evaluates the robustness of the nonparametric Levene test of equal variances (Nordstokke & Zumbo, 2010) when sampling from populations with unequal (and unknown) means. Testing for population mean differences when population variances are unknown and possibly unequal is often referred to as the Behrens-Fisher problem...
When running a confirmatory factor analysis (CFA), users specify and interpret the pattern (loading) matrix. It has been recommended that the structure coefficients, indicating the factors' correlation with the observed indicators, should also be reported when the factors are correlated (Graham, Guthrie, & Thompson, 2003; Thompson, 1997). The aims...
A series of simulation studies are reported that investigated the impact of a skewed predictor(s) on the Type I error rate and power of the Wald test in a logistic regression model. Five simulations were conducted for three different regression models. A detailed description of the impact of skewed cell predictor probabilities and sample size provi...
Although anxiety is frequently reported in children with autism spectrum disorder (ASD), existing anxiety scales are often psychometrically inappropriate for this population. This study examined the internal structure, reliability, convergent and discriminant validity of the Spence Children’s Anxiety Scale-Parent Report (SCAS-P; Spence 1999) in 238...
Self-efficacy (SE) refers to one’s sense of personal competence and is a key element of human agency. Among individuals who are homeless or vulnerably housed, SE has the potential to provide important information about an individual’s ability to seek out and make use of resources and persevere in the face of multiple challenges. SE is understudied...
Measurement bias is a crucial concern for test fairness. Impact (true group difference in the measured scores) is of the ultimate interest in many scientific inquiries. This paper revisits and refines the definitions for bias and impact and articulates a conceptual framework that decouples them from differential item functioning. The conditions for...
Objectives:
Player unavailability negatively affects team performance in elite football. However, whether player unavailability and its concomitant performance decrement is mediated by any changes in teams' match physical outputs is unknown. We examined whether the number of players injured (i.e. unavailable for match selection) was associated wit...
When I think of pioneers I imagine a hardy people traveling from a former life to a new unsettled place full of unknowns.
This paper describes and explains citizen beliefs and attitudes about the quality of life in Jasper, Alberta in the summer of 1997. We report on 447 survey respondents’ satisfaction with a wide variety of aspects of their community and their lives, the best and worst things about living in Jasper, and the things they would change first to improve t...
This paper describes some of the expectations and attitudes of British Columbians toward possible events in the first one hundred years of the third millennium, and explains their happiness and satisfaction with the quality of their lives. We report on results of two independent surveys taken in October and November 1999, one containing 499 respond...
The aim of this investigation is to obtain some baseline self-reported data on the health status and overall quality of life of all residents of the Bella Coola Valley of British Columbia aged 17 years or older, and to measure the impact of a set of designated health determinants on their health and quality of life. In the period from August to Nov...
The direct monitoring of key attitudes, expectations, feelings, aspirations, and values in a population is necessary for an understanding of social change and the quality of life.
Purpose
The purpose of this study was to examine whether homeless or vulnerably housed individuals experienced response shift over a 12-month time period in their self-reported physical and mental health status.
Methods
Data were obtained from the Health and Housing in Transition study, a longitudinal multi-site cohort study in Canada (N = 1190 at b...
As a convenient data source from computerized tests, response time could also be very informative evidence for the validity of test scores, offering an opportunity for insights into parts of the test that test takers linger over and other parts of the test where they glide through the material. Given these hopes for, and expectations of, response t...
In this first chapter, we set the stage for subsequent chapters that we believe will push the boundaries of our current thinking about response processes as validity evidence. Evidence based on response processes has been an overlooked source of validity evidence, but one that offers much promise and strength to support the inferences we make from...
Observations of real-life testing situations can provide important insights into test validation and assessment response processes. We consider observations of face-to-face interaction as a starting point to investigate how variation in assessment performance is informed by the ecological characteristics of the testing situation. This perspective o...
This chapter focuses on the examination of response shift in patient-reported outcomes (PROs) research, with particular attention to measurement validity and response processes. Response shift occurs when changes in PROs over time are the result of changes in how people interpret and respond to PRO measurement items at different points in time. Con...
This chapter considers how the process-based variables of test-taking strategies as reported by test-takers can help to explain the differences in the outcome of a reading comprehension test and serve to provide process level evidence of validity. With the process variables as the explanatory variables, test-takers’ performance was analyzed via a l...