Walter Leite
  • Ph.D.
  • Professor (Full) at University of Florida

About

131
Publications
43,790
Reads
3,392
Citations
Introduction
My current research program explores how data mining and machine learning methods may assist in statistical modeling for theory development and causal inference. I focus on data from virtual learning environments, state-level data systems and large national surveys. I specialize in using structural equation modeling, multilevel modeling, propensity score methods, data mining and machine learning methods to analyze large scale data.
Current institution
University of Florida
Current position
  • Professor (Full)
Additional affiliations
August 2017 - December 2017
University of Florida
Position
  • Professor (Full)

Publications

Article
This study investigated the effect of testlets on regularization-based differential item functioning (DIF) detection in polytomous items, focusing on the generalized partial credit model with lasso penalization (GPCMlasso) DIF method. Five factors were manipulated: sample size, magnitude of testlet effect, magnitude of DIF, number of DIF items, and...
Preprint
Full-text available
Machine learning has become a common approach for estimating propensity scores for quasi-experimental research using matching, weighting, or stratification on the propensity score. This systematic review examined machine learning applications for propensity score estimation across different fields, such as health, education, social sciences, and bu...
Article
Full-text available
Mediation analyses in randomized controlled trials (RCTs) can unpack potential causal pathways between interventions and outcomes. Studies employing mediation analyses serve as the groundwork for advancing the theories of action and strengthening the scientific base for interventions. When designing RCTs investigating these mechanisms, investigator...
Article
Full-text available
In the realm of propensity score analysis (PSA), the challenge of missing data significantly hampers the accuracy of propensity score estimation and the evaluation of covariate balance. While existing literature offers guidance on multiple imputation (MI) methods for addressing missing data in psychological research, there is a notable lack of deta...
Article
Social desirability bias (SDB) is a common threat to the validity of conclusions from responses to a scale or survey. There is a wide range of person-fit statistics in the literature that can be employed to detect SDB. In addition, machine learning classifiers, such as logistic regression and random forest, have the potential to distinguish between...
Preprint
Full-text available
(Preprint DOI - https://osf.io/preprints/psyarxiv/a47w6/) Propensity Score Analysis (PSA) is a prominent method to alleviate selection bias in observational studies, but missing data in covariates is prevalent and must be dealt with during propensity score estimation. Through Monte Carlo simulations, this study evaluates the use of imputation meth...
Article
In this Monte Carlo simulation study, the performance of six different propensity score methods implemented through weighting cases was investigated: inverse probability of treatment weighting, truncated inverse probability of treatment weighting, propensity score stratification, marginal mean weighting through propensity score stratification, opti...
Article
Propensity score analyses (PSA) of continuous treatments often operationalize the treatment as a multi-indicator composite, and its composite reliability is unreported. Latent variables or factor scores accounting for this unreliability are seldom used as alternatives to composites. This study examines the effects of the unreliability of indicators...
Chapter
Automated essay scoring and short-answer scoring have shown tremendous potential for enhancing and promoting large-scale assessments. Challenges still remain in the equity and the implicit bias in scoring that is ingrained in the scoring system. One promising solution to mitigate the problem is the introduction of a measurement model that quantifie...
Article
The global COVID-19 health pandemic caused major interruptions to educational assessment systems, partially due to shifts to remote learning environments, and ushered in a post-COVID educational world that is more open to heterogeneity in instructional and assessment modes for secondary students. In addition, in 2020, educational inequities we...
Article
Full-text available
Academic discourse communities and learning circles are characterized by collaboration, sharing commonalities in terms of social interactions and language. The discourse of these communities is composed of jargon, common terminologies, and similarities in how they construe and communicate meaning. This study examines the extent to which discourse r...
Article
In the current study, we compare propensity score (PS) matching methods for data with a cross-classified structure, where each individual is clustered within more than one group, but the groups are not hierarchically organized. Through a Monte Carlo simulation study, we compared sequential cluster matching (SCM), preferential within cluster matchin...
Article
Social desirability bias (SDB) has been a major concern in educational and psychological assessments when measuring latent variables because it has the potential to introduce measurement error and bias in assessments. Person-fit indices can detect bias in the form of misfitted response vectors. The objective of this study was to compare the perform...
Article
As instruction shifts away from traditional approaches, online learning has grown in popularity in K-12 and higher education. Artificial intelligence (AI) and learning analytics methods such as machine learning have been used by educational scholars to support online learners on a large scale. However, the fairness of AI prediction in educational c...
Article
Full-text available
After nationwide school closures due to COVID-19, virtual learning environments (VLE) have seen a tremendous increase in usage. The current study identified teacher activities for orchestration using an Algebra VLE during school closures, and whether these activities were related to student achievement. In May 2020, we collected survey data on how 21...
Article
The current study examines both student self-regulated learning (SRL) and teacher orchestration in a virtual learning environment (VLE), with respect to student achievement. The study used SRL indicators derived from the log data on how students used the VLE system, survey data on how teachers made use of the VLE for Algebra instruction, as well as...
Article
Full-text available
A discussion forum is a valuable tool to support student learning in online contexts. However, interactions in online discussion forums are sparse, leading to other issues such as low engagement and dropping out. Recent educational studies have examined the affordances of conversational agents (CA) powered by artificial intelligence (AI) to automat...
Article
According to the Standards for Educational and Psychological Testing (2014), one aspect of test fairness concerns examinees having comparable opportunities to learn prior to taking tests. Meanwhile, many researchers are developing platforms enhanced by artificial intelligence (AI) that can personalize curriculum to individual student needs. This le...
Article
Model specification is a crucial aspect of structural equation modeling (SEM), since a misspecified model may lead to biased parameter estimation and result in inaccurate conclusions. We propose the Hybrid Ant Colony Optimization Algorithm (hACO), an improved metaheuristic algorithm to conduct model specification searches in SEM. This data mining a...
Preprint
In the current study, we compare propensity score (PS) matching methods for data with a cross-classified structure, where each individual is clustered within more than one group, but the groups are not hierarchically organized. Through a Monte Carlo simulation study, we compared sequential cluster matching, preferential within cluster matching, gre...
Article
The unstructured multiple-attempt (MA) item response data in virtual learning environments (VLEs) are often from student-selected assessment data sets, which include missing data, single-attempt responses, multiple-attempt responses, and unknown growth ability across attempts, leading to a complex and complicated scenario for using this kind of dat...
Article
This latent class analysis study used a bias-adjusted three-step approach to empirically identify mutually exclusive clusters of teacher professional qualifications based on commonly studied indicators of teacher quality. We then examined the relationship between cluster membership and the mathematics gains of adolescents at risk for mathematics di...
Article
Help-seeking is a valuable practice in online discussion forums. However, the asynchronicity and information overload of online discussion forums have made it challenging for help seekers and providers to connect effectively. This study formulated a new method to provide fair and accurate insights toward building a peer recommender to support help-...
Article
Full-text available
Sensitivity analyses encompass a broad set of post-analytic techniques that are characterized as measuring the potential impact of any factor that has an effect on some output variables of a model. This research focuses on the utility of the simulated annealing algorithm to automatically identify path configurations and parameter values of omitted...
Article
Full-text available
The objective of this study is to analyze the indirect effect of socioeconomic status on mathematics proficiency, considering the type of school (public or private) as the mediating variable. The data came from the Geres survey. We used Structural Equation Modeling together with cluster-robust standard errors. We used the variables...
Article
The piecewise latent growth models (PWLGMs) can be used to study changes in the growth trajectory of an outcome due to an event or condition, such as exposure to an intervention. When there are multiple outcomes of interest, a researcher may choose to fit a series of PWLGMs or a single parallel-process PWLGM. A comparison of these models is provide...
Article
Neural networks are a contending data mining procedure to estimate propensity scores due to their robustness to non-normal residual distributions, ability to detect complex nonlinear relationships between treatments and confounding variables, nonessential model specification, and compatibility to train based on observed events. In this study, we deve...
Conference Paper
Full-text available
Careless responding and keeping students motivated for different tests have been common problems in many areas, especially in education. This study's objective was to demonstrate a novel approach to detect careless responding using person-fit indices developed within the field of psychometrics combined with a random forest. The data used was obtain...
Article
Teaching Strategies GOLD® child assessment system has been frequently adopted in state-funded early childhood policy initiatives, but there is little validation research about its newest edition, GOLD® Birth through Third Grade (GOLD® B-3rd). Based on a sample of children aged from birth through pre-kindergarten, this study investigated validity ev...
Article
Background: Propensity score analysis (PSA) is a popular method to remove selection bias due to covariates in quasi-experimental designs, but it requires handling of missing data on covariates before propensity scores are estimated. Multiple imputation (MI) and single imputation (SI) are approaches to handle missing data in PSA. Objectives: The obje...
Chapter
There has been a long-standing issue of sparse discussion forum participation in online learning, which can impede students’ help-seeking practices. Researchers have examined AI techniques such as link prediction with network analysis to connect help seekers with help providers. However, little is known about whether these AI systems will treat students...
Article
With the growing use of virtual learning environments (VLE), innovative methods to evaluate their performance are increasingly needed. A key difficulty in evaluating VLE using system logs is the large heterogeneity of usage patterns. The current study demonstrates an approach to classify complex patterns of student-level and classroom-level usage w...
Article
Full-text available
Objective: This study aims to: (1) examine gender differences for weight conscious drinking among college students accounting for the broader phenomenon (e.g. including the Alcohol Effects dimension); and (2) longitudinally examine the effect of weight conscious drinking behaviors on body mass index (BMI). Participants: United States freshmen studen...
Article
Full-text available
In data collected from virtual learning environments (VLEs), item response theory (IRT) models can be used to guide the ongoing measurement of student ability. However, such applications of IRT rely on unbiased item parameter estimates associated with test items in the VLE. Without formal piloting of the items, one can expect a large amount of noni...
Article
Adoption of online resources to support instruction and student performance has amplified with technological advances and increased standards for mathematics education. Because teachers play a critical role in the adoption of technology, analysis of data pertaining to how and why teachers utilize online resources is needed to optimize the design an...
Article
Background: The generalized propensity score (GPS) addresses selection bias due to observed confounding variables and provides a means to demonstrate causality of continuous treatment doses with propensity score analyses. Estimating the GPS with parametric models obliges researchers to meet improbable conditions such as correct model specification...
Article
Full-text available
Studies using structural equation modeling (SEM) to evaluate theories against observed data rely on multiple sources of evidence to support a proposed model, such as fit indices, variance explained, and comparison of alternative models. Additional evidence can be obtained by evaluating the model results’ sensitivity to an omitted confounder. The ph...
Article
Artificial neural networks (NN) can help researchers estimate propensity scores for quasi-experimental estimation of treatment effects because they can automatically detect complex interactions involving many covariates. However, NN is difficult to implement due to the complexity of choosing an algorithm for various treatment levels and monitoring...
Conference Paper
Full-text available
In data collected from virtual learning environments (VLEs), item response theory (IRT) models can be used to guide the ongoing measurement of student ability. However, such applications of IRT rely on unbiased item parameter estimates associated with test items in the VLE. Without formal piloting of the items, one can expect a large amount of non-...
Article
Online learning platforms integrating open educational resources (OERs) are increasingly adopted in secondary education as supplemental resources for teaching and learning. However, students report difficulties sustaining their engagement because of the self-paced nature of OER-supported learning environments. We noted that little attention has bee...
Article
This study compares automated methods to develop short forms of psychometric scales. Obtaining a short form that has both adequate internal structure and strong validity with respect to relationships with other variables is difficult with traditional methods of short-form development. Metaheuristic algorithms can select items for short forms while...
Article
Virtual learning environments (VLEs) are increasingly used at-scale in educational contexts to facilitate teaching and promote learning, and the data they produce can be used for educational research purposes. Meanwhile, the U.S. Department of Education’s Office of Educational Technology has repeatedly emphasized the importance of using evidence to...
Article
Full-text available
Objectives: Parents’ early school involvement is central to successful school transition. However, results of parenting programs aimed at improving kindergarten transition for children from disadvantaged backgrounds are inconclusive and the achievement gap is increasing. Using a family resilience model, we examine relationships between a set of pare...
Article
Full-text available
Objective: To examine the effect of weight-conscious drinking and compensatory behavior temporality on binge drinking frequency of college freshmen. Participants: Freshmen (n = 1149) from eight US universities, Fall 2015. Methods: Participants completed the Compensatory Eating Behaviors in Response to Alcohol Consumption Scale and Alcohol Use Diso...
Article
Full-text available
Although the use of technology in the K12 classroom has been shown to have a positive impact, research on the use of open education resources (OER) is relatively limited, especially research focusing on low‐achieving students. The present study examines the relationship between usage of Algebra Nation, a self‐guided system that provided instruction...
Article
Propensity score (PS) analysis aims to reduce bias in treatment effect estimates obtained from observational studies, which may occur due to non-random differences between treated and untreated groups with respect to covariates related to the outcome. We demonstrate how to use structural equation modeling (SEM) for PS analysis to remove selection b...
Article
Data sets from large-scale longitudinal surveys involving young children and families have become available for secondary analysis by researchers in a variety of fields. Researchers in early intervention have conducted secondary analyses of such data sets to explore relationships between nonmalleable and malleable factors and child outcomes, and to...
Article
In this study, we evaluated the estimation of three important parameters for data collected in a multisite cluster-randomized trial (MS-CRT): the treatment effect, and the treatment by covariate interactions at Levels 1 and 2. The Level 1 and Level 2 interaction parameters are the coefficients for the products of the treatment indicator, with the c...
Article
Novice special education teachers (SETs) consistently report feeling overwhelmed by their workloads, and their perceptions of their workloads predict outcomes of concern, such as burnout and plans to quit teaching. Yet, to date, research provides few insights into feasible strategies school leaders could use to help novices better manage workloads....
Book
Full-text available
This book is a printed and enhanced version of an interactive platform that is available online without cost; and all proceeds from this printed version will go to charity. Compared to the interactive platform, the printed version has learning objectives and exercise questions in each chapter; it also has additional notes and explanations. We aim...
Article
The shared parameter growth mixture model (SPGMM) has been proposed as a method to handle missing not at random (MNAR) data in longitudinal studies. This Monte Carlo simulation study compared the one-step approach with a three-step approach for adding covariates into the SPGMM. The results showed that performances of one-step and three-step approac...
Article
This Monte Carlo simulation study compares methods to estimate the effects of programs with multiple versions when assignment of individuals to program version is not random. These methods use generalized propensity scores, which are predicted probabilities of receiving a particular level of the treatment conditional on covariates, to remove select...
Article
To effectively teach reading to students with and at risk for disabilities, special and general education teachers depend on principals who support effective specialized reading instruction. Yet, extant research indicates that principals have inadequate preparation for supporting specialized instruction. To address this issue, scholars have recomme...
Article
In this article, 3-step methods to include predictors and distal outcomes in commonly used mixture models are evaluated. Two Monte Carlo simulation studies were conducted to compare the pseudo class (PC), Vermunt’s (2010), and the Lanza, Tan, and Bray (LTB) 3-step approaches with respect to bias of parameter estimates in latent class analysis (LCA)...
Article
Novice special educators (those in their first 3 years) consistently report their workloads are unmanageable. Yet, it is not clear whether their perceptions of workload manageability contribute to outcomes of concern such as emotional exhaustion (a component of burnout) or intentions to continue teaching in their schools and districts. This pilot i...
Article
This study examined whether the inclusion of covariates that predict class membership improves class identification in growth mixture modeling (GMM). We manipulated the degree of class separation, sample size, the magnitude of covariate effect on class membership, the covariance between the intercept and the slope, and fit two models with covaria...
Article
The relationship between school-wide positive behavioral interventions and supports (SWPBIS) and school-level academic achievement has not been established. Most experimental research has found little to no evidence that SWPBIS has a distal effect on school-level achievement. Yet, an underlying assumption of SWPBIS is that improving social behavior...
Book
Full-text available
This practical book uses a step-by-step analysis of realistic examples to help students understand the theory and code for implementing propensity score analysis with the R statistical language. With a comparison of both well-established and cutting-edge propensity score methods, the text highlights where solid guidelines exist to support best prac...
Article
Cognitive-behavioral interventions (CBIs) are effective in decreasing externalizing behavior in school-aged children. To ensure that CBIs meet the needs of a diverse student population, it is important to examine whether intervention effectiveness is influenced by characteristics common to students identified with problem behaviors. In this study,...
Conference Paper
Cognitive-behavioral interventions (CBIs) are effective in decreasing externalizing behavior in school-aged children. To ensure that CBIs meet the needs of a diverse student population, it is important to examine whether intervention effectiveness is influenced by characteristics common to students identified with problem behaviors. In this study,...
Article
Full-text available
Purpose: Despite the remarkable growth of the luxury industry, a phenomenon referred to as luxury fever, as well as the growing interest in word-of-mouth (WOM) marketing in the industry at hand, little is known about how consumers’ perceived leadership of luxury brands dynamically influences their WOM behavior. This paper aims to examine the moderat...
Article
Background: How the longitudinal asthma control status and other socio-demographic factors influence the changes of health-related quality of life (HRQOL) among asthmatic children, especially from low-income families, has not been fully investigated. Objectives: This study aimed to describe the trajectories of asthma-specific HRQOL over 15 month...
Article
Cognitive diagnosis models are diagnostic models used to classify respondents into homogenous groups based on multiple categorical latent variables representing the measured cognitive attributes. This study aims to present longitudinal models for cognitive diagnosis modeling, which can be applied to repeated measurements in order to monitor attribu...
Article
Full-text available
This study investigated the effectiveness of logistic regression models to detect uniform and non-uniform DIF in polytomous items across small sample sizes and non-normality of ability distributions. A simulation study was used to compare three logistic regression models, which were the cumulative logits model, the continuation ratio model, and the...
Chapter
In this chapter, we discuss advances in research designs and methods useful for generating plausible causal evidence in early childhood special education. In addition, we describe contemporary perspectives about measurement and present example applications in early childhood special education. We begin the chapter by reviewing briefly the strengths...
Article
Objective: The overall goal of our current study was to examine older adults' experience of Flow (i.e., subjective engagement) during the course of a home-based cognitive training program. Materials and methods: In this study, participants took part in a home-based training program. They were randomized to one of the two training groups. One gro...
Article
Full-text available
This work proposes models to track the evolution of the average educational performance in Mathematics of a group of individuals assessed over time in the context of Item Response Theory, enabling the estimation of average abilities in periods that were not assessed. The growth curves evaluated were linear, quadratic, logi...
Article
Full-text available
This investigation examined relationships among special education teachers’ working conditions (e.g., classroom characteristics, administrative support), personal characteristics (e.g., experience, certification status, self-efficacy), instructional quality, and students with disabilities’ reading achievement and behavioral outcomes. Data from the...
Article
We investigated methods of including covariates in two-level models for cluster randomized trials to increase power to detect the treatment effect. We compared multilevel models that included either an observed cluster mean or a latent cluster mean as a covariate, as well as the effect of including Level 1 deviation scores in the model. A Monte C...
Article
Violence prevention programs are commonplace in today’s schools though reviews of the literature reveal mixed empirical findings on their effectiveness. Often, these programs include a variety of components such as social skills training, student mentoring, and activities designed to build a sense of school community that have not been tested for i...
Article
Observational studies of multilevel data to estimate treatment effects must consider both the nonrandom treatment assignment mechanism and the clustered structure of the data. We present an approach for implementation of four propensity score (PS) methods with multilevel data involving creation of weights and three types of weight scaling (normaliz...
Article
This Monte Carlo simulation study investigated the impact of nonnormality on estimating and testing mediated effects with the parallel process latent growth model and 3 popular methods for testing the mediated effect (i.e., Sobel’s test, the asymmetric confidence limits, and the bias-corrected bootstrap). It was found that nonnormality had little e...
Article
Full-text available
The Teacher Communication Behavior Questionnaire (TCBQ) has been used at different levels of education in many countries to measure students’ perceptions of science teachers’ communication behavior. The TCBQ was translated into Portuguese in accordance with ITC test adaptation standards. Validity evidence for the Brazilian version of the TCBQ was o...
Article
Full-text available
The Teacher Communication Behavior Questionnaire (TCBQ) has been used at different levels of education in many countries to measure students' perceptions of their science teachers' communication behavior. The TCBQ was translated into Portuguese in accordance with ITC test adaptation standards. Validity evidence for the Brazilian version of the TCBQ...
Article
In longitudinal data collection, it is common that each wave of collection spans several months. However, researchers using latent growth models commonly ignore variability in data collection occasions within a wave. In this study, we investigated the consequences of ignoring within-wave variability in measurement occasions using a Monte Carlo simu...
Article
Growth mixture modeling (GMM) is a useful statistical method for longitudinal studies because it includes features of both latent growth modeling (LGM) and finite mixture modeling. This Monte Carlo simulation study explored the impact of ignoring 3 types of time series processes (i.e., AR(1), MA(1), and ARMA(1,1)) in GMM and manipulated the separat...
Article
Confirmatory factor analysis (CFA) is widely used for analyzing multitrait-multimethod (MTMM) data. However, there is no consensus about whether a multiplicative or an additive parameterization of trait-method effects most appropriately represents the underlying structure of MTMM data. Given the popularity of two additive CFA models, the CT-CM model and t...
Article
The purpose of this study is to examine the effects of Head Start on early literacy skills relevant to school readiness of English language learners compared to their peers. The comparisons of literacy outcomes were conducted between English language learners and non-English language learners when both groups participated and were not in Head Start...
Article
Full-text available
The objectives of this research were: 1) to analyze higher education students’ conceptions of assessment; 2) to adapt and validate the Students’ Conceptions of Assessment (SCoA) questionnaire; 3) to investigate students’ definitions of assessment; 4) to analyze how conceptions of assessment predict students’ definitions of assessment; 5) to analyze differenc...
Article
Purpose: Social learning theory and self-control theory differ considerably in their interpretation of what qualifies as a “valid” measure of peer deviance. While the two theories are epistemological opposites in regards to how to operationalize the peer deviance construct, their differences are reconcilable. The current study seeks to identify a set...