
Jean-Paul Fox
- dr.
- Professor (Associate) at University of Twente
Publications: 113
Reads: 24,256
Citations: 4,695
Publications (113)
In this work, we introduce a multiple-group longitudinal IRT model that accounts for skewed latent trait distributions. Our approach extends the model proposed by Santos et al. in 2022, which introduced a general class of longitudinal IRT models. The latent traits follow a multivariate skew-normal distribution, induced by an antedependence structur...
In computer-based testing it has become standard to collect response accuracy (RA) and response times (RTs) for each test item. IRT models are used to measure a latent variable (e.g., ability, intelligence) using the RA observations. The information in the RTs can help to improve routine operations in (educational) testing, and provide informatio...
A Bayesian multivariate model with a structured covariance matrix for multi-way nested data is proposed. This flexible modeling framework allows for positive and for negative associations among clustered observations, and generalizes the well-known dependence structure implied by random effects. A conjugate shifted-inverse gamma prior is proposed f...
The multilevel model (MLM) is a popular approach to describe dependences of hierarchically clustered observations. A main feature is the capability to estimate (cluster-specific) random effect parameters, while their distribution describes the variation across clusters. However, the MLM can only model positive associations among clustered observa...
There have been considerable methodological developments of Bayes factors for hypothesis testing in the social and behavioral sciences, and related fields. This development is due to the flexibility of the Bayes factor for testing multiple hypotheses simultaneously, the ability to test complex hypotheses involving equality as well as order constrai...
It is usually impossible to impose experimental conditions in large-scale longitudinal (observational) studies in education. This increases the risk of bias due to, for instance, unobserved heterogeneity, missing background variables, and dropouts. A flexible statistical model is required for the nature of the observational assessment data and to acc...
In computer-based testing it has become standard to collect response accuracy (RA) and response times (RTs) for each test item. IRT models are used to measure a latent variable (e.g., ability, intelligence) using the RA observations. The information in the RTs can help to improve routine operations in (educational) testing, and provide inf...
Randomized response (RR) designs are used to collect response data about sensitive behaviors (e.g., criminal behavior, sexual desires). The modeling of RR data is more complex, since it requires a description of the RR process. For the class of generalized linear mixed models (GLMMs), the RR process can be represented by an adjusted link function,...
The linear mixed effects model is an often used tool for the analysis of multilevel data. However, this model has an ill-understood shortcoming: it assumes that observations within clusters are always positively correlated. This assumption is not always true: individuals competing in a cluster for scarce resources are negatively correlated. Random...
In a Bayesian Covariance Structure Model (BCSM) the dependence structure implied by random item parameters is modelled directly through the covariance structure. The corresponding measurement invariance assumption for an item is represented by an additional correlation in the item responses in a group. The BCSM for measurement invariance testing is...
Standard item response theory (IRT) models have been extended with testlet effects to account for the nesting of items; these are well known as (Bayesian) testlet models or random effect models for testlets. The testlet modeling framework has several disadvantages. A sufficient number of testlet items are needed to estimate testlet effects, and a s...
In medical research, repeated questionnaire data is often used to measure and model latent variables across time. Through a novel imputation method, a direct comparison is made between latent growth analysis under classical test theory and item response theory, while also including effects of missing item responses. For classical test theory and it...
There has been a tremendous methodological development of Bayes factors for hypothesis testing in the social and behavioral sciences, and related fields. This development is due to the flexibility of the Bayes factor for testing multiple hypotheses simultaneously, the ability to test complex hypotheses involving equality as well as order constraint...
Data‐based decision making (DBDM) is presumed to improve student performance in elementary schools in all subjects. The majority of studies in which DBDM effects have been evaluated have focused on mathematics. A hierarchical multiple single‐subject design was used to measure effects of a 2‐year training, in which entire school teams learned how to...
It is challenging for survey researchers to investigate sensitive topics due to concerns about socially desirable responding (SDR). The susceptibility to social desirability bias may vary not only between individuals (e.g., different perceptions about social norms) but also within individuals (e.g., perceived sensitivity of different items). Thus,...
A novel Bayesian modeling framework for response accuracy (RA), response times (RTs) and other process data is proposed. In a Bayesian covariance structure modeling approach, nested and crossed dependences within test-taker data (e.g., within a testlet, between RAs and RTs for an item) are explicitly modeled. The local dependences are modeled direc...
One of the most important goals of the Programme for International Student Assessment (PISA) is assessing national changes in educational performance over time. These so-called trend results inform policy makers about the development of ability of 15-year-old students within a specific country. The validity of those trend results prescribes invaria...
Latent growth models are often used to measure individual trajectories representing change over time. The characteristics of the individual trajectories depend on the variability in the longitudinal outcomes. In many medical and epidemiological studies, the individual health outcomes cannot be observed directly and are indirectly observed through i...
Online interventions hold great potential for Therapeutic Change Process Research (TCPR), a field that aims to relate in-therapeutic change processes to the outcomes of interventions. Online, a client is treated essentially through the language their counsellor uses; therefore, the verbal interaction contains many important ingredients that bring abo...
A multivariate generalization of the log-normal model for response times is proposed within an innovative Bayesian modeling framework. A novel Bayesian Covariance Structure Model (BCSM) is proposed, where the inclusion of random-effect variables is avoided, while their implied dependencies are modeled directly through an additive covariance structu...
Large-scale surveys such as the Programme for International Student Assessment (PISA), the Teaching and Learning International Survey (TALIS), and the Programme for the International Assessment of Adult Competencies (PIAAC) use advanced statistical models to estimate scores of latent traits from multiple observed responses. The comparison of such es...
Response bias (nonresponse and social desirability bias) is one of the main concerns when asking sensitive questions about behavior and attitudes. Self-reports on sensitive issues as in health research (e.g., drug and alcohol abuse), and social and behavioral sciences (e.g., attitudes against refugees, academic cheating) can be expected to be subje...
The intraclass correlation plays a central role in modeling hierarchically structured data, such as educational data, panel data, or group-randomized trial data. It represents relevant information concerning the between-group and within-group variation. Methods for Bayesian hypothesis tests concerning the intraclass correlation are proposed to impr...
School leaders are assumed to be important for the implementation of data-based decision making (DBDM), but little is known about changes in leadership during this implementation. Educational leadership was measured before, during, and after a two-year, school-wide DBDM intervention in 96 primary schools. Advanced analysis techniques were applied:...
Bayesian item response models have been developed to analyze test data and to measure latent variables. In Bayesian psychometric modelling, it is possible to include genuine prior information about the assessment in addition to information available in the observed response data. This chapter discusses advantages of Bayesian item response models in...
Two marginal one-parameter item response theory models are introduced, by integrating out the latent variable or random item parameter. It is shown that both marginal response models are multivariate (probit) models with a compound symmetry covariance structure. Several common hypotheses concerning the underlying covariance structure are evaluated...
Response accuracy and response time data can be analyzed with a joint model to measure ability and speed of working, while accounting for relationships between item and person characteristics. In this study, person-fit statistics are proposed for joint models to detect aberrant response accuracy and/or response time patterns. The person-fit tests t...
Early research on response time modeling assumed that a test taker would show consistent response time behavior, often referred to as working speed, over the course of a test. Such models may be unrealistic for various reasons — a warm-up effect may cause a test taker to respond more slowly than expected to the early items, fatigue may cause a test...
Drawing on the work of internationally acclaimed experts in the field, Handbook of Item Response Theory, Volume One: Models presents all major item response models. This first volume in a three-volume set covers many model developments that have occurred in item response theory (IRT) during the last 20 years. It describes models for different respo...
Context: Collaboration within school teams is considered to be important to build the capacity school teams need to work in a data-based way. In a school characterized by a strong collaborative culture, teachers may have more access to the knowledge and skills for analyzing data, teachers have more opportunity to discuss the performance goals to be...
Longitudinal research in higher education faces several challenges. Appropriate methods of analyzing competence growth of students are needed to deal with those challenges and thereby obtain valid results. In this article, a pretest-posttest-posttest multivariate multilevel IRT model for repeated measures is introduced which is designed to address...
Objective: In randomised controlled trials (RCT), outcome variables are often patient reported outcomes (PRO) measured with questionnaires. Ideally, all available item information is used for score construction, which requires an item response theory (IRT) measurement model. However, in practice, the classical test theory measurement model (sum sc...
With computerized testing, it is possible to record both the responses of test takers to test questions (i.e., items) and the amount of time spent by a test taker in responding to each question. Various models have been proposed that take into account both test-taker ability and working speed, with many models assuming a constant working speed...
Despite growing international interest in the use of data to improve education, few studies examining the effects on student achievement are yet available. In the present study, the effects of a two-year data-based decision-making intervention on student achievement growth were investigated. Fifty-three primary schools participated in the project,...
Longitudinal data can be used to estimate the transition intensities between healthy and unhealthy states prior to death. An illness-death model for history of stroke is presented, where time-dependent transition intensities are regressed on a latent variable representing cognitive function. The change of this function over time is described by a l...
In recent years, marketing researchers have become increasingly interested in under- and overreporting. However, there are few suitable approaches to operationalize deviations from the truth, particularly in behavioral domains in which self-reports are usually the only viable method of choice to measure behavior or attitudes. An especially difficul...
The underlying mechanisms of the effectiveness of cognitive behavioural interventions for chronic pain need further clarification. The role of, and associations between, pain-related psychological flexibility (PF) and pain catastrophizing (PC) were examined during a randomized controlled trial on internet-based Acceptance & Commitment Therapy (ACT)...
An aggregation strategy is proposed to potentially address practical limitations related to computing resources for two-level multidimensional item response theory (MIRT) models with large data sets. The aggregate model is derived by integration of the normal ogive model, and an adaptation of the stochastic approximation expectation maximization alg...
A mixed-effects regression model with a bent-cable change-point predictor is formulated to describe potential decline of cognitive function over time in the older population. For the individual trajectories, cognitive function is considered to be a latent variable measured through an item response theory model given longitudinal test data. Individu...
When comparing test or questionnaire scores between groups, an important assumption is that the questionnaire or test items are measurement invariant: that they measure the underlying construct in the same way in each group. The main goal of tests for measurement invariance is to establish whether support exists for the null hypothesis of invarianc...
Multi-item questionnaires are important instruments for monitoring health in epidemiological longitudinal studies. Mostly sum-scores are used as a summary measure for these multi-item questionnaires. The objective of this study was to show the negative impact of using sum-score based longitudinal data analysis instead of Item Response Theory (IRT)-...
The Internet Movie Database (www.imdb.com) is the largest and most successful website for movie information, yet crowdsourced contents of sites like these have rarely been studied. Therefore, using IMDb synopsis texts, reviewers' movie descriptions were analyzed regarding movie contents that have been the subject of many previous media studies: the...
Many standardized tests are now administered via computer rather than paper-and-pencil format. In a computer-based testing environment, it is possible to record not only the test taker’s response to each question (item) but also the amount of time spent by the test taker in considering and answering each item. Response times (RTs) provide informati...
Bayesian item response theory models have been widely used in different research fields. They support measuring constructs and modeling relationships between constructs, while accounting for complex test situations (e.g., complex sampling designs, missing data, heterogeneous populations). Advantages of this flexible modeling framework together with p...
Educational studies are often focused on growth in student performance and background variables that can explain developmental differences across examinees. To study educational progress, a flexible latent variable model is required to model individual differences in growth given longitudinal item response data, while accounting for time-heterogeno...
Mega- or meta-analytic studies (e.g. genome-wide association studies) are increasingly used in behavior genetics. An issue in such studies is that phenotypes are often measured by different instruments across study cohorts, requiring harmonization of measures so that more powerful fixed effect meta-analyses can be employed. Within the Genetics of P...
The present study concerns a Dutch computer-based assessment, which includes an assessment process about information literacy and a feedback process for students. The assessment is concerned with the measurement of skills in information literacy and the feedback process with item-based support to improve student learning. To analyze students' feedb...
In educational studies, the use of computer-based assessments leads to the collection of multiple outcomes to assess student performance. The student-specific outcomes are correlated and often measured in different scales, such as continuous and count outcomes. A multivariate zero-inflated model with random effects is proposed and adapted for the c...
Misleading response behavior is expected in medical settings where incriminating behavior is negatively related to the recovery from a disease. In the present study, lung patients feel social and professional pressure concerning smoking and experience questions about smoking behavior as sensitive and tend to conceal embarrassing or threatening info...
Longitudinal surveys measuring physical or mental health status are a common method to evaluate treatments. Multiple items are administered repeatedly to assess changes in the underlying health status of the patient. Traditional models to analyze the resulting data assume that the characteristics of at least some items are identical over measuremen...
Randomized response (RR) models are often used for analysing univariate randomized response data and measuring population prevalence of sensitive behaviours. There is much empirical support for the belief that RR methods improve the cooperation of the respondents. Recently, RR models have been extended to measure individual unidimensional behaviour...
This study examined the role of psychological flexibility, as a risk factor and as a process of change, in a self-help Acceptance and Commitment Therapy (ACT) intervention for adults with mild to moderate depression and anxiety. Participants were randomized to the self-help programme with e-mail support (n = 250), or to a waiting list control group...
The multiple group IRT model (MGM) proposed by Bock and Zimowski (1997) provides a useful framework for analyzing item response data from clustered respondents. In the MGM, the selected groups of respondents are of specific interest such that group-specific population distributions need to be defined. The main goal is to explore the potentials of a...
Random item effects models provide a natural framework for the exploration of violations of measurement invariance without the need for anchor items. Within the random item effects modelling framework, Bayesian tests (Bayes factor, deviance information criterion) are proposed which enable multiple marginal invariance hypotheses to be tested simulta...
Item response theory (IRT) methods are standard tools for the analysis of large-scale assessments of student performance. In educational survey research, the National Assessment of Educational Progress (NAEP) is primarily focused on scaling the performances of a sample of students in a subject area (e.g., mathematics, reading, science) on a singl...
Complex dependency structures are often conditionally modeled, where random effects parameters are used to specify the natural heterogeneity in the population. When interest is focused on the dependency structure, inferences can be made from a complex covariance matrix using a marginal modeling approach. In this marginal modeling framework, testing...
A general joint modeling framework is proposed that includes a parametric stratified survival component for continuous time survival data, and a mixture multilevel item response component to model latent developmental trajectories given mixed discrete response data. The joint model is illustrated in a real data setting, where the utility of longitu...
Hierarchical modeling of responses and response times on test items facilitates the use of response times as collateral information in the estimation of the response parameters. In addition to the regular information in the response data, two sources of collateral information are identified: (a) the joint information in the responses and the respon...
Item responses can be masked before they are observed via a randomized response mechanism. This technique is used to protect individuals and improve their willingness to answer truthfully. Various traditional randomized response sampling techniques are discussed and extended to a multivariate setting. So-called randomized item response models will...
Cluster-specific item effects parameters are introduced that are assumed to vary over clusters of respondents. The modeling of cluster-specific item parameters relaxes the assumptions of measurement invariance. Item characteristic differences are simply allowed, and it is not necessary to classify items as being invariant or noninvariant. Tests and est...
Response times and responses can be collected via computer adaptive testing or computer-assisted questioning. Inferences about test takers and test items can therefore be based on the response time and response accuracy information. Response times and responses are used to measure a respondent's speed of working and ability using a multivariate hie...
In the first chapter, an introduction to Bayesian item response modeling was given. The Bayesian methodology requires careful specification of priors since item response models contain many parameters, often of the same type. A hierarchical modeling approach is introduced that supports the pooling of information to improve the precision of the parame...
A review of Bayesian estimation and testing methods is given that is not a thorough overview but concentrates on some specific elements. First, simulation-based methods for parameter estimation, like the Gibbs sampling and the Metropolis-Hastings algorithms, from the general class of Markov chain Monte Carlo algorithms, are discussed. Second, the Ba...
The underlying assumptions of Bayesian item response models have to be examined to ensure their credibility and that meaningful inferences can be made. A set of tools will be discussed for testing model assumptions and hypotheses. This set of tools includes methods based on Bayesian residuals and predictive diagnostic checks. It will be shown that...
The general form of a Bayesian item response model consists of a probability model for the responses, prior distributions for the model parameters, and possibly prior distributions for the hyperparameters. An overview of Bayesian procedures for simultaneous estimation is given in which MCMC estimation methods are emphasized. Interest is focused on...
In modern society, tests are used extensively in schools, industry, and government. Test results can be of value in counseling, treatment, and selection of individuals. Tests can have a variety of functions, and often a broad classification is made in cognitive (tests as measures of ability) versus affective tests (tests designed to measure interest, at...
The item response data structure is hierarchical since item responses are nested within respondents. Often respondents are also grouped into larger units and variables are available that characterize the respondents and the higher-level units. An item response modeling framework is discussed that includes a multilevel population model for the respo...
The authors present a polytomous item randomized response model to measure socially sensitive consumer behavior. It complements established methods in marketing to correct for social desirability bias a posteriori and traditional randomized response models to prevent social desirability bias a priori. The model allows for individual-level inference...
In current psychological research, the analysis of data from computer-based assessments or experiments is often confined to accuracy scores. Response times, although being an important source of additional information, are either neglected or analyzed separately. In this article, a new model is developed that allows the simultaneous analysis of acc...
Response times on test items are easily collected in modern computerized testing. When collecting both (binary) responses and (continuous) response times on test items, it is possible to measure the accuracy and speed of test takers. To study the relationships between these two constructs, the model is extended with a multivariate multilevel regres...
The log-transform has been a convenient choice in response time modelling on test items. However, motivated by a dataset of the Medical College Admission Test where the lognormal model violated the normality assumption, the possibilities of the broader class of Box-Cox transformations for response time modelling are investigated. After an introduct...
The randomized response technique ensures that individual item responses, denoted as true item responses, are randomized before observing them and so-called randomized item responses are observed. A relationship is specified between randomized item response data and true item response data. True item response data are modeled with a (non)linear mix...
The authors discuss a new method that combines the randomized response technique with item response theory. This method allows the researcher to obtain information at the individual person level without knowing the true responses. With this new method, it is possible to compare groups of individuals by means of analysis of variance or regression an...
Extreme response style (ERS) is an important threat to the validity of survey-based marketing research. In this article, the authors present a new item response theory–based model for measuring ERS. This model contributes to the ERS literature in two ways. First, the method improves on existing procedures by allowing different items to be different...
With the growing interest of consumer researchers to test measures and theories in an international context, the cross-national invariance of measurement instruments has become an important issue. At least two issues still need to be addressed. First, the ordinal nature of the rating scale is ignored. Second, when few or no items in the confirmator...
There is much empirical evidence that randomized response methods improve the cooperation of the respondents when asking sensitive questions. The traditional methods for analysing randomized response data are restricted to univariate data and only allow inferences at the group level due to the randomized response sampling design. Here, a novel beta...
Variance component models are generally accepted for the analysis of hierarchically structured data. A shortcoming is that outcome variables are still treated as measured without error. Unreliable variables produce biases in the estimates of the other model parameters. The variability of the relationships across groups and the group-effects on ind...
In computerized testing, the test takers' responses as well as their response times on the items are recorded. The relationship between response times and response accuracies is complex and varies over levels of observation. For example, it takes the form of a tradeoff between speed and accuracy at the level of a fixed person but may become a posit...
Run-to-run variability is a common problem for modeling batch-wise and semi-continuously operated processes. Although observed reactor runs show the same trends in process behaviour, each specific reactor run also shows its own characteristics. Until now, available modeling methods were unable to describe the observed between-run variance. In this pa...
A fixed effect item response theory (IRT) model is developed for modeling group specific item parameters. Two applications are presented. The first application is that the proposed model can be used to detect whether a response mechanism is ignorable using the splitter item technique. The second application is the detection of differential item fun...