ABSTRACT: Linear mixed modeling is a useful approach for double mixed factorial designs with
covariates. It is explained how these designs are appropriate for the study of human behavior
as a function of characteristics of persons and situations and stimuli in the situations. The
behavior of subjects nested in types of persons responding to stimuli nested in types of stimuli
defines a mixed factorial design. The inclusion of additional covariates of the observational
units can help to further explain the behavior under study. A linear mixed modeling approach
for such designs allows a combined focus on fixed effects (general effects) and individual
and stimulus differences in these effects. This combination has the potential to advance the
integration of two different sub-disciplines of psychology: general psychology and differential
psychology, so that they can borrow strength from each other. An application is presented with
semantic categorization response time data from a factorial design with age groups by word
types and with age of acquisition as an additional covariate of the words. The results throw
light on the processes underlying the effect of age of acquisition and on individual differences
and word differences.
Journal of the Royal Statistical Society Series C Applied Statistics 02/2014; 63(2):289-302. · 1.25 Impact Factor
ABSTRACT: For item response theory (IRT) models, which belong to the class of generalized linear or non-linear mixed models, reliability at the scale of observed scores (i.e., manifest correlation) is more difficult to calculate than latent correlation based reliability, but usually of greater scientific interest. This is not least because it cannot be calculated explicitly when the logit link is used in conjunction with normal random effects. As such, approximations such as Fisher's information coefficient, Cronbach's α, or the latent correlation are calculated, ostensibly because they are easy to compute. Cronbach's α has well-known and serious drawbacks, Fisher's information is not meaningful under certain circumstances, and there is an important but often overlooked difference between latent and manifest correlations. Here, manifest correlation refers to correlation between observed scores, while latent correlation refers to correlation between scores at the latent (e.g., logit or probit) scale. Thus, using one in place of the other can lead to erroneous conclusions. Taylor series based reliability measures, which are based on manifest correlation functions, are derived, and a careful comparison of reliability measures based on latent correlations, Fisher's information, and exact reliability is carried out. The latent correlations are virtually always considerably higher than their manifest counterparts, Fisher's information measure shows no coherent behaviour (it is even negative in some cases), while the newly introduced Taylor series based approximations reflect the exact reliability very closely. Comparisons among the various types of correlations, for various IRT models, are made using algebraic expressions, Monte Carlo simulations, and data analysis. Given the light computational burden and the performance of Taylor series based reliability measures, their use is recommended.
British Journal of Mathematical and Statistical Psychology 02/2014; · 1.26 Impact Factor
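The gap between latent and manifest correlation described above can be illustrated with a small Monte Carlo sketch of my own (not the paper's derivation, and all names below are illustrative): two parallel halves of a Rasch-type test share a single latent trait, so their latent correlation is exactly 1, yet the correlation between observed sum scores is clearly lower.

```python
# My own illustrative sketch, not the paper's code: two parallel test halves
# driven by one latent trait theta.  Latent correlation = 1 by construction;
# the manifest (sum-score) correlation falls well short of 1.
import math
import random

random.seed(42)

def pearson(x, y):
    """Plain Pearson correlation, stdlib only."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

n_persons, n_items = 2000, 10
diffs = [-2 + 4 * i / (n_items - 1) for i in range(n_items)]  # item difficulties

half1, half2 = [], []
for _ in range(n_persons):
    theta = random.gauss(0, 1)  # one latent trait shared by both halves
    score = lambda: sum(random.random() < 1 / (1 + math.exp(-(theta - b)))
                        for b in diffs)
    half1.append(score())
    half2.append(score())

r_manifest = pearson(half1, half2)
print(round(r_manifest, 2))  # noticeably below the latent correlation of 1
```

The shortfall of the manifest correlation relative to the latent one is exactly the kind of discrepancy the abstract warns about when one measure is used in place of the other.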
ABSTRACT: An additive multilevel item structure (AMIS) model with random residuals is proposed. The model includes multilevel latent regressions of item discrimination and item difficulty parameters on covariates at both item and item category levels with random residuals at both levels. The AMIS model is useful for explanation purposes and also for prediction purposes as in an item generation context. The parameters can be estimated with an alternating imputation posterior algorithm that makes use of adaptive quadrature, and the performance of this algorithm is evaluated in a simulation study.
[show abstract][hide abstract] ABSTRACT: We apply the Hawkes process to the analysis of dyadic interaction. The Hawkes process is applicable to excitatory interactions, wherein the actions of each individual increase the probability of further actions in the near future. We consider the representation of the Hawkes process both as a conditional intensity function and as a cluster Poisson process. The former treats the probability of an action in continuous time via non-stationary distributions with arbitrarily long historical dependency, while the latter is conducive to maximum likelihood estimation using the EM algorithm. We first outline the interpretation of the Hawkes process in the dyadic context, and then illustrate its application with an example concerning email transactions in the work place.
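The conditional intensity representation mentioned above can be sketched as follows (a minimal illustration with an exponential excitation kernel and made-up parameter values, not the paper's model specification):

```python
# Illustrative sketch: conditional intensity of a univariate Hawkes process
# with an exponential kernel,
#   lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i)),
# where mu is the baseline rate and each past event excites future events.
# Parameter values and the event times are made up for illustration.
import math

def hawkes_intensity(t, history, mu=0.5, alpha=0.8, beta=1.2):
    """Conditional intensity at time t given past event times `history`."""
    return mu + sum(alpha * math.exp(-beta * (t - ti))
                    for ti in history if ti < t)

events = [1.0, 2.5, 2.7]  # e.g., email send times (arbitrary units)
print(hawkes_intensity(3.0, events))
```

The clustering of `events` near t = 2.5 raises the intensity at t = 3.0 well above the baseline `mu`, which is the excitatory behaviour the abstract describes.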
ABSTRACT: In social network studies, most often only a single relation (or link) between the actors is investigated. When more than one link has been recorded, the two-way sociomatrix becomes a three-way array with the set of links being the third way. In this paper, we present a model which simultaneously accounts for the three ways in the data. Random effects are used to model the between-actor variability, on both the sender and the receiver side. In addition, structural relations between the linking variables are investigated. The model is applied to a study of popularity and strength in a class of students. It is shown that popularity can be seen as a linear function of strength on the receivers' side, but not on the senders' side.
ABSTRACT: A category of item response models is presented with two defining features: they all (i) have a tree representation, and (ii) are members of the family of generalized linear mixed models (GLMM). Because the models are based on trees, they are denoted as IRTree models. The GLMM nature of the models implies that they can all be estimated with the glmer function of the lme4 package in R. The aim of the article is to present four subcategories of models, the first two of which are based on a tree representation for response categories: 1. linear response tree models (e.g., missing response models), 2. nested response tree models (e.g., models for parallel observations regarding item responses such as agreement and certainty), while the last two are based on a tree representation for latent variables: 3. linear latent-variable tree models (e.g., models for change processes), and 4. nested latent-variable tree models (e.g., bi-factor models). The use of the glmer function is illustrated for all four subcategories. Simulated example data sets and two service functions useful in preparing the data for IRTree modeling with glmer are provided in the form of an R package, irtrees. For all four subcategories, a real data application is also discussed.
Journal of Statistical Software (Code Snippet) 04/2012; 48(1).
ABSTRACT: Responses to items from an intelligence test may be fast or slow. The research issue dealt with in this paper is whether the intelligence involved in fast correct responses differs in nature from the intelligence involved in slow correct responses. There are two questions related to this issue: 1. Are the processes involved different? 2. Are the abilities involved different? An answer to these questions is provided using data from a Raven-like matrices test and a verbal analogies test, together with a psychometric branching model. The branching model is based on three latent traits: speed, fast accuracy and slow accuracy, and item parameters corresponding to each of these. The pattern of item difficulties is used to draw conclusions on the cognitive processes involved. The results are as follows: 1. The processes involved in fast and slow responses can be differentiated, as can be derived from qualitative differences in the patterns of item difficulty, and fast responses lead to a larger differentiation between items than slow responses do. 2. The abilities underlying fast and slow responses can also be differentiated, and fast responses allow for a better differentiation between the respondents.
ABSTRACT: The identification of differential item functioning (DIF) is often performed by means of statistical approaches that consider the raw scores as proxies for the ability trait level. One of the most popular approaches, the Mantel-Haenszel (MH) method, belongs to this category. However, replacing the ability level by the simple raw score is a source of potential Type I error inflation, not only in the presence of DIF but also when DIF is absent and impact is present. The purpose of this paper is to present an alternative statistical inference approach based on the same measure of DIF but such that the Type I error inflation is prevented. The key notion is that for DIF items, the measure has an outlying value which can be identified as such with inference tools from robust statistics. Although we use the MH log-odds ratio as a statistic, the inference is different. A simulation study is performed to compare the robust statistical inference with the classical inference method, both based on the MH statistic. As expected, the Type I error rate inflation is avoided with the robust approach, while the power of the two methods is similar.
Educational and Psychological Measurement 01/2012; 72(2):291-311. · 1.07 Impact Factor
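The MH log-odds ratio that serves as the statistic above can be computed from per-stratum 2x2 tables as in the following sketch (the table counts are made up for illustration; this shows the classical estimate only, not the robust inference step):

```python
# Hedged sketch of the Mantel-Haenszel common log-odds ratio.  Each stratum
# (raw-score level) contributes a 2x2 table (a, b, c, d) =
# (reference correct, reference incorrect, focal correct, focal incorrect).
# The counts below are invented for illustration.
import math

def mh_log_odds_ratio(tables):
    """MH estimate: log( sum(a*d/n) / sum(b*c/n) ) over strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return math.log(num / den)

strata = [(40, 10, 30, 20), (25, 25, 15, 35), (10, 40, 5, 45)]
print(mh_log_odds_ratio(strata))
```

In the robust approach described in the abstract, an item whose MH log-odds ratio is an outlier relative to the other items' values is flagged as a DIF item.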
ABSTRACT: Multiple item response profile (MIRP) models are models with crossed fixed and random effects. At least one between-person factor is crossed with at least one within-person factor, and the persons nested within the levels of the between-person factor are crossed with the items within levels of the within-person factor. Maximum likelihood estimation (MLE) of models for binary data with crossed random effects is challenging because the marginal likelihood does not have a closed form, so that MLE requires numerical or Monte Carlo integration. In addition, the multidimensional structure of MIRPs makes the estimation complex. In this paper, three estimation methods to meet these challenges are described: the Laplace approximation to the integrand; hierarchical Bayesian analysis, a simulation-based method; and an alternating imputation posterior with adaptive quadrature as the approximation to the integral. The advantages and disadvantages of these three estimation methods for MIRPs are discussed. The three algorithms are compared in a real data application, and a simulation study was also conducted to compare their behaviour.
British Journal of Mathematical and Statistical Psychology 11/2011; 65(3):438-66. · 1.26 Impact Factor
ABSTRACT: An old issue in psychological assessment is to what extent power and speed each are measured by a given intelligence test. Starting from accuracy and response time data, an approach based on posterior time limits (cut-offs of recorded response time) leads to three kinds of recoded data: time data (whether or not the response precedes the cut-off), time-accuracy data (whether or not a response is correct and precedes the cut-off), and accuracy data (as time-accuracy data, but coded as missing when not preceding the time cut-off). Each type of data can be modeled as binary responses. Speed and power are investigated through the effect of posterior time limits on two main aspects: (a) the latent variable that is measured: whether it is more power-related or more speed-related; (b) how well the latent variable (of whatever kind) is measured through the item(s). As empirical data, we use responses and response times for a verbal analogies test. The main findings are that, independent of the posterior time limit, basically the same latent speed trait was measured through the time data, and basically the same latent power trait was measured through the accuracy data, while for the time-accuracy data the nature of the latent trait moved from power to speed when the posterior time limit was reduced. It was also found that a reduction of the posterior time limit had no negative effect on the reliability of the latent trait measures (of whatever kind).
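The three recodings defined above are mechanical once a cut-off is chosen; a minimal sketch (variable names are mine, not the paper's) of recoding one (accuracy, response time) pair:

```python
# Illustrative recoding of an (accuracy, response time) observation under a
# posterior time limit `cutoff`, yielding the three data types from the
# abstract: time data, time-accuracy data, and accuracy data.
def recode(correct, rt, cutoff):
    time_datum = int(rt <= cutoff)                      # precedes the cut-off?
    ta_datum = int(correct and rt <= cutoff)            # correct AND in time?
    acc_datum = int(correct) if rt <= cutoff else None  # missing if too slow
    return time_datum, ta_datum, acc_datum

print(recode(True, 4.2, 5.0))   # fast correct response
print(recode(True, 6.1, 5.0))   # slow correct response
print(recode(False, 3.0, 5.0))  # fast incorrect response
```

Note how a slow correct response is a failure in the time-accuracy coding but simply missing in the accuracy coding, which is what lets the latter isolate power from speed.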
ABSTRACT: The models used in this article are secondary dimension mixture models with the potential to explain differential item functioning (DIF) between latent classes, called latent DIF. The focus is on models with a secondary dimension that is at the same time specific to the DIF latent class and linked to an item property. A description of the models is provided along with a means of estimating model parameters using easily available software and a description of how the models behave in two applications. One application concerns a test that is sensitive to speededness and the other is based on an arithmetic operations test where the division items show latent DIF.
ABSTRACT: In this paper we elaborate on the potential of the lmer function from the lme4 package in R for item response theory (IRT) modeling. In line with the package, an IRT framework is described based on generalized linear mixed modeling. The aspects of the framework refer to (a) the kind of covariates -- their mode (person, item, person-by-item), and their being external vs. internal to responses, and (b) the kind of effects the covariates have -- fixed vs. random, and, if random, the mode across which the effects are random (persons, items). Based on this framework, three broad categories of models are described: item covariate models, person covariate models, and person-by-item covariate models, and within each category three types of more specific models are discussed. The models in question are explained and the associated lmer code is given. Examples are the linear logistic test model with an error term, differential item functioning models, and local item dependency models. Because the lme4 package is for univariate generalized linear mixed models, neither the two-parameter and three-parameter models nor the item response models for polytomous response data can be estimated with the lmer function.
Journal of Statistical Software 01/2011; · 4.91 Impact Factor
ABSTRACT: Standardized tests are used widely in comparative studies of clinical populations, either as dependent or control variables. Yet, one cannot always be sure that the test items measure the same constructs in the groups under study. In the present work, 460 participants with intellectual disability of undifferentiated etiology and 488 typical children were tested using Raven’s Colored Progressive Matrices (RCPM). Data were analyzed using binomial logistic regression modeling designed to detect differential item functioning (DIF). Results showed that 12 items out of 36 function differentially between the two groups, but only 2 items exhibit at least moderate DIF. Thus, a very large majority of the items have identical discriminative power and difficulty levels across the two groups. It is concluded that RCPM can be used with confidence in studies comparing participants with and without intellectual disability. In addition, it is suggested that methods for investigating internal bias of tests used in cross-cultural, cross-linguistic, or cross-gender comparisons should also be regularly employed in studies of clinical populations, particularly in the field of developmental disability, to show the absence of systematic measurement error (i.e., DIF) affecting item responses.
ABSTRACT: This paper focuses on the identification of differential item functioning (DIF) when more than two groups of examinees are considered. We propose to consider items as elements of a multivariate space, where DIF items are outlying elements. Following this approach, the situation of multiple groups is quite a natural case. A robust statistics technique is proposed to identify DIF items as outliers in the multivariate space. For low dimensionalities (up to three groups), a simple graphical tool is also derived. We illustrate our approach with a re-analysis of data from Kim, Cohen, and Park (1995) on the use of calculators in a mathematics test.
Multivariate Behavioral Research 01/2011; 46:733-755. · 1.66 Impact Factor
ABSTRACT: Differential item functioning (DIF) is an important issue of interest in psychometrics and educational measurement. Several methods have been proposed in recent decades for identifying items that function differently between two or more groups of examinees. Starting from a framework for classifying DIF detection methods and from a comparative overview of the most traditional methods, an R package for nine methods, called difR, is presented. The commands and options are briefly described, and the package is illustrated through the analysis of a data set on verbal aggression.
Behavior Research Methods 08/2010; 42(3):847-62. · 2.12 Impact Factor
ABSTRACT: For the analysis of binary data, various deterministic models have been proposed, which are generally simpler to fit and easier to understand than probabilistic models. We claim that corresponding to any deterministic model is an implicit stochastic model in which the deterministic model fits imperfectly, with errors occurring at random. In the context of binary data, we consider two error models: in the first model, all predictions are equally likely to be in error; in the second model, the probability of error depends on the model prediction. We show how to fit these models using a stochastic modification of deterministic optimization schemes. The advantages of fitting the stochastic models explicitly (rather than implicitly, by simply fitting a deterministic model and accepting the occurrence of errors) include quantification of uncertainty in the deterministic model's parameter estimates, better estimation of the true model error rate, and the ability to check the fit of ...
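The first error model above (all predictions equally likely to be in error) admits a very simple likelihood; a minimal sketch with invented data, assuming a fixed set of deterministic 0/1 predictions:

```python
# Sketch of the constant-error model: binary observations y, deterministic
# 0/1 predictions p, and a single error rate eps.  An observation matches its
# prediction with probability 1 - eps and mismatches with probability eps.
# Data values are made up for illustration.
import math

def loglik_constant_error(y, p, eps):
    """Log-likelihood of y given predictions p under a constant error rate."""
    return sum(math.log(1 - eps) if yi == pi else math.log(eps)
               for yi, pi in zip(y, p))

y = [1, 0, 1, 1, 0]
p = [1, 0, 0, 1, 0]  # one of the five predictions is in error

# For fixed predictions, the MLE of eps is just the observed error rate:
eps_hat = sum(yi != pi for yi, pi in zip(y, p)) / len(y)
print(eps_hat, loglik_constant_error(y, p, eps_hat))
```

The second error model would replace the single `eps` with a prediction-dependent error probability, and fitting the deterministic model jointly with `eps` is what yields the uncertainty quantification the abstract mentions.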
ABSTRACT: In this article we present a new methodology for detecting Differential Item Functioning (DIF). We introduce a DIF model, called the Random Item Mixture (RIM), that is based on a Rasch model with random item difficulties (besides the common random person abilities). In addition, a mixture model is assumed for the item difficulties such that the items may belong to one of two classes: a DIF or a non-DIF class. The crucial difference between the DIF class and the non-DIF class is that the item difficulties in the DIF class may differ according to the observed person groups, while they are equal across the person groups for the items from the non-DIF class. Statistical inference for the RIM is carried out in a Bayesian framework. The performance of the RIM is evaluated using a simulation study in which it is compared with traditional procedures, like the Likelihood Ratio test, the Mantel-Haenszel procedure and the standardized p-DIF procedure. In this comparison, the RIM performs better than the other methods. Finally, the usefulness of the model is also demonstrated on a real-life dataset.
Journal of Educational Measurement 01/2010; 47:432-457. · 1.00 Impact Factor
ABSTRACT: The local influence diagnostics, proposed by Cook (1986), provide a flexible way to assess the impact of minor model perturbations on the estimates of key model parameters. In this paper, we apply the local influence idea to the detection of test speededness in a model describing nonresponse in test data, and compare this local influence approach to the optimal person fit index proposed by Drasgow and Levine (1986) and to the empirical Bayes estimate of the test speededness random effect. The performance of the methods is illustrated on the Chilean SIMCE mathematics test data. The data example indicates that the three statistics are promising when it comes to the detection of special profiles, and moreover overlap to a considerable extent. Given that the statistics were developed for different purposes, they naturally react differently to the various characteristics of the response profiles, and hence also exhibit some specificity.
Methodology: European Journal of Research Methods for the Behavioral and Social Sciences 01/2010; 6(1):3-16. · 0.33 Impact Factor
ABSTRACT: This study introduces an approach for modeling multidimensional response data with construct-relevant group and domain factors. The item-level parameter estimation process is extended to incorporate the refined effects of test dimension and group factors. Differences in item performance across groups are evaluated, distinguishing two levels of differential item functioning (DIF): a domain level and an item level. An illustration is presented using a Dutch spelling proficiency scale administered to two subgroups. DIF is modeled by the interaction between group and item domain (domain-level DIF) and by the interaction between groups and items within each domain (item-level DIF). A set of item response theory models was estimated using an adaptation of the logistic regression approach. The model with domain-specific item-by-group interactions or DIF performed better than the other models neglecting domain or group differences. The method appears to be promising in that explicit domain factors can be incorporated into the model estimation procedure to better understand why items favor one language group over another.
International Journal of Testing 04/2009; 9(2):151-166.