Article

Validity of Multiprocess IRT Models for Separating Content and Response Styles

Authors:
Hansjörg Plieninger, Thorsten Meiser

Abstract

Response styles, the tendency to respond to Likert-type items irrespective of content, are a widely known threat to the reliability and validity of self-report measures. However, it is still debated how to measure and control for response styles such as extreme responding. Recently, multiprocess item response theory models have been proposed that allow for separating multiple response processes in rating data. The rationale behind these models is to define process variables that capture psychologically meaningful aspects of the response process like, for example, content- and response style-related processes. The aim of the present research was to test the validity of this approach using two large data sets. In the first study, responses to a 7-point rating scale were disentangled, and it was shown that response style-related and content-related processes were selectively linked to extraneous criteria of response styles and content. The second study, using a 4-point rating scale, focused on a content-related criterion and revealed a substantial suppression effect of response style. The findings have implications for both basic and applied fields, namely, for modeling response styles and for the interpretation of rating data.

... As another example, Plieninger and Meiser (2014) validated the response processes in a multiprocess item response theory (IRT) model with response style scales. With the multiprocess IRT model as the first primary model, the second one was given by analyzing the scales for extreme as well as midpoint response styles, forming parcels, and modeling them with a correlated traits-correlated uniqueness (CT-CU) model. ...
... Cross-model covariances are represented by dashed lines. This is a simplified version of the model employed by Plieninger and Meiser (2014) to relate response styles and IRT processes, in which the factors are not simply correlated across models, but the process factors are regressed on the response style factors, and there is an additional criterion variable. Analogously to Figs. 1 and 3, the parameter labels have been omitted. ...
... lays the groundwork for the discussion of arbitrary path models in composed models, but it is beyond its scope. The correlational composed model underlying the latent regression employed by Plieninger and Meiser (2014) is shown in Fig. 2. ...
Article
Full-text available
In this article, we present a general theorem and proof for the global identification of composed CFA models. They consist of identified submodels that are related only through covariances between their respective latent factors. Composed CFA models are frequently used in the analysis of multimethod data, longitudinal data, or multidimensional psychometric data. First, our theorem enables researchers to reduce the problem of identifying the composed model to the problem of identifying the submodels and verifying the conditions given by our theorem. Second, we show that composed CFA models are globally identified if the primary models are reduced models such as the CT-C(M−1) model or similar types of models. In contrast, composed CFA models that include non-reduced primary models can be globally underidentified for certain types of cross-model covariance assumptions. We discuss necessary and sufficient conditions for the global identification of arbitrary composed CFA models and provide Python code to check the identification status for an illustrative example. The code we provide can be easily adapted to more complex models.
... For more detailed and comprehensive overviews of the two IRT modeling approaches see, for example, Böckenholt and Meiser (2017) and Meiser (2020a, 2022). These psychometric approaches have been used and extended in psychometric research (e.g., Böckenholt, 2019; Henninger & Meiser, 2020b; Khorramdel & von Davier, 2014b; Meiser et al., 2019) and validated for applied research (e.g., Plieninger & Meiser, 2014). However, controlling for response styles a priori via the questionnaire format, rather than accounting for response styles post hoc via complex psychometric models, might be a route that is worth further attention, in particular in applied research. ...
... It would be interesting to examine this effect using alternative measures of response styles. In addition, so far little is known about whether and how response styles affect criterion-related validity (but see Plieninger, 2017; Plieninger & Meiser, 2014). Therefore, we aim to examine whether criterion-related validity, that is, the correlation with external, objective criteria, is affected by the type of response format (Likert vs. DnD). ...
... In addition to examining the above-stated hypotheses, we assessed the validity of the questionnaire measures across response format conditions with respect to two external criteria. We expected the validity, that is, the correlation with an external, response style-free criterion (see the methods section below), to be higher when the influence of response styles is reduced through appropriate response formats (see Plieninger & Meiser, 2014, for an assessment of criterion-related validity in the context of response styles). Furthermore, we assessed the generalizability of the proposed effect across item content, number of response categories, and other sample populations. ...
Article
Full-text available
Many researchers use self-report data to examine abilities, personalities, or attitudes. At the same time, there is a widespread concern that response styles, such as the tendency to give extreme, midscale, or acquiescent responses, may threaten data quality. As an alternative to post hoc control of response styles using psychometric models, a priori control using specific response formats may be a means to reduce biasing response style effects in self-report data in day-to-day research practice. Previous research has suggested that response styles were less influential in a Drag-and-Drop (DnD) format compared to the traditional Likert-type format. In this article, we further examine the advantage of the DnD format, test its generalizability, and investigate its underlying mechanisms. In two between-participants experiments, we tested different versions of the DnD format against the Likert format. We found no evidence for reduced response style influence in any of the DnD conditions, nor did we find any difference between the conditions in terms of the validity of the measures to external criteria. We conclude that adaptations of response formats, such as the DnD format, may be promising, but require more thorough examination before recommending them as a means to reduce response style influence in psychological measurement.
... Response styles (RS) are individual differences in the use of particular response categories regardless of respondents' standing on the substantive target trait, such as extreme response style (ERS; a tendency to favor the endpoints of a rating scale) or midpoint response style (MRS; a tendency to favor the midpoint of a rating scale) (Cronbach, 1946, 1950; Messick, 1991). RS have received significant attention from researchers because RS can potentially threaten the validity or reliability of a rating scale by contaminating observed total scores and factor structures (Johnson, Kulesa, Cho, & Shavitt, 2005; Plieninger & Meiser, 2014; Thissen-Roe & Thissen, 2013) and can further be a source of lack of measurement invariance (Bolt & Johnson, 2009). ...
... The item response tree (IRTree) approach has recently been proposed as a new method for measuring and analyzing RS (Böckenholt, 2012, 2017, 2019; Khorramdel, von Davier, Bertling, Roberts, & Kyllonen, 2017; Meiser, Plieninger, & Henninger, 2019; Plieninger & Meiser, 2014). IRTree enables differentiation of RS from the target latent trait by decomposing observed responses into a sequence of multiple response processes specified by the researcher on the basis of theoretical assumptions or empirical evidence (Böckenholt, 2012; De Boeck & Partchev, 2012). ...
... IRTree enables differentiation of RS from the target latent trait by decomposing observed responses into a sequence of multiple response processes specified by the researcher on the basis of theoretical assumptions or empirical evidence (Böckenholt, 2012; De Boeck & Partchev, 2012). Many works have demonstrated the usefulness and flexibility of the IRTree approach in examining RS (Böckenholt, 2012, 2017, 2019; Jeon & De Boeck, 2019; Khorramdel et al., 2017; Meiser, Plieninger, & Henninger, 2019; Plieninger & Meiser, 2014). For example, Böckenholt and Meiser (2017) showed that an IRTree model yielded better model fit than the two-dimensional partial credit model (PCM; Masters, 1982) using a personality scale. ...
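To make the decomposition concrete, the following minimal sketch (ours, not from the cited papers; the simulated data, item names, and the midpoint-direction-extremity node order are assumptions) recodes 5-point Likert responses into the binary pseudo-items of a typical IRTree analysis:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)
# hypothetical 5-point Likert responses (categories 1-5), persons x items
X = pd.DataFrame(rng.integers(1, 6, size=(200, 4)),
                 columns=[f"item{j}" for j in range(1, 5)])

# midpoint node: 1 if the middle category (3) was chosen
mid = (X == 3).astype(int)

# direction node: agree (4, 5) = 1 vs. disagree (1, 2) = 0;
# structurally missing (NaN) whenever the midpoint was chosen
direction = pd.DataFrame(np.where(X == 3, np.nan, (X > 3).astype(float)),
                         index=X.index, columns=X.columns)

# extremity node: extreme (1, 5) = 1 vs. moderate (2, 4) = 0; missing at midpoint
extreme = pd.DataFrame(np.where(X == 3, np.nan, X.isin([1, 5]).astype(float)),
                       index=X.index, columns=X.columns)

# the three pseudo-item sets are then fitted jointly with a simple-structure
# multidimensional IRT model, one latent dimension per node
pseudo_items = pd.concat([mid.add_prefix("mid_"),
                          direction.add_prefix("dir_"),
                          extreme.add_prefix("ext_")], axis=1)
```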
... Although these approaches for separating content and response styles are different, there are few empirical studies examining and comparing their efficacy. Plieninger and Meiser (2014) investigated the validity of the IR tree model, but they did not compare the IR tree model with the mPCM and MNRM. Böckenholt and Meiser (2017) and Leventhal and Stone (2018) did compare these models, but the former focused on the rationales, implementation, and estimation of the IR tree model and the mPCM, while Leventhal and Stone (2018) examined the item mean square error and model fit of the IR tree model and the MNRM. ...
... This article examines and compares the validity of these models through two empirical studies. These studies adopted Plieninger and Meiser's (2014) research paradigm: utilizing extraneous criteria for content and response styles to examine and compare the validity of these approaches. Depending on the content, there may or may not be an expected correlation between response style factors and the content criterion, as well as between the content factor and the response style criteria. ...
... The response style criteria were obtained with the representative indicators for response styles method (RIRS; Greenleaf, 1992;Weijters et al., 2008;De Beuckelaer et al., 2010). This approach was also used by Plieninger and Meiser (2014) as the response style criteria. It computes response style scores from a set of highly heterogeneous items to avoid content variance and thus is a valid and stable measure of response styles (De Beuckelaer et al., 2010). ...
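For illustration, RIRS-type criteria of this kind essentially count extreme and midpoint choices on a set of content-heterogeneous items; a minimal sketch, where the variable names and the 5-point coding are our assumptions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=2)
# H: responses of 200 persons to 30 content-heterogeneous 5-point items
H = pd.DataFrame(rng.integers(1, 6, size=(200, 30)))

ers_criterion = H.isin([1, 5]).mean(axis=1)  # proportion of extreme responses
mrs_criterion = (H == 3).mean(axis=1)        # proportion of midpoint responses
```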
Article
Full-text available
Response styles, the general tendency to use certain categories of rating scales over others, are a threat to the reliability and validity of self-report measures. The mixed partial credit model (mPCM), the multidimensional nominal response model, and the item response tree model are three widely used models for measuring extreme and midpoint response styles and correcting for their effects. This research aimed to examine and compare their validity by fitting them to empirical data and correlating the content-related factors and the response style-related factors in these models with extraneous criteria. The results showed that the content factors yielded by these models were moderately related to the content criterion and not related to the response style criteria. The response style factors were moderately related to the response style criteria and weakly related to the content criterion. Simultaneous analysis of more than one scale could improve their validity for measuring response styles. These findings indicate that the three models can control for and measure extreme and midpoint response styles, though the validity of the mPCM for measuring response styles was poor in some cases. Overall, the multidimensional nominal response model performed slightly better than the other two models.
... On the other hand, IRTree models have recently been introduced as a new model family that combines processing trees of binary decision nodes with IRT parametrizations of the node probabilities (Böckenholt, 2012; De Boeck & Partchev, 2012; Jeon & De Boeck, 2016). IRTree models allow researchers to investigate judgement processes that are based on item content and relate to the traits of interest together with a priori specified response styles such as the general preference for moderate responses or the tendency to choose extreme response categories (Khorramdel & von Davier, 2014; Plieninger & Meiser, 2014; Zettler, Lang, Hülsheger, & Hilbig, 2016). Unlike traditional IRT approaches, current IRTree models for response styles decompose the response process into a sequence of binary decisions, including decision nodes for agreement versus disagreement with the item content and decision nodes referring to response styles. ...
... While the mapping of the original response categories onto pseudo-items is essential for the interpretation and mathematical implications of a given IRTree model, recoding of the pseudo-items of agreement (i.e., replacing 0, 0, 0, 1, 1, 1 with 1, 1, 1, 0, 0, 0), non-moderate responding (i.e., replacing 1, 1, 0, 0, 1, 1 with 0, 0, 1, 1, 0, 0) or extreme responding (i.e., replacing 1, 0, -, -, 0, 1 with 0, 1, -, -, 1, 0) leads to an equivalent IRTree model with opposite signs of the parameters for recoded pseudo-items. Here we choose the parametrization of non-moderate responding (i.e., 1, 1, 0, 0, 1, 1 for the second pseudo-item) rather than the more common but equivalent parametrization of moderate responding (i.e., 0, 0, 1, 1, 0, 0; see Böckenholt, 2012; Böckenholt & Meiser, 2017; Khorramdel & von Davier, 2014; Plieninger & Meiser, 2014) to facilitate the integration of an ordinal process of response intensity in the next section. ...
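The codings quoted above can be written out as mapping tables; in the sketch below (the tables follow the quotation, everything else is our own), complementing a node's coding simply flips the sign of the corresponding parameters:

```python
import numpy as np

# pseudo-item codings for a 6-point scale (categories 1-6); np.nan marks
# entries that are structurally missing for a given response category
agreement    = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1}
non_moderate = {1: 1, 2: 1, 3: 0, 4: 0, 5: 1, 6: 1}
extreme      = {1: 1, 2: 0, 3: np.nan, 4: np.nan, 5: 0, 6: 1}

# complementing a node's coding (0 <-> 1) yields an equivalent IRTree model
# whose parameters for that pseudo-item change sign
moderate = {k: 1 - v for k, v in non_moderate.items()}  # 0, 0, 1, 1, 0, 0
```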
... The model thereby captures the between-item multidimensionality concerning the three subsets of I pseudo-items, reflecting the notion that the three response processes in Figure 1 are distinct from each other and that each decision node is affected by only one latent process. The basic rationale of between-item multidimensionality concerning pseudo-items underlies most implementations of IRTree models for response styles to date (Böckenholt, 2012; Jeon & De Boeck, 2016; Khorramdel & von Davier, 2014; Plieninger & Meiser, 2014; Zettler et al., 2016). ...
Article
IRTree models decompose observed rating responses into sequences of theory‐based decision nodes, and they provide a flexible framework for analysing trait‐related judgements and response styles. However, most previous applications of IRTree models have been limited to binary decision nodes that reflect qualitatively distinct and unidimensional judgement processes. The present research extends the family of IRTree models for the analysis of response styles to ordinal judgement processes for polytomous decisions and to multidimensional parametrizations of decision nodes. The integration of ordinal judgement processes overcomes the limitation to binary nodes, and it allows researchers to test whether decisions reflect qualitatively distinct response processes or gradual steps on a joint latent continuum. The extension to multidimensional node models enables researchers to specify multiple judgement processes that simultaneously affect the decision between competing response options. Empirical applications highlight the roles of extreme and midpoint response style in rating judgements and show that judgement processes are moderated by different response formats. Model applications with multidimensional decision nodes reveal that decisions among rating categories are jointly informed by trait‐related processes and response styles.
... For rating items with four categories that are often used in educational studies, the midscale judgment can be omitted, yielding an IRTree model with two latent dimensions of agreement θ and extreme responding θ_ERS (Kim & Bolt, 2021; Plieninger & Meiser, 2014). The IRTree structure follows from the diagram in Figure 2 if the first node is dropped and the remaining terminal nodes are relabelled as categories k = 0, . . . ...
... An advantage of IRTrees over multidimensional PCMs is the direct interpretation of person and item parameters for each subprocess, allowing one, for example, to determine which items specifically foster the use of the extreme or middle categories. A potential disadvantage of IRTrees may be seen in the dichotomization of responses in the definition of pseudo-items pertaining to agreement judgments or other subprocesses, which may reduce the precision of estimating the underlying construct (but see Plieninger & Meiser, 2014). Recent developments of IRTrees, however, allow for ordinal pseudo-items and for assessing the same construct over several binary pseudo-items (e.g. ...
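For the four-category case described above, the tree reduces to two binary pseudo-items; a minimal sketch under the same assumptions as before (simulated data, names of our choosing):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=3)
# hypothetical 4-point responses (categories 1-4), persons x items
X4 = pd.DataFrame(rng.integers(1, 5, size=(200, 6)))

agree = (X4 >= 3).astype(int)           # agreement node, driven by theta
extreme = X4.isin([1, 4]).astype(int)   # extremity node, driven by theta_ERS
```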
Preprint
Chapter accepted for the International Encyclopedia of Education, 4th edition.
... In summary, the model assumes that responses to 5-point items can be explained by the target trait as well as the two response styles MRS and ERS. Often, an IR-tree model fits better than a unidimensional alternative, and this is taken as an indication that response styles are present in the data at hand (e.g., Böckenholt, 2012; Khorramdel & von Davier, 2014; Plieninger & Meiser, 2014). The IR-tree model then allows one to measure response styles as well as the response-style-free target trait. ...
... Response styles and other sources of method variance are receiving more and more attention in many areas (e.g., Baumgartner & Steenkamp, 2001; Kam & Meyer, 2015; Moors, 2012; Podsakoff, MacKenzie, Lee, & Podsakoff, 2003), and IR-tree models have already been successfully applied in this context. For example, Plieninger and Meiser (2014) predicted measures of academic performance using self-report measures that were controlled for response styles using an IR-tree model. Likewise, LaHuis et al. (2019) predicted job performance using variables such as self-reported teamwork or work ethic. ...
Article
Full-text available
IR-tree models assume that categorical item responses can best be explained by multiple response processes. In the present article, guidelines are provided for the development and interpretation of IR-tree models. In more detail, the relationship between a tree diagram, the model equations, and the analysis on the basis of pseudo-items is described. Moreover, it is shown that IR-tree models do not allow conclusions about the sequential order of the processes, and that mistakes in the model specification can have serious consequences. Furthermore, multiple-group IR-tree models are presented as a novel extension of IR-tree models to data from heterogeneous units. This makes it possible, for example, to investigate differences across countries or organizations with respect to core parameters of the IR-tree model. Finally, an empirical example on organizational commitment and response styles is presented.
... Alternatively, item response models can measure response styles, and include, but are not limited to, multiprocess models (e.g., Thissen-Roe and Thissen, 2013; Khorramdel and von Davier, 2014; Plieninger and Meiser, 2014; Böckenholt and Meiser, 2017), unfolding models (Liu and Wang, 2019), and the multidimensional nominal response model (MNRM; e.g., Bolt and Newton, 2011; Kieruj and Moors, 2013; Falk and Cai, 2016). Such approaches arguably rest upon testable assumptions and can handle some situations that sum scores cannot (e.g., planned missing data designs), and have numerous other advantages (e.g., conditional standard errors for score estimates). ...
... We present the MNRM with a concrete example that compares it to sum scores and provide examples and information to allow applied researchers to more widely use the model in practice for such validity investigations. We stop short of arguing that the current approach is the best for modeling response styles, as a number of alternatives have recently emerged (e.g., Plieninger and Meiser, 2014; Böckenholt and Meiser, 2017). However, the MNRM and the majority of similar latent trait models are most appropriate when there are multi-item measures of constructs, and they typically assume what is known as a reflective measurement model (Bollen and Lennox, 1991). ...
Article
Full-text available
Recent years have seen a dramatic increase in item response models for measuring response styles on Likert-type items. These model-based approaches stand in contrast to traditional sum-score-based methods where researchers count the number of times that participants selected certain response options. The multidimensional nominal response model (MNRM) offers a flexible model-based approach that may be intuitive to those familiar with sum score approaches. This paper presents a tutorial on the model along with code for estimating it using three different software packages: flexMIRT®, mirt, and Mplus. We focus on specification and interpretation of response functions. In addition, we provide analytical details on how sum score to scale score conversion can be done with the MNRM. In the context of a real data example, three different scoring approaches are then compared. This example illustrates how sum-score-based approaches can sometimes yield scores that are confounded with substantive content. We expect that the current paper will facilitate further investigations as to whether different substantive conclusions are reached under alternative approaches to measuring response styles.
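As a concrete, purely illustrative rendering of such response functions, the sketch below computes MNRM category probabilities; the parameter values, scoring functions, and function name are our assumptions, not code from the tutorial:

```python
import numpy as np

def mnrm_probs(theta, a, s, c):
    """Category probabilities of a multidimensional nominal response model:
    z_k = sum_d a_d * s_kd * theta_d + c_k, normalized over categories."""
    z = s @ (a * theta) + c
    z -= z.max()                         # numerical stability
    return np.exp(z) / np.exp(z).sum()

# hypothetical 5-category item with a content trait and an ERS trait:
# content scoring 0-4, ERS scoring 1 for the two extreme categories only
s = np.array([[0, 1], [1, 0], [2, 0], [3, 0], [4, 1]], dtype=float)
p = mnrm_probs(theta=np.array([0.5, 1.0]),  # [content, ERS]
               a=np.array([1.2, 0.8]),      # slopes per dimension
               s=s, c=np.zeros(5))          # intercepts set to zero
```

Under this scoring, a high ERS trait raises the probability of the two endpoint categories independently of the content trait, which is what lets the model separate the two sources of variance.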
... To date, EIRM has been mostly applied to either dichotomous data or pseudo-dichotomous data where polytomous response categories have been collapsed into binary categories through the selective grouping of ordered or nominal response categories (e.g., Bulut, Palma, Rodriguez, & Stanke, 2015; De Boeck & Partchev, 2012; Plieninger & Meiser, 2014; Prowker & Camilli, 2007; Scheiblechner, 2009; Verhelst & Verstralen, 2008). Despite more recent attempts that described how to estimate explanatory IRT models for items with ordered or nominal response categories (e.g., Jiao & Zhang, 2014; Wang & Liu, 2007; Tuerlinckx & Wang, 2004), the proposed models have been limited in terms of utilizing a familiar polytomous IRT model (e.g., GRM, PCM, and RM) within the EIRM framework. ...
... However, some of these programs (e.g., HLM and SAS) are only commercially available, and others (e.g., WinBUGS) require a strong understanding of Bayesian modeling. To avoid the problems described above, some researchers restructured polytomous response data into dichotomous response data and utilized free software programs that are capable of estimating GLMMs with dichotomous data (e.g., Bulut et al., 2015; De Boeck & Partchev, 2012; Plieninger & Meiser, 2014; Prowker & Camilli, 2007; Scheiblechner, 2009; Verhelst & Verstralen, 2008). However, changing the original structure of the data often results in information loss and thus adds additional bias to the inferences made from the estimated models. ...
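One common form of such restructuring is a sequential (continuation-ratio) expansion, in which each polytomous response becomes one binary row per decision step; a minimal sketch, noting that the exact coding scheme varies across the cited studies and is an assumption here:

```python
def sequential_rows(y, n_cat):
    """Expand a response y in {0, ..., n_cat - 1} into binary step outcomes.

    Step k (k = 1, ..., n_cat - 1) is reached only if y >= k - 1; a reached
    step is coded 1 if it was passed (y >= k), giving one row per step."""
    return [(k, int(y >= k)) for k in range(1, n_cat) if y >= k - 1]

# a response of 2 on a 4-category item (0-3): steps 1 and 2 passed,
# step 3 reached but not passed
assert sequential_rows(2, 4) == [(1, 1), (2, 1), (3, 0)]
```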
Article
Full-text available
Item response theory is a widely used framework for the design, scoring, and scaling of measurement instruments. Item response models are typically used for dichotomously scored questions that have only two score points (e.g., multiple-choice items). However, given the increasing use of instruments that include questions with multiple response categories, such as surveys, questionnaires, and psychological scales, polytomous item response models are becoming more utilized in education and psychology. This study aims to demonstrate the application of explanatory item response models to polytomous item responses in order to explain common variability in item clusters, person groups, and interactions between item clusters and person groups. Explanatory forms of several polytomous item response models – such as the Partial Credit Model and the Rating Scale Model – are demonstrated, and the estimation procedures of these models are explained. Findings of this study suggest that explanatory item response models can be more robust and parsimonious than traditional item response models for polytomous data where items and persons share common characteristics. Explanatory polytomous item response models can provide more information about response patterns in item responses by estimating fewer item parameters.
... IRTree models describe the cognitive process of reaching a response category on a Likert scale on the basis of a tree-like structure. Most studies (e.g., Böckenholt, 2012, 2017; Jeon & De Boeck, 2016; Khorramdel & von Davier, 2014; Plieninger & Meiser, 2014) have applied a three-decision model composed of three steps: (1) indifference, (2) direction, and (3) intensity. During the indifference step, respondents decide whether to express their attitude toward a statement or to hold a neutral position. ...
... A binary pseudo-item (BPI) is created at each step, and these BPIs are then examined with simple-structure multidimensional IRT (MIRT) models. It is acknowledged that IRTree models do not presuppose a particular order of the three steps, and several sequences have been proposed in the literature (e.g., Böckenholt, 2017; Jeon & De Boeck, 2016; Plieninger & Meiser, 2014). ...
Article
Full-text available
Likert or rating scales may elicit an extreme response style (ERS), which means that responses to scales do not reflect the ability that is meant to be measured. Research has shown that the presence of ERS could lead to biased scores and thus influence the accuracy of differential item functioning (DIF) detection. In this study, a new method under the multiple-indicators multiple-causes (MIMIC) framework is proposed as a means to eliminate the impact of ERS in DIF detection. The findings from a series of simulations showed that a difference in ERS between groups caused inflated false-positive rates and deflated true-positive rates in DIF detection when ERS was not taken into account. The modified MIMIC model, as compared to conventional MIMIC, logistic discriminant function analysis, ordinal logistic regression, and their extensions, could control false-positive rates across situations and yielded trustworthy true-positive rates. An empirical example from a study of Chinese marital resilience was analyzed to demonstrate the proposed model.
... Such different usages of the response scale can systematically distort the estimation of individual substantive trait levels, group means, and correlations among multiple traits, so that RS must be controlled for to obtain valid measurements (Baumgartner & Steenkamp, 2001; Alwin, 2007). Commonly used IRTree models for the analysis of RS define agreement decisions as dependent on the substantive trait levels of respondents, whereas more fine-grained decisions are modeled as based on individual RS, like the judgment to give extreme versus non-extreme responses guided by ERS, or the judgment to select the neutral middle category guided by MRS (e.g., Böckenholt, 2017; Khorramdel & von Davier, 2014; Plieninger & Meiser, 2014; Thissen-Roe & Thissen, 2013). However, even though IRTree models are mostly used for RS analysis, they are flexible enough to incorporate any kind of person-specific influences on the selection of individual response categories (e.g., socially desirable responding) by defining the pseudo-items correspondingly. ...
Article
Full-text available
Responding to rating scale items is a multidimensional process, since not only the substantive trait being measured but also additional personal characteristics can affect the respondents’ category choices. A flexible model class for analyzing such multidimensional responses are IRTree models, in which rating responses are decomposed into a sequence of sub-decisions. Different response processes can be involved in item responding both sequentially across those sub-decisions and as co-occurring processes within sub-decisions. In the previous literature, modeling co-occurring processes has been exclusively limited to dominance models, where higher trait levels are associated with higher expected scores. However, some response processes may rather follow an ideal point rationale, where the expected score depends on the proximity of a person’s trait level to the item’s location. Therefore, we propose a new multidimensional IRT model of co-occurring dominance and ideal point processes (DI-MIRT model) as a flexible framework for parameterizing IRTree sub-decisions with multiple dominance processes, multiple ideal point processes, and combinations of both. The DI-MIRT parameterization opens up new application areas for the IRTree model class and allows the specification of a wide range of theoretical assumptions regarding the cognitive processing of item responding. A simulation study shows that IRTree models with DI-MIRT parameterization provide excellent parameter recovery and accurately reflect co-occurring dominance and ideal point processes. In addition, a clear advantage over traditional IRTree models with purely sequential processes is demonstrated. Two application examples from the field of response style analysis highlight the benefits of the general IRTree framework under real-world conditions.
... The IRTree formulation can represent a wide variety of response formats and response processes, and it is easily adapted for binary responses, one-dimensional scales, bipolar scales, and Likert responses [17]. IRTrees have been used in general applications such as differentiating types of intelligence [27], response styles in multiple-choice items [28], and modeling answer change behavior [29]. In the forensic science setting, IRTrees have been shown to be useful for representing sequential decision-making processes when an answer key does not exist [19]. ...
Article
Full-text available
In recent years, ‘black box’ studies in forensic science have emerged as the preferred way to provide information about the overall validity of forensic disciplines in practice. These studies provide aggregated error rates over many examiners and comparisons, but errors are not equally likely on all comparisons. Furthermore, inconclusive responses are common and vary across examiners and comparisons, but do not fit neatly into the error rate framework. This work introduces Item Response Theory (IRT) and variants for the forensic setting to account for these two issues. In the IRT framework, participant proficiency and item difficulty are estimated directly from the responses, which accounts for the different subsets of items that participants often answer. By incorporating a decision-tree framework into the model, inconclusive responses are treated as a distinct cognitive process, which allows inter-examiner differences to be estimated directly. The IRT-based model achieves superior predictive performance over standard logistic regression techniques, produces item effects that are consistent with common sense and prior work, and demonstrates that most of the variability among fingerprint examiner decisions occurs at the latent print evaluation stage and as a result of differing tendencies to make inconclusive decisions.
... Finally, we point out that validation studies are urgently needed for ensuring that the substantive interpretations of the model parameters hold true. Validity could be investigated with experimental manipulations, for example, by varying instructions (as in Bowling et al., 2020; Niessen et al., 2016), through investigations of the model's capability to detect differences between groups of respondents that can be assumed to differ in their levels of C/IER and/or their stylistic tendencies in attentive responding (see Ulitzsch, Penk, et al., 2021, for a validation study using such group comparisons to gain validity evidence for a model-based approach to rapid guessing behavior), or by investigating how attentive RS and adjusted content traits relate to external variables, assuming that relationships adjusted for response bias should more strongly align with subject-matter theory than their unadjusted counterparts (Khorramdel et al., 2017), and that attentive RS and content traits should be linked selectively to extraneous criteria of attentive RS and content traits (Plieninger & Meiser, 2014). ...
Article
Full-text available
Questionnaires are by far the most common tool for measuring noncognitive constructs in psychology and educational sciences. Response bias may pose an additional source of variation between respondents that threatens the validity of conclusions drawn from questionnaire data. We present a mixture modeling approach that leverages response time data from computer-administered questionnaires for the joint identification and modeling of two commonly encountered types of response bias that, so far, have only been modeled separately: careless and insufficient effort responding and response styles (RS) in attentive answering. Using empirical data from the Programme for International Student Assessment 2015 background questionnaire and the case of extreme RS as an example, we illustrate how the proposed approach supports gaining a more nuanced understanding of response behavior as well as how neglecting either type of response bias may impact conclusions on respondents' content trait levels as well as on their displayed response behavior. We further contrast the proposed approach against a more heuristic two-step procedure that first eliminates presumed careless respondents from the data and subsequently applies model-based approaches accommodating RS. To investigate the trustworthiness of results obtained in the empirical application, we conduct a parameter recovery study.
... Models for binary-coded data (i.e., testing data coded for accuracy; Debeer et al., 2017), as well as a range of polytomous scales (Jeon and De Boeck, 2016; Dibek, 2019; Ames, 2021; Spratto et al., 2021), have been developed, some of which are structurally distinct from the MPP decision hierarchy (e.g., Forthmann et al., 2019). Even the MPP model has been adapted to scales with varying response option lengths, altering the number of steps in the decision hierarchy to accommodate the number of response choices available (Plieninger and Meiser, 2014). Given that the issues with IRTree model fit raised here naturally arise from the interaction of the model's structure with the format of the scale, additional investigation should be conducted on how these elements could impact any given model's ability to recover item parameters and recreate data. ...
Article
Full-text available
Item response tree (IRTree) models are theorized to extract response styles from self-report data by utilizing multidimensional item response theory (IRT) models based on theoretical decision processes. Despite the growing popularity of the IRTree framework, there has been little research that has systematically examined the ability of its most popular models to recover item parameters across sample size and test length. This Monte Carlo simulation study explored the ability of IRTree models to recover item parameters based on data created from the midpoint primary process model. Results indicate the IRTree model can adequately recover item parameters early in the decision process model, specifically at the midpoint node. However, as the model progresses through the decision hierarchy, item parameter estimates carry increasing error variance. The authors ultimately recommend caution when employing the IRTree framework.
... By assigning different latent traits to the pseudo-items, their effects on response selection can be separated. Typically, one pseudo-item represents the decision to agree vs. disagree with the item content, which is supposed to be made based on the substantive trait, whereas all further pseudo-items relate to RS-based responding, like the judgment to give extreme vs. non-extreme responses guided by ERS (e.g., Böckenholt, 2017; Khorramdel & von Davier, 2014; Plieninger & Meiser, 2014; Zettler et al., 2016). ...
Article
Full-text available
It is essential to control self-reported trait measurements for response style effects to ensure a valid interpretation of estimates. Traditional psychometric models facilitating such control consider item responses as the result of two kinds of response processes—based on the substantive trait, or based on response styles—and they assume that both of these processes have a constant influence across the items of a questionnaire. However, this homogeneity over items is not always given, for instance, if the respondents’ motivation declines throughout the questionnaire so that heuristic responding driven by response styles may gradually take over from cognitively effortful trait-based responding. The present study proposes two dynamic IRTree models, which account for systematic continuous changes and additional random fluctuations of response strategies, by defining item position-dependent trait and response style effects. Simulation analyses demonstrate that the proposed models accurately capture dynamic trajectories of response processes, as well as reliably detect the absence of dynamics, that is, identify constant response strategies. The continuous version of the dynamic model formalizes the underlying response strategies in a parsimonious way and is highly suitable as a cognitive model for investigating response strategy changes over items. The extended model with random fluctuations of strategies can adapt more closely to the item-specific effects of different response processes and thus is a well-fitting model with high flexibility. By using an empirical data set, the benefits of the proposed dynamic approaches over traditional IRTree models are illustrated under realistic conditions.
... While Peterson and Harrell (1990) have shown how different variables with category-specific or global effects can be investigated, interpretations of the effects in generalized ordered logit models in the social sciences have been offered by Williams (2016) and by Hedeker and Mermelstein (1998). In terms of symmetric binary split models, item response trees focus on measuring latent traits rather than on the impact of explanatory variables or the inclusion of covariates (Plieninger & Meiser, 2014; Khorramdel & von Davier, 2014; De Boeck & Partchev, 2012; Böckenholt & Meiser, 2017; Böckenholt, 2017; Meiser et al., 2019). Furthermore, each split in an item response tree specifies new person and trait parameters, with the consequence that most response styles are modelled while the tendency toward higher categories gets lost. ...
Article
Full-text available
Tricycles constitute a major component of the informal transport system in many Nigerian cities. Their operation is fast becoming a parody of urban living and as such, requires urgent regulatory attention. The paper examined the perception of comfort that passengers derive from the use of tricycles for urban mobility in Calabar metropolis, Nigeria. The study modeled different comfort levels based on demographic characteristics of the passengers, as derived from its use. Questionnaire and data from CRDoPT and other published materials were used to make inferences on the study. Analyses were by descriptive and inferential statistics. Ordinal regression analysis using the PLUM method in SPSS analyzed data where the null hypothesis that there is no statistical relationship between passenger demographic parameters and the feeling of comfort from the use of the tricycle in Calabar, was rejected. The feeling of comfort of the tricycle mode users is observed to be dependent (p < .05) on the education (p = .000); monthly income (p = .001); occupation (p = .046); and marital status (p = .003) on the respondents. The study recommended among other things, regulatory policies that are expected to enhance an effective modal regulation that promotes a safe and comfortable use of the mode. Registration and creation of an accurate database of the tricycle operators and operations in order to formalize tricycle operation can enhance the confidence of use and boost comfort levels of passengers.
... Given that IR-tree models treat latent traits associated with each decision-making stage as continuous and quantitative variables, they are a good methodological tool for examining the characteristics and patterns of personality faking in each response phase (Böckenholt & Meiser, 2017). Additionally, in the IR-tree models, the latent traits involved in each decision-making step can be combined with covariates (e.g., social desirability), allowing researchers to evaluate the differential impact of covariates on the latent traits involved in each decision-making step (e.g., Böckenholt, 2012, 2019; De Boeck & Partchev, 2012; Plieninger & Meiser, 2014; Zettler et al., 2016). ...
Article
Full-text available
The item response tree (IR-tree) model is increasingly used in the field of personality assessments. The IR-tree model allows researchers to examine the cognitive decision-making process using a tree structure and evaluate conceptual ideas in accounting for individual differences in the response process. Recent research has shown the feasibility of applying IR-tree models to personality data; however, these studies have been exclusively focused on an honest or incumbent sample rather than a motivated sample in a high-stakes situation. The goal of our research is to elucidate the internal mechanism behind how respondents in different testing situations (i.e., honest and motivated test conditions) experience decision-making processes through the three-process IR-tree model (Böckenholt, 2012). Our findings generally corroborate the response mechanism of the direction–extremity–midpoint sequence in both honest and motivated test settings. Additionally, samples in motivated test settings generally exhibit stronger directional and extreme response preferences but weaker preferences of midpoint selection than samples in unmotivated test settings. Furthermore, for actual job applicants, social desirability had a substantial effect on all directional, extreme, and midpoint response preferences. Our findings will aid researchers and practitioners in developing a nuanced understanding of the decision-making process of test-takers in motivated test environments. Furthermore, this research will help researchers and practitioners develop more fake-resistant personality assessments.
... Ames and Leventhal (2021) demonstrated accurate estimation of the shift parameter. Plieninger and Meiser (2014) report on convergent evidence of construct and criterion validity of the IRTree model. Taken together, the validity evidence for use of the IRTree is growing, but we recognize the need for more studies investigating the LIRTree. ...
Article
Full-text available
Traditional psychometric models focus on studying observed categorical item responses, but these models often oversimplify the respondent cognitive response process, assuming responses are driven by a single substantive trait. A further weakness is that analysis of ordinal responses has been primarily limited to a single substantive trait at one time point. This study applies a significant expansion of this modeling framework to account for complex response processes across multiple waves of data collection using the item response tree (IRTree) framework. This study applies a novel model, the longitudinal IRTree, for response processes in longitudinal studies, and investigates whether response style changes are proportional to changes in the substantive trait of interest. To do so, we present an empirical example using a six-item sexual knowledge scale from the National Longitudinal Study of Adolescent to Adult Health across two waves of data collection. Results show an increase in sexual knowledge from the first wave to the second wave and a decrease in midpoint and extreme response styles. Model validation revealed that failure to account for response style can bias estimation of substantive trait growth. The longitudinal IRTree model captures midpoint and extreme response style, as well as the trait of interest, at both waves.
... Jeon and De Boeck (2016) showed that a generalized item response tree model with a flexible parametric form, dimensionality, and choice of covariates can be useful for investigating underlying decision processes, describing distinctive features of item response categories, and investigating multiple sources of heterogeneity in a response scale. As Plieninger and Meiser (2014) demonstrated, item content and response styles have different relationships to external variables; the LIRP-TM does not presume that the decision-making processes relate to the item content (obviously, the outcome of the process is related to the item content). Therefore, if individual differences in any one of the latent processes exist independently from item content, respondents should show similar tendencies across different items or scales. ...
Article
Full-text available
With the increasing popularity of non-cognitive inventories in personnel selection, organizations typically wish to be able to tell when a job applicant purposefully manufactures a favorable impression. Past faking research has primarily focused on how to reduce faking via instrument design, warnings, and statistical corrections for faking. This paper took a new approach by examining the effects of faking (experimentally manipulated and contextually driven) on response processes. We modified a recently introduced item response theory tree modeling procedure, the three-process model (Böckenholt, 2013), to identify faking in two studies. Study 1 examined self-reported vocational interest assessment responses using an induced faking experimental design. Study 2 examined self-reported personality assessment responses when some people were in a high-stakes situation (i.e., selection). Across the two studies, individuals instructed or expected to fake were found to engage in more extreme responding. By identifying the underlying differences between fakers and honest respondents, the new approach improves our understanding of faking. Percentage cut-offs based on extreme responding produced a faker classification precision of 85% on average.
... The most parsimonious model is obtained if one assumes β^(S) = β for all S and specifies the binary models such that S1 contains the higher categories. Symmetric binary split models have been considered extensively in item response theory under the name item response trees (Böckenholt, 2017; Böckenholt & Meiser, 2017; De Boeck & Partchev, 2012; Khorramdel & von Davier, 2014; Meiser et al., 2019; Plieninger & Meiser, 2014). However, in item response theory the focus is on measuring latent traits, not the impact of explanatory variables. ...
Article
Full-text available
Ordinal models can be seen as being composed from simpler, in particular binary, models. This view on ordinal models allows one to derive a taxonomy of models that includes basic ordinal regression models, models with more complex parameterizations, the class of hierarchically structured models, and the more recently developed finite mixture models. The structured overview that is given covers existing models and shows how models can be extended to account for further effects of explanatory variables. Particular attention is given to the modeling of additional heterogeneity as, for example, dispersion effects. The modeling is embedded into the framework of response styles, and the exact meaning of heterogeneity terms in ordinal models is investigated. It is shown that the meaning of terms is crucially determined by the type of model that is used. Moreover, it is demonstrated how models with a complex category-specific effect structure can be simplified to obtain simpler models that fit sufficiently well. The fitting of models is illustrated by use of a real data set, and a short overview of existing software is given.
... Research into the sources of construct-irrelevant variance has seen growth in the area of individuals' response style, which is the tendency of a respondent to use a response scale in a systematic way, regardless of the content of the items and survey (Paulhus, 1991; Plieninger & Meiser, 2014). Baumgartner and Steenkamp (2001) recognized and summarized seven important response styles. ...
Article
Contamination of responses due to extreme and midpoint response style can confound the interpretation of scores, threatening the validity of inferences made from survey responses. This study incorporated person-level covariates in the multidimensional item response tree model to explain heterogeneity in response style. We include an empirical example and two simulation studies to support the use and interpretation of the model: parameter recovery using Markov chain Monte Carlo (MCMC) estimation and performance of the model under conditions with and without response styles present. Item intercepts mean bias and root mean square error were small at all sample sizes. Item discrimination mean bias and root mean square error were also small but tended to be smaller when covariates were unrelated to, or had a weak relationship with, the latent traits. Item and regression parameters are estimated with sufficient accuracy when sample sizes are greater than approximately 1,000 and MCMC estimation with the Gibbs sampler is used. The empirical example uses the National Longitudinal Study of Adolescent to Adult Health’s sexual knowledge scale. Meaningful predictors associated with high levels of extreme response latent trait included being non-White, being male, and having high levels of parental support and relationships. Meaningful predictors associated with high levels of the midpoint response latent trait included having low levels of parental support and relationships. Item-level covariates indicate the response style pseudo-items were less easy to endorse for self-oriented items, whereas the trait of interest pseudo-items were easier to endorse for self-oriented items.
... IRTree models may fill a needed gap with respect to evaluating performance assessment raters' scoring processes. IRTree models can provide researchers and practitioners insight into raters' cognitive scoring processes by disentangling polytomous item scores into scores representing multiple decision-making sub-processes (Jeon & De Boeck, 2016; Plieninger & Meiser, 2014). Previous applications of IRTree models have explored multidimensionality in item responses due to individuals' response processes when responding to Likert-type scales (e.g., Böckenholt, 2017; Stone & Leventhal, 2016; Thissen-Roe & Thissen, 2013). ...
Article
When rating performance assessments, raters may ascribe different scores for the same performance when rubric application does not align with the intended application of the scoring criteria. Given performance assessment score interpretation assumes raters apply rubrics as rubric developers intended, misalignment between raters’ scoring processes and the intended scoring processes may lead to invalid inferences from these scores. In an effort to standardize raters’ scoring processes, an alternative scoring method was used. With this method, rubric developers’ intended scoring processes are made explicit by requiring raters to respond to a series of selected-response statements resembling a decision tree. To determine if raters scored essays as intended using a traditional rubric and the alternative scoring method, an IRT model with a tree-like structure (IRTree) was specified to depict the intended scoring processes and fit to data from each scoring method. Results suggest raters using the alternative method may be better able to rate as intended and thus the alternative method may be a viable alternative to traditional rubric scoring. Implications of the IRTree model are discussed.
... Many psychometric modeling approaches have been proposed in order to measure and control for response styles in ordinal rating data. Response styles have been accommodated in various types of Item Response Theory (IRT) models such as extensions of Divide-by-Total models (e.g., Bolt & Johnson, 2009; Falk & Cai, 2016; Rost, 1991; Wang, Wilson, & Shih, 2006; Wetzel & Carstensen, 2017), the Graded Response Model (GRM; e.g., Ferrando, 2014; Lubbe & Schuster, 2017; Rossi, Gilula, & Allenby, 2001; Thissen-Roe & Thissen, 2013), and IRTree models that characterize responses to a rating scale item by a sequence of a priori defined multiple processes (Böckenholt, 2012; De Boeck & Partchev, 2012; Khorramdel & von Davier, 2014; Plieninger & Meiser, 2014). The psychometric models differ in the degree of a priori assumptions on response styles that they incorporate. ...
Article
Full-text available
A large variety of item response theory (IRT) modeling approaches aim at measuring and correcting for response styles in rating data. Here, we integrate response style models of the divide-by-total model family into one superordinate framework that parameterizes response styles as person-specific shifts in threshold parameters. This superordinate framework allows us to structure and compare existing approaches to modeling response styles and therewith makes model-implied restrictions explicit. With a simulation study, we show how the new framework allows us to assess consequences of violations of model assumptions and to compare response style estimates across different model parameterizations. The integrative framework of divide-by-total modeling approaches facilitates the correction for and examination of response styles. In addition to providing a superordinate framework for psychometric research, it gives guidance to applied researchers for model selection and specification in psychological assessment.
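In adjacent-categories (divide-by-total) notation, the shift idea can be sketched as follows; the symbols here are ours, and the cited framework may parameterize details differently. With trait θ_p, item thresholds τ_il, and person-specific shifts δ_pl,

```latex
P(X_{pi} = k \mid \theta_p)
  = \frac{\exp\Bigl(\sum_{l=1}^{k}\bigl(\theta_p - (\tau_{il} - \delta_{pl})\bigr)\Bigr)}
         {\sum_{r=0}^{m}\exp\Bigl(\sum_{l=1}^{r}\bigl(\theta_p - (\tau_{il} - \delta_{pl})\bigr)\Bigr)}
```

where the empty sum for r = 0 is defined as zero. Response styles then correspond to patterns in the δ_pl, for example, shifts that make the outermost thresholds easier to cross for extreme responders.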
... Recently, different multiprocess IRT models or IRTree models have been proposed as methods for measuring response styles (Böckenholt, 2012, 2017; Böckenholt & Meiser, 2017; Jeon & De Boeck, 2016; Khorramdel & von Davier, 2014; Khorramdel et al., 2017; Plieninger & Meiser, 2014). The general idea of IRTree approaches is to decompose categorical responses from rating scales into several sequential response sub-processes or pseudo-items (which are binary or polytomous). ...
... The IRT approach applied in the current study is a multidimensional extension (Khorramdel & von Davier, 2014; von Davier & Khorramdel, 2013) of an approach proposed by Böckenholt (2012). The approach was successfully applied to empirical data (Khorramdel & von Davier, 2014; von Davier & Khorramdel, 2013) and validated using extraneous RS criteria, academic grades, and the relationship between self-concept and reading performance to demonstrate its usefulness (Plieninger & Meiser, 2014). Likert-type rating data are decomposed into multiple response subprocesses (binary pseudo-items; BPIs) that can be used to separate RS from construct-related responses by applying simple-structure MIRT models. ...
Article
Full-text available
A relatively new item response theory (IRT) approach (Böckenholt, 2012) and its multidimensional extension (Khorramdel & von Davier, 2014; von Davier & Khorramdel, 2013) to test and correct for response styles was applied to international large-scale assessment data – the Programme for International Student Assessment 2012 field trial – for the first time. The responses of n = 17,552 students at age 15 from 63 different countries to the two personality scales of openness and perseverance (student questionnaire) were examined, and bias from an extreme response style (ERS) and midpoint response style (MRS) was found. The aim of the study is not to report country level results but to look at the potential of this methodology to test for and correct response style bias in an international context. It is shown that personality scales corrected for response styles can lead to more valid test scores, addressing the “paradoxical relationship” phenomenon of negative correlations between personality scales and cognitive proficiencies. ERS correlates negatively with the cognitive domains of mathematics and problem solving on the country mean level, while MRS shows positive correlations.
... We refer to these response processes visualized in Fig. 2 as the "Care&Yea" model. The effect of an indifference category in Likert data has been shown in previous work that utilized item response tree models (Böckenholt, 2017; Jeon and De Boeck, 2016; Khorramdel and von Davier, 2014; Plieninger and Meiser, 2014; Zettler et al., 2016). In this research, the selection of the middle category is attributed to the response style of a person. ...
Article
This paper presents a systematic investigation of how affirmative and polar-opposite items presented either jointly or separately affect yea-saying tendencies. We measure these yea-saying tendencies with item response models that estimate a respondent’s tendency to give a “yea”-response that may be unrelated to the target trait. In a re-analysis of the Zhang et al. (PLoS ONE, 11:1–15, 2016) data, we find that yea-saying tendencies depend on whether items are presented as part of a scale that contains affirmative and/or polar-opposite items. Yea-saying tendencies are stronger for affirmative than for polar-opposite items. Moreover, presenting polar-opposite items together with affirmative items creates lower yea-saying tendencies for polar-opposite items than when presented in isolation. IRT models that do not account for these yea-saying effects arrive at a two-dimensional representation of the target trait. These findings demonstrate that the contextual information provided by an item scale can serve as a determinant of differential item functioning.
... The first approach, which will not be considered further in this article, uses a dichotomous conceptualization of ERS. Specifically, a dichotomous indicator variable, indicating whether an extreme response category was selected or not, is used to measure ERS (e.g., Böckenholt, 2012; Khorramdel & von Davier, 2014; Plieninger & Meiser, 2014). The second approach, which is considered in detail in the following, models ERS by modifying the spacing between responses for each responder individually. ...
Article
Extreme response style is the tendency of individuals to prefer the extreme categories of a rating scale irrespective of item content. It has been shown repeatedly that individual response style differences affect the reliability and validity of item responses and should, therefore, be considered carefully. To account for extreme response style (ERS) in ordered categorical item responses, it has been proposed to model responder-specific sets of category thresholds in connection with established polytomous item response models. An elegant approach to achieve this is to introduce a responder-specific scaling factor that modifies intervals between thresholds. By individually expanding or contracting intervals between thresholds, preferences for selecting either the outer or inner response categories can be modeled. However, for a responder-specific scaling factor to appropriately account for ERS, there are two important aspects that have not been considered previously and which, if ignored, will lead to questionable model properties. Specifically, the centering of threshold parameters and the type of category probability logit need to be considered carefully. In the present article, a scaled threshold model is proposed, which accounts for these considerations. Instructions on model fitting are given together with SAS PROC NLMIXED program code, and the model’s application and interpretation is demonstrated using simulation studies and two empirical examples.
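A simplified sketch of the core idea in Python (not the article's SAS implementation): center the thresholds, then let a responder-specific factor expand or contract the intervals between them:

import numpy as np

def scaled_threshold_probs(theta, thresholds, phi=1.0):
    # Adjacent-category (divide-by-total) probabilities with a
    # responder-specific scaling factor phi: phi < 1 contracts the
    # threshold intervals, making the outer categories more likely (ERS);
    # phi > 1 expands them. Thresholds are centered first, one of the
    # aspects the article argues must be handled carefully.
    tau = np.asarray(thresholds, dtype=float)
    tau = tau.mean() + phi * (tau - tau.mean())
    kernel = np.concatenate([[0.0], np.cumsum(theta - tau)])
    expk = np.exp(kernel - kernel.max())
    return expk / expk.sum()

print(scaled_threshold_probs(0.0, [-1.5, -0.5, 0.5, 1.5], phi=1.0))
print(scaled_threshold_probs(0.0, [-1.5, -0.5, 0.5, 1.5], phi=0.5))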
... Especially in the item response theory (IRT) framework, Böckenholt and Meiser (2017) reviewed two types of IRT models designed to handle response styles: threshold-based models such as polytomous Rasch models and their mixture extensions (von Davier & Yamamoto, 2007; Rost, 1991), and an item response (IR) tree model (Böckenholt, 2012, 2017), which can be used to distinguish the effects of the judgement processes associated with content and response style. Plieninger and Meiser (2014) also validated several IR tree methods using an empirical data set. In other IRT-related research involving response styles, IRT and mixture IRT models have further been applied to correct for response style by adjusting parameters representing the response styles (Austin et al., 2006; Bolt & Johnson, 2009; Meiser & Machunsky, 2008; Morren, Gelissen, & Vermunt, 2012). ...
Article
Preference data, such as Likert scale data, are often obtained in questionnaire‐based surveys. Clustering respondents based on survey items is useful for discovering latent structures. However, cluster analysis of preference data may be affected by response styles, that is, a respondent's systematic response tendencies irrespective of the item content. For example, some respondents may tend to select ratings at the ends of the scale, which is called an ‘extreme response style’. A cluster of respondents with an extreme response style can be mistakenly identified as a content‐based cluster. To address this problem, we propose a novel method of clustering respondents based on their indicated preferences for a set of items while correcting for response‐style bias. We first introduce a new framework to detect, and correct for, response styles by generalizing the definition of response styles used in constrained dual scaling. We then simultaneously correct for response styles and perform a cluster analysis based on the corrected preference data. A simulation study shows that the proposed method yields better clustering accuracy than the existing methods do. We apply the method to empirical data from four different countries concerning social values.
... A significant amount of research has shown that item response (IR) tree models for Likert-type scales are useful in identifying two response-style behaviours that reflect the degree to which persons select the midpoint and the extreme categories of a response scale regardless of their target-trait value (Böckenholt, 2012; De Boeck & Partchev, 2012; Khorramdel & von Davier, 2014; Plieninger & Meiser, 2014). This research also demonstrated the significant biasing influence of these scale-usage behaviours on factor-analytic representations of psychological scales. ...
Article
Recent applications of item response tree models demonstrate that this model class is well suited to detect midpoint and extremity response style effects in both attitudinal and personality measurements. This paper proposes an extension of this approach that goes beyond measuring response styles and allows us to examine item‐feature effects. In a reanalysis of three published data sets, it is shown that the proposed extension captures item‐feature effects across affirmative and reverse‐worded items in a psychological test. These effects are found to affect directional responses but not midpoint and extremity preferences. Moreover, accounting for item‐feature effects substantially improves model fit and interpretation of the construct measurement. The proposed extension can be implemented readily with current software programs that facilitate maximum likelihood estimation of item response models with missing data.
... In tree-based methods one assumes a nested structure: first a decision about the direction of the response is modelled, and then its strength. Models of this type have been considered by Suh and Bolt (2010), De Boeck and Partchev (2012), Thissen-Roe and Thissen (2013), Jeon and De Boeck (2016), Böckenholt (2012), Khorramdel and von Davier (2014), Plieninger and Meiser (2014), Böckenholt (2017) and Böckenholt and Meiser (2017). ...
Article
Increasing attention has recently been paid to assessing individuals’ financial competence. Financial knowledge appears as a complex and not directly observable phenomenon, whose measurement is usually based on answers to a set of multiple-choice items. The option ‘Don’t Know’ (DK) is usually included among the possible answers to capture uncertainty or lack of knowledge. Its presence represents an element of noise that can affect the measurement of financial knowledge. DK responses are usually considered as incorrect answers or missing values; however, these naive approaches may lead to biased financial knowledge measures. In this study, we address the issue of estimating the latent knowledge construct accounting for the DK option, through a bidimensional latent regression two-parameter logistic model. The model at issue relies on the assumption that the response process may be disentangled in two consecutive steps driven by two latent variables: propensity to provide a substantive answer and financial knowledge. In the first step, both latent variables affect the probability of providing a substantive response. In the second step, conditionally on the selection of a substantive response, financial knowledge affects the probability of a correct answer vs. an incorrect one. Individual characteristics are also considered to explain the two latent traits.
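A minimal Python sketch of this two-step response process (the parameter names are illustrative, not the article's):

import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def dk_probs(propensity, knowledge, item):
    # Step 1: both latent variables drive the probability of giving a
    # substantive answer rather than 'Don't Know' (DK).
    p_sub = logistic(item["a_p"] * propensity
                     + item["a_k1"] * knowledge - item["b1"])
    # Step 2: given a substantive answer, knowledge alone drives the
    # probability of answering correctly (a 2PL step).
    p_cor = logistic(item["a_k2"] * (knowledge - item["b2"]))
    return {"DK": 1 - p_sub,
            "incorrect": p_sub * (1 - p_cor),
            "correct": p_sub * p_cor}

item = {"a_p": 1.2, "a_k1": 0.5, "b1": -0.3, "a_k2": 1.5, "b2": 0.0}
print(dk_probs(propensity=0.0, knowledge=1.0, item=item))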
Article
Item response tree (IRTree) models form a family of psychometric models that allow researchers to control for multiple response processes, such as different sorts of response styles, in the measurement of latent traits. While IRTree models can capture quantitative individual differences in both the latent traits of interest and the use of response categories, they maintain the basic assumption that the nature and weighting of latent response processes are homogeneous across the entire population of respondents. In the present research, we therefore propose a novel approach for detecting heterogeneity in the parameters of IRTree models across subgroups that engage in different response behavior. The approach uses score‐based tests to reveal violations of parameter heterogeneity along extraneous person covariates, and it can be employed as a model‐based partitioning algorithm to identify sources of differences in the strength of trait‐based responding or other response processes. Simulation studies demonstrate generally accurate Type I error rates and sufficient power for metric, ordinal, and categorical person covariates and for different types of test statistics, with the potential to differentiate between different types of parameter heterogeneity. An empirical application illustrates the use of score‐based partitioning in the analysis of latent response processes with real data.
Article
Item response tree (IRTree) models are a flexible framework to control self-reported trait measurements for response styles. To this end, IRTree models decompose the responses to rating items into sub-decisions, which are assumed to be made on the basis of either the trait being measured or a response style, whereby the effects of such person parameters can be separated from each other. Here we investigate conditions under which the substantive meanings of estimated extreme response style parameters are potentially invalid and do not correspond to the meanings attributed to them, that is, content-unrelated category preferences. Rather, the response style factor may mimic the trait and capture part of the trait-induced variance in item responding, thus impairing the meaningful separation of the person parameters. Such a mimicry effect is manifested in a biased estimation of the covariance of response style and trait, as well as in an overestimation of the response style variance. Both can lead to severely misleading conclusions drawn from IRTree analyses. A series of simulation studies reveals that mimicry effects depend on the distribution of observed responses and that the estimation biases are stronger the more asymmetrically the responses are distributed across the rating scale. It is further demonstrated that extending the commonly used IRTree model with unidimensional sub-decisions by multidimensional parameterizations counteracts mimicry effects and facilitates the meaningful separation of parameters. An empirical example of the Program for International Student Assessment (PISA) background questionnaire illustrates the threat of mimicry effects in real data. The implications of applying IRTree models for empirical research questions are discussed.
Article
Item response tree (IRTree) approaches have received increasing attention in the response style literature due to their capability to partial out response style latent traits from content-related latent traits by considering separate decisions for agreement and level of agreement. Additionally, it has been shown that the functioning of the intensity of agreement decision may depend upon the agreement decision with an item, so that the item parameters and person parameters may differ by direction of agreement; when the parameters across direction are the same, this is called directional invariance. Furthermore, for non-cognitive psychological constructs, it has been argued that the response process may be best described as following an unfolding process. In this study, a family of IRTree models to handle unfolding responses with the agreement decision following the hyperbolic cosine model and the intensity of agreement decision following a graded response model is investigated. This model family also allows for investigation of item- and person-level directional invariance. A simulation study is conducted to evaluate parameter recovery; model parameters are estimated with a fully Bayesian approach using JAGS (Just Another Gibbs Sampler). The proposed modeling scheme is demonstrated with two data examples with multiple model comparisons allowing for varying levels of directional invariance and unfolding versus dominance processes. An approach to visualizing the final model item response functioning is also developed. The article closes with a short discussion about the results.
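As an illustration of the first-stage unfolding decision assumed here, a Python sketch of the hyperbolic cosine model's agreement probability (gamma denotes the HCM unit parameter; the intensity stage, not shown, would follow a graded response model conditional on agreement):

import numpy as np

def hcm_agree_prob(theta, delta, gamma):
    # Hyperbolic cosine unfolding model: agreement is most likely when the
    # person location theta is close to the item location delta, falling
    # off symmetrically in both directions.
    return np.exp(gamma) / (np.exp(gamma) + 2.0 * np.cosh(theta - delta))

# Agreement peaks at theta == delta and declines on either side
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(hcm_agree_prob(theta, delta=0.0, gamma=1.0), 3))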
Article
In this study, we introduced a cross‐classified multidimensional nominal response model (CC‐MNRM) to account for various response styles (RS) in the presence of cross‐classified data. The proposed model allows slopes to vary across items and can explore impacts of observed covariates on latent constructs. We applied a recently developed variant of the Metropolis‐Hastings Robbins‐Monro (MH‐RM) algorithm to address the computational challenge of estimating the proposed model. To demonstrate our new approach, we analyzed empirical student evaluation of teaching (SET) data collected from a large public university with three models: a CC‐MNRM with RS, a CC‐MNRM with no RS, and a multilevel MNRM with RS. Results indicated that the three models led to different inferences regarding the observed covariates. Additionally, in the example, ignoring/incorporating RS led to changes in student substantive scores, while the instructor substantive scores were less impacted. Misspecifying the cross‐classified data structure resulted in apparent changes on instructor scores. To further evaluate the proposed modeling approach, we conducted a preliminary simulation study and observed good parameter and score recovery. We concluded this study with discussions of limitations and future research directions.
Article
Full-text available
Ambulatory assessment (AA) is becoming an increasingly popular research method in the fields of psychology and life science. Nevertheless, knowledge about the effects that design choices, such as questionnaire length (i.e., number of items per questionnaire), have on AA data quality is still surprisingly restricted. Additionally, response styles (RS), which threaten data quality, have hardly been analyzed in the context of AA. The aim of the current research was to experimentally manipulate questionnaire length and investigate the association between questionnaire length and RS in an AA study. We expected that the group with the longer (82-item) questionnaire would show greater reliance on RS relative to the substantive traits than the group with the shorter (33-item) questionnaire. Students (n = 284) received questionnaires three times a day for 14 days. We used a multigroup two-dimensional item response tree model in a multilevel structural equation modeling framework to estimate midpoint and extreme RS in our AA study. We found that the long questionnaire group showed a greater reliance on RS relative to trait-based processes than the short questionnaire group. Although further validation of our findings is necessary, we hope that researchers consider our findings when planning an AA study in the future.
Article
The use of discrete categorical formats to assess psychological traits has a long-standing tradition that is deeply embedded in item response theory models. The increasing prevalence and endorsement of computer- or web-based testing has led to greater focus on continuous response formats, which offer numerous advantages in both respondent experience and methodological considerations. Response styles, which are frequently observed in self-reported data, reflect a propensity to answer questionnaire items in a consistent manner, regardless of the item content. These response styles have been identified as causes of skewed scale scores and biased trait inferences. In this study, we investigate the impact of response styles on individuals’ responses within a continuous scale context, with a specific emphasis on extreme response style (ERS) and acquiescence response style (ARS). Building upon the established continuous response model (CRM), we propose extensions known as the CRM-ERS and CRM-ARS. These extensions are employed to quantitatively capture individual variations in these distinct response styles. The effectiveness of the proposed models was evaluated through a series of simulation studies. Bayesian methods were employed to effectively calibrate the model parameters. The results demonstrate that both models achieve satisfactory parameter recovery. Neglecting the effects of response styles led to biased estimation, underscoring the importance of accounting for these effects. Moreover, the estimation accuracy improved with increasing test length and sample size. An empirical analysis is presented to elucidate the practical applications and implications of the proposed models.
Article
Full-text available
Historically, the “ ? ” response category (i.e., the question mark response category) has been criticized because of the ambiguity of its interpretation. Previous empirical studies of the appropriateness of the “ ? ” response category have generally used methods that cannot disentangle the response style from target psychological traits and have also exclusively focused on Western samples. To further develop our understanding of the “ ? ” response category, we examined the differing use of the “ ? ” response category in the Job Descriptive Index (JDI) between U.S. and Korean samples by using the recently proposed item response tree (IRTree) models. Our research showed that the Korean group more strongly prefers the “ ? ” response category, while the U.S. group more strongly prefers the directional response category (i.e., Yes). In addition, the Korean group tended to interpret the “ ? ” response category as mild agreement, while the U.S. group tended to interpret it as mild disagreement. Our study adds to the scientific body of knowledge on the “ ? ” response category in a cross‐cultural context. We hope that our findings presented herein provide valuable insights for researchers and practitioners who want to better understand the “ ? ” response category and develop various psychological assessments in cross‐cultural settings.
Article
Response styles introduce construct-irrelevant variance as a result of respondents systematically responding to Likert-type items regardless of content. Methods to account for response styles through data analysis as well as approaches to mitigating the effects of response styles during data collection have been well-documented. Recent approaches to modeling Likert responses, such as the IRTree model, rely on the response process individuals follow when answering items. In this study, we advocate for the use of IRTrees to analyze Likert items in addition to using the hypothesized response process to design new items. Combining these two approaches facilitates answering Likert item design questions that have plagued researchers. These include the interpretation of a middle response option, the optimal number of response options, and how to label the response options. We present seven research questions that could be answered using this new approach, outline methods of data collection and analysis for each, and present results from an empirical example to address one of these seven questions.
Article
Self-report personality measures typically seek to capture overall tendencies of individuals' behavior but do not capture potentially useful information provided by within-person variability and differences in trait relevance. We propose an item-response theory approach to simultaneously capture estimates of trait levels, within-person variability, and differences in trait relevance. In contrast to previous research, we focus on intentional within-person shifts in personality that we label adaptability. This approach was tested on a large sample (N = 983) using Amazon's Mechanical Turk. Results suggested that these three aspects can be statistically distinguished from each other and provide incremental variance in related outcomes. The ability to simultaneously capture personality trait levels, adaptability, and traitedness with self-report measures would allow researchers to better understand the role of personality in the workplace.
Article
The aim of the current research was to provide recommendations to facilitate the development and use of anchoring vignettes (AVs) for cross-cultural comparisons in education. Study 1 identified six factors leading to order violations and ties in AV responses based on cognitive interviews with 15-year-old students. The factors were categorized into three domains: varying levels of AV format familiarity, differential interpretations of content, and individual differences in processing information. To inform the most appropriate approach to treating order violations and the re-scaling method, Study 2 conducted Monte Carlo simulations with the manipulation of three factors and incorporation of five response styles. Study 2 found that the AV approach improved accuracy in score estimation by successfully controlling for response styles. The results also revealed that the reordering approach to treating order violations produced the most accurate estimation, combined with the re-scaling method assigning the middle value among possible scores. Along with strategies to develop AVs, strengths and limitations of the implemented nonparametric AV approach were discussed in comparison to item response theory modeling for response styles.
Article
Individual response style behaviors, unrelated to the latent trait of interest, may influence responses to ordinal survey items. Response style can introduce bias in the total score with respect to the trait of interest, threatening valid interpretation of scores. Despite claims of response style stability across scales, there has been little research into stability across multiple scales from the beneficial perspective of item response trees. This study examines an extension of the IRTree methodology to include mixed item formats, providing an empirical example of responses to three scales measuring perceptions of social media, climate change, and medical marijuana use. Results show extreme and midpoint response styles were not stable across scales within a single administration and 5-point Likert-type items elicited higher levels of extreme response style than the 4-point items. Latent trait of interest estimation varied across response style models, particularly at the lower end of the score distribution, demonstrating that an appropriate response style model is important for adequate trait estimation using Bayesian Markov chain Monte Carlo estimation.
Article
In this study, we examined the results and interpretations produced from two different IRTree models—one using paths consisting of only dichotomous decisions, and one using paths consisting of both dichotomous and polytomous decisions. We used data from two versions of an impulsivity measure. In the first version, all the response options had labels; in the second version, only the endpoints were labeled. Based on past research, we hypothesized that the endpoints would be selected more frequently in the endpoint-only labeled condition, and the midpoint response option would be selected more frequently in the fully labeled condition. Results from the two models (dichotomous and polytomous) were similar and indicated that our hypotheses were partially supported—specifically, there was no consistent pattern in terms of which condition saw a higher frequency of midpoint response selection. However, our hypotheses regarding extreme responding in the endpoint-only labeling condition were supported.
Article
This paper presents a mixture item response tree (IRTree) model for extreme response style. Unlike traditional applications of single IRTree models, a mixture approach provides a way of representing the mixture of respondents following different underlying response processes (between individuals), as well as the uncertainty present at the individual level (within an individual). Simulation analyses reveal the potential of the mixture approach in identifying subgroups of respondents exhibiting response behavior reflective of different underlying response processes. Application to real data from the Students Like Learning Mathematics (SLM) scale of Trends in International Mathematics and Science Study (TIMSS) 2015 demonstrates the superior comparative fit of the mixture representation, as well as the consequences of applying the mixture on the estimation of content and response style traits. We argue that methodology applied to investigate response styles should attend to the inherent uncertainty of response style influence due to the likely influence of both response styles and the content trait on the selection of extreme response categories.
Article
Full-text available
Many approaches in the Item Response Theory (IRT) literature have incorporated response styles to control for potential biases. However, the specific assumptions about response styles are often not made explicit. Having integrated different IRT modeling variants into a superordinate framework, we highlighted assumptions and restrictions of the models (Henninger & Meiser, 2020). In this article, we show that based on the superordinate framework, we can estimate the different models as multidimensional extensions of the Nominal Response Model in standard software environments. Furthermore, we illustrate the differences in estimated parameters, restrictions, and model fit of the IRT variants in a German Big Five standardization sample and show that psychometric models can be used to debias trait estimates. Based on this analysis, we suggest two novel modeling extensions that combine fixed and estimated scoring weights for response style dimensions, or explain discrimination parameters through item attributes. In summary, we highlight possibilities to estimate, apply, and extend psychometric modeling approaches for response styles in order to test hypotheses on response styles through model comparisons.
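A sketch of the kind of multidimensional nominal response parameterization referred to here, with fixed scoring weights for a response style dimension (all numbers are illustrative):

import numpy as np

def nrm_probs(theta, eta, s, w, c):
    # Nominal response model with a content trait theta scored by weights s
    # and a response style dimension eta scored by weights w, e.g.
    # w = (1, 0, 0, 0, 1) for an ERS dimension on a 5-point scale.
    z = np.asarray(s) * theta + np.asarray(w) * eta + np.asarray(c)
    expz = np.exp(z - z.max())
    return expz / expz.sum()

print(nrm_probs(theta=0.5, eta=1.0,
                s=[0, 1, 2, 3, 4],            # linear content scoring
                w=[1, 0, 0, 0, 1],            # fixed ERS scoring weights
                c=[0.0, 0.2, 0.4, 0.2, 0.0]).round(3))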
Article
Full-text available
Response styles are one of the major sources of common method bias. They refer to the tendency to select specific categories of a rating scale. The most common types of response styles include acquiescence response style, disacquiescence response style, extreme response style and midpoint response style. They can bias subjects’ responses to rating scales and thus may lead to systematic measurement errors in test scores and influence the results of analyses of test reliability and validity and of the covariation between constructs. It is hard to measure response styles directly with a psychological scale. Thus, researchers usually use the counting procedure or the modeling procedure to measure them. The counting procedure mainly includes the traditional counting procedure, the method of representative indicators for response styles, and the method of counting double agreements on reversed items. The modeling procedure mainly includes the method of specifying acquiescence response style in confirmatory factor analysis, confirmatory latent class analysis, the mixed partial credit model, the item response tree model and the multidimensional nominal response model.

When selecting appropriate methods to measure response styles, researchers need to consider the following questions. Firstly, what types of response styles do they want to measure? Secondly, are response styles regarded as continuous or categorical variables? Thirdly, is the content-related trait regarded as a continuous or categorical variable? Fourthly, can researchers obtain a group of heterogeneous items? Fifthly, are there some pairs of items that examine similar content but are worded in opposite directions? Lastly, is the number of positively worded items equal to the number of negatively worded items?

As for controlling the effects of response styles on data analysis, different procedures are appropriate for controlling different effects of response styles. In some cases, researchers need to combine the measuring procedures with the regression residual method or the partial correlation method to control the effects of response styles. Firstly, to control the bias that response styles cause in test scores, researchers can apply the modeling procedure or the combination of the counting procedure and the regression residual method. Secondly, to control the effects of response styles on the analyses of test reliability and validity, researchers can apply the combination of the modeling procedure and the regression residual method or the combination of the counting procedure and the regression residual method. With respect to the control of the effects of response styles on the measurement invariance test, researchers can apply the procedure of specifying acquiescence response style in confirmatory factor analysis, confirmatory latent class analysis or the multidimensional nominal response model. Thirdly, to control the effect of response styles on the analysis of the covariation between constructs, researchers can apply the modeling procedure, the combination of the counting procedure and the regression residual method, or the combination of the counting procedure and the partial correlation method. Future research needs to extend the existing methods for measuring and controlling response styles, examine their validity, and systematically investigate the effects of response styles on data analysis.
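For the counting procedure mentioned above, a minimal Python sketch for a 5-point scale; the category codings are the conventional ones and may need adapting to a given instrument:

import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
ratings = pd.DataFrame(rng.integers(1, 6, size=(100, 20)))  # categories 1..5

styles = pd.DataFrame({
    "ERS": ratings.isin([1, 5]).mean(axis=1),   # extreme responding
    "MRS": ratings.eq(3).mean(axis=1),          # midpoint responding
    "ARS": ratings.isin([4, 5]).mean(axis=1),   # acquiescence
    "DRS": ratings.isin([1, 2]).mean(axis=1),   # disacquiescence
})
print(styles.head())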
Article
In recent years, item response tree (IRTree) approaches have received increasing attention in the response style literature for their ability to partial out response style latent variables as well as associated item parameters. When an IRTree approach is adopted to measure extreme response styles, directional and content invariance could be assumed at the latent variable and item parameter levels. In this study, we propose to evaluate the empirical validity of these invariance assumptions by employing a general IRTree model with relaxed invariance assumptions. This would allow us to examine extreme response biases, beyond extreme response styles. With three empirical applications of the proposed evaluation, we find that relaxing some of the invariance assumptions improves the model fit, which suggests that not all assumed invariances are empirically supported. Specifically, at the latent variable level, we find reasonable evidence for directional invariance but mixed evidence for content invariance, although we also find that estimated correlations between content‐specific extreme response latent variables are high, hinting at the potential presence of a general extreme response tendency. At the item parameter level, we find no directional or content invariance for thresholds and no content invariance for slopes. We discuss how the variant item parameter estimates obtained from a general IRTree model can offer useful insight to help us understand response bias related to extreme responding measured within the IRTree framework.
Article
Multidimensional item response theory (MIRT) models for response style (e.g., Bolt, Lu, & Kim, 2014, Psychological Methods, 19, 528; Falk & Cai, 2016, Psychological Methods, 21, 328) provide flexibility in accommodating various response styles, but often present difficulty in isolating the effects of response style(s) from the intended substantive trait(s). In the presence of such measurement limitations, we consider several ways in which MIRT models are nevertheless useful in lending insight into how response styles may interfere with measurement for a given test instrument. Such a study can also inform whether alternative design considerations (e.g., anchoring vignettes, self‐report items of heterogeneous content) that seek to control for response style effects may be helpful. We illustrate several aspects of an MIRT approach using real and simulated analyses.
Article
Item response tree (IRTree) models have recently been introduced as an approach to modeling response data from Likert-type rating scales. IRTree models are particularly useful to capture a variety of individuals’ behaviors involved in item responding. This study employed IRTree models to investigate response styles, which are individuals’ tendencies to prefer or avoid certain response categories in a rating scale. Specifically, we introduced two types of IRTree models, descriptive and explanatory models, conceived within a larger modeling framework, called explanatory item response models, proposed by De Boeck and Wilson. This extends the typical application of IRTree models for studying response styles. As a demonstration, we applied the descriptive and explanatory IRTree models to examine acquiescence and extreme response styles in Rosenberg’s Self-Esteem Scale. Our findings suggested the presence of two distinct extreme response styles and an acquiescence response style in the scale.
Article
Full-text available
Response styles are a source of contamination in questionnaire ratings, and therefore they threaten the validity of conclusions drawn from marketing research data. In this article, the authors examine five forms of stylistic responding (acquiescence and disacquiescence response styles, extreme response style/response range, midpoint responding, and noncontingent responding) and discuss their biasing effects on scale scores and correlations between scales. Using data from large, representative samples of consumers from 11 countries of the European Union, the authors find systematic effects of response styles on scale scores as a function of two scale characteristics (the proportion of reverse-scored items and the extent of deviation of the scale mean from the midpoint of the response scale) and show that correlations between scales can be biased upward or downward depending on the correlation between the response style components. In combination with the apparent lack of concern with response styles evidenced in a secondary analysis of commonly used marketing scales, these findings suggest that marketing researchers should pay greater attention to the phenomenon of stylistic responding when constructing and using measurement instruments.
Article
Full-text available
Prior research has shown that extreme response style can seriously bias responses to survey questions and that this response style may differ across culturally diverse groups. Consequently, cross-cultural differences in extreme responding may yield incomparable responses when not controlled for. To examine how extreme responding affects the cross-cultural comparability of survey responses, we propose and apply a multiple-group latent class approach where groups are compared on basis of the factor loadings, intercepts, and factor means in a Latent Class Factor Model. In this approach a latent factor measuring the response style is explicitly included as an explanation for group differences found in the data. Findings from two empirical applications that examine the cross-cultural comparability of measurements show that group differences in responding import inequivalence in measurements among groups. Controlling for the response style yields more equivalent measurements. This finding emphasizes the importance of correcting for response style in cross-cultural research. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Though ubiquitous, Likert scaling’s traditional mode of analysis is often unable to uncover all of the valid information in a data set. Here, the authors discuss a solution to this problem based on methodology developed by quantum physicists: the state multipole method. The authors demonstrate the relative ease and value of this method by examining college students’ endorsement of one possible cause of prejudice: segregation. Though the mean level of students’ endorsement did not differ among ethnic groups, an examination of state multipoles showed that African Americans had a level of polarization in their endorsement that was not reflected by Hispanics or European Americans. This result could not have been obtained with the traditional approach and demonstrates the new method’s utility for social science research.
Article
Full-text available
For structural equation models, a huge variety of fit indices has been developed. These indices, however, can point to conflicting conclusions about the extent to which a model actually matches the observed data. The present article provides some guidelines that should help applied researchers to evaluate the adequacy of a given structural equation model. First, as goodness-of-fit measures depend on the method used for parameter estimation, maximum likelihood (ML) and weighted least squares (WLS) methods are introduced in the context of structural equation modeling. Then, the most common goodness-of-fit indices are discussed and some recommendations for practitioners given. Finally, we generated an artificial data set according to a "true" model and analyzed two misspecified and two correctly specified models as examples of poor model fit, adequate fit, and good fit. In structural equation modeling (SEM), a model is said to fit the observed data to the extent that the model-implied covariance matrix is equivalent to the empirical covariance matrix. Once a model has been specified and the empirical covariance matrix is given, a method has to be selected for parameter estimation. Different estimation methods have different distributional assumptions and have different discrepancy functions to be minimized.
Article
Full-text available
The technical complexities and sheer size of international large-scale assessment (LSA) databases often cause hesitation on the part of the applied researcher interested in analyzing them. Further, inappropriate choice or application of statistical methods is a common problem in applied research using these databases. This article serves as a primer for researchers on the issues and methods necessary for obtaining unbiased results from LSA data. The authors outline the issues surrounding the analysis and reporting of LSA data, with a particular focus on three prominent international surveys. In addition, they make recommendations targeted at applied researchers regarding best analysis and reporting practices when using these databases.
Article
Full-text available
Extreme response style (ERS) is an important threat to the validity of survey-based marketing research. In this article, the authors present a new item response theory–based model for measuring ERS. This model contributes to the ERS literature in two ways. First, the method improves on existing procedures by allowing different items to be differentially useful for measuring ERS and by accommodating the possibility that an item’s usefulness differs across groups (e.g., countries). Second, the model integrates an advanced item response theory measurement model with a structural hierarchical model for studying antecedents of ERS. The authors simultaneously estimate a person’s ERS score and individual- and group-level (country) drivers of ERS. Through simulations, they show that the new method improves on traditional procedures. They further apply the model to a large data set consisting of 12,506 consumers from 26 countries on four continents. The findings show that the model extensions are necessary to model the data adequately. Finally, they report substantive results about the effects of sociodemographic and national-cultural variables on ERS.
Article
Full-text available
The severity of bias in respondents’ self-reports due to acquiescence response style (ARS) and extreme response style (ERS) depends strongly on how consistent these response styles are over the course of a questionnaire. In the literature, different alternative hypotheses on response style (in)consistency circulate. Therefore, nine alternative models are derived and fitted to secondary and primary data. It is found that response styles are best modeled as a tau-equivalent factor complemented with a time-invariant autoregressive effect. This means that ARS and ERS are largely but not completely consistent over the course of a questionnaire, a finding that has important implications for response style measurement and correction.
Article
Full-text available
A critique of Cheung and Rensvold’s article describing the use of mean and covariance structures (MACS) techniques for assessing bias in cross-cultural research is offered. In particular, the author highlights the critical distinction between item bias and construct bias. Although the author’s view is in fundamental agreement with Cheung and Rensvold’s primary argument that MACS analyses are a very useful tool to determine comparability of constructs across cultural settings, the author disagrees with their conclusion that measurement invariance indicates a lack of construct bias. The author argues that cross-group equivalence of all reliable measurement parameters only indicates a lack of differential item bias and does not indicate lack of construct bias. Finally, two related and important issues in MACS analyses are discussed, namely, (a) the choice of rationale for significance testing and (b) the effects of different identification methods on estimating latent mean levels.
Article
Full-text available
Questions that use a discrete ratings scale are commonplace in survey research. Examples in marketing include customer satisfaction measurement and purchase intention. Survey research practitioners have long commented that respondents vary in their usage of the scale: Common patterns include using only the middle of the scale or using the upper or lower end. These differences in scale usage can impart biases to correlation and regression analyses. To capture scale usage differences, we developed a new model with individual scale and location effects and a discrete outcome variable. We model the joint distribution of all ratings scale responses rather than specific univariate conditional distributions as in the ordinal probit model. We apply our model to a customer satisfaction survey and show that the correlation inferences are much different once proper adjustments are made for the discreteness of the data and scale usage. We also show that our adjusted or latent ratings scale is more closely related to actual purchase behavior.
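A rough Python illustration of the location/scale idea; within-person standardization is only a coarse analogue, since the cited model estimates these effects jointly with the latent ratings:

import numpy as np

rng = np.random.default_rng(7)
latent = rng.normal(size=(50, 8))                 # latent ratings
location = rng.normal(0.0, 0.5, size=(50, 1))     # person-specific shift
scale = rng.lognormal(0.0, 0.3, size=(50, 1))     # person-specific stretch
observed = location + scale * latent

# Remove each respondent's own location and scale before correlating items
adjusted = ((observed - observed.mean(axis=1, keepdims=True))
            / observed.std(axis=1, keepdims=True))
print(np.corrcoef(adjusted[:, 0], latent[:, 0])[0, 1])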
Article
Full-text available
Both structural equation modeling (SEM) and item response theory (IRT) can be used for factor analysis of dichotomous item responses. In this case, the measurement models of both approaches are formally equivalent. They were refined within and across different disciplines, and make complementary contributions to central measurement problems encountered in almost all empirical social science research fields. In this article (a) fundamental formal similarities between IRT and SEM models are pointed out. It will be demonstrated how both types of models can be used in combination to analyze (b) the dimensional structure and (c) the measurement invariance of survey item responses. All analyses are conducted with Mplus, which allows an integrated application of both approaches in a unified, general latent variable modeling framework. The aim is to promote a diffusion of useful measurement techniques and skills from different disciplines into empirical social research.
Article
Full-text available
This article addresses the question of to what extent one type of response style, called acquiescence (or agreeing response bias), is stable over time. A structural equation modeling approach is applied to measure the stability of one acquiescence factor behind two concepts among the same respondents for a 4-year period. The data used are representative population surveys in 1995 and 1999 from the Belgian Election Study in which balanced sets of items are used for measuring two interrelated constructs: perceived ethnic threat and distrust in politics. This study provides empirical support that acquiescence is stable and consistent for a 4-year period.
Article
Full-text available
The article reviews standardization methods commonly employed to adjust for response bias in cross-cultural research. First, different standardization procedures are reviewed and a classification scheme is provided. Standardization procedures are classified according to the statistical information used (means, standard deviation) and the source of this information (individual, group, or culture). Second, empirical research in JCCP between 1970 and 2002 is reviewed. Standardization has become more common in the 1990s, and there is a trend to rely more on standardized data. Most studies used standardization prior to analysis of variance and factor analytical techniques. However, an analysis of statistical properties of standardized measures indicates that results based on standardization are ambiguous. The use of statistical techniques and the interpretation of results based on standardized data are discussed.
Article
Full-text available
It is well known that the self-report survey method suffers from many idiosyncratic biases, such as varying response styles due to different survey modes used. Using latent state-trait theory it is argued that response styles will also vary intra-individually, depending on the particular survey situation. In this study we examine intra-individual variation in extreme response style behavior (ERS) using mixed-mode survey panel data as a quasi-experimental setting. Data from the Irish National Election Study panel are used, which consists of repeated face-to-face and mail-back surveys. Latent transition analysis is used to detect switches in ERS, distinguishing 'stable' and 'volatile' respondents in terms of their response style. Overall, ERS is inflated in the intermediate mail component of the panel, whereas preliminary analyses suggest that low education and ideological extremity are drivers of that change. Results are discussed with regards to measurement errors in mixed-mode and longitudinal surveys.
Article
Full-text available
This article concerns a number of conditions that must be met before reaching a conclusion about the existence of the agreement component of a response style called acquiescence. The focus is on the specification of a common style factor behind at least 2 independent theoretical concepts, each measured by a balanced set of items. The proposed approach is explored with a random subsample (N = 986) from a large-scale survey, and subsequently confirmed in another subsample (N = 992) from the same population (Flanders) and in a random sample (N = 1,055) from a different population (Wallonia). The strong relation in both populations of the latent style factor with a variable "sum of agreements" supports the idea that it is possible to control for acquiescence in structural equations.
Article
Full-text available
Method effects often occur when constructs are measured by different methods. In traditional multitrait-multimethod (MTMM) models method effects are regarded as residuals, which implies a mean method effect of zero and no correlation between trait and method effects. Furthermore, in some recent MTMM models, traits are modeled to be specific to a certain method. However, often we are not interested in a method-specific trait but in a trait that is common to all methods. Here we present the Method Effect model with common trait factors, which allows modeling “common” trait factors and method factors that represent method “effects” rather than residuals. The common trait factors are defined as the mean of the true-score variables of all variables measuring the same trait and the method factors are defined as differences between true-score variables and means of true-score variables. Because the model allows estimating mean method effects, correlations between method factors, and correlations between trait and method factors, new research questions may be investigated. The application of the model is demonstrated by 2 examples studying the effect of negative, as compared with positive, item wording for the measurement of mood states.
Article
Full-text available
Response bias, defined by Paulhus (1991) as "a systematic tendency to respond to a range of questionnaire items on some basis other than the specific item content," has been observed in various disciplines, especially in cross-cultural research. In this study, a mathematical model of uniform response bias (URB) is defined. Ipsative measures (i.e., individual scores subject to a constant sum constraint) are proposed to minimize the effect of URB in multigroup confirmatory factor analysis (CFA) to study the measurement invariance properties across different cultural groups. The method based on Chan and Bentler (1993, 1996) for analyzing ipsative data is extended here for analyzing multigroup data potentially contaminated by URB. A real data set based on the Chinese Personality Assessment Inventory (CPAI) is used to demonstrate how the proposed procedure can be applied in real-life situations.
Article
Full-text available
Acquiescence response set (ARS), the tendency to agree with questionnaire statements regardless of content, is especially problematic in scale development when attitude structure is not well known, because it heightens the correlations among items that are worded similarly, even when they are not conceptually related. A partial correlation technique is described for measuring and controlling for ARS using the method of matched pairs. 1,351 persons earned an ARS score from the frequency with which they agreed with pairs of items logically opposite. Principal-components analysis was then performed on the 1st-order interitem partial correlation matrix, controlling for ARS score. Evidence is presented that this procedure reduces the average interitem correlation among like-worded items, increases the average interitem correlation among differently worded items measuring the same concept, and produces a principal components solution that is more interpretable. These conclusions emerge from comparisons with analyses of untransformed attitude scores and attitude scores excluding Ss who demonstrated the greatest acquiescence. (12 ref) (PsycINFO Database Record (c) 2012 APA, all rights reserved)
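A minimal Python sketch of the matched-pairs logic; note that the original analysis partials the ARS score out of the full interitem correlation matrix before extracting principal components, whereas this sketch shows a single partial correlation:

import numpy as np

def partial_corr(x, y, z):
    # First-order partial correlation of x and y controlling for z
    rxy = np.corrcoef(x, y)[0, 1]
    rxz = np.corrcoef(x, z)[0, 1]
    ryz = np.corrcoef(y, z)[0, 1]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz**2) * (1 - ryz**2))

rng = np.random.default_rng(3)
agree = rng.integers(0, 2, size=(200, 10))   # agreement with items
rev = rng.integers(0, 2, size=(200, 10))     # agreement with their opposites
ars = (agree & rev).sum(axis=1)              # double agreements on pairs
print(partial_corr(agree[:, 0], agree[:, 1], ars))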
Article
Full-text available
This study investigated the responses of N = 1,789 participants to a set of 12 Likert-type items for the assessment of personal need for structure (PNS). Mixture-distribution Rasch models were used to analyze the homogeneity of the response format across items and the homogeneity of the item parameters and category parameters across persons. Model selection yielded a two-class rating scale model as the favorite model. This model contains the assumptions that the Likert response scale is used in a constant way for all items but that the item or category parameters differ between two latent subpopulations. The parameter estimates revealed large differences in the threshold parameters for the response categories between the two subpopulations. While the larger subpopulation showed a tendency to avoid extreme response categories, the smaller subpopulation used the whole range of the response scale. The different response styles identified by the mixture-distribution Rasch analysis were validated by significantly higher Extraversion scores for participants in the smaller subpopulation that showed more extreme and impulsive rating behavior. The results confirmed that PNS reflects quantitative interindividual differences, and they also showed that the total score of the 12 PNS items forms a combination of the latent PNS trait and response style. (PsycINFO Database Record (c) 2012 APA, all rights reserved)
Article
Full-text available
Here is the reference for this chapter. MacKinnon, D. P., Cheong, J., Pirlott, A. G. (2012) In Cooper, H., Camic, P. M., Long, D. L., Panter, A. T., Rindskopf, D., Sher, K. J. (Eds.) (2012). APA handbook of research methods in psychology, Vol 2: Research designs: Quantitative, qualitative, neuropsychological, and biological., (pp. 313-331). Washington, DC, US: American Psychological Association.
Article
Full-text available
Cross-mode surveys are on the rise. The current study compares levels of response styles across three modes of data collection: paper-and-pencil questionnaires, telephone interviews, and online questionnaires. The authors make the comparison in terms of acquiescence, disacquiescence, and extreme and midpoint response styles. To do this, they propose a new method, namely, the representative indicators response style means and covariance structure (RIRSMACS) method. This method contributes to the literature in important ways. First, it offers a simultaneous operationalization of multiple response styles. The model accounts for dependencies among response style indicators due to their reliance on common item sets. Second, it accounts for random error in the response style measures. As a consequence, random error in response style measures is not passed on to corrected measures. The method can detect and correct cross-mode response style differences in cases where measurement invariance testing and multitrait multimethod designs are inadequate. The authors demonstrate and discuss the practical and theoretical advantages of the RIRSMACS approach over traditional methods.
Article
Full-text available
In this article, I show how item response models can be used to capture multiple response processes in psychological applications. Intuitive and analytical responses, agree-disagree answers, response refusals, socially desirable responding, differential item functioning, and choices among multiple options are considered. In each of these cases, I show that the response processes can be measured via pseudoitems derived from the observed responses. The estimation of these models via standard software programs that allow for missing data is also discussed. The article concludes with two detailed applications that illustrate the prevalence of multiple response processes. (PsycINFO Database Record (c) 2012 APA, all rights reserved).
Article
Full-text available
A new measure that focused explicitly on the cognitive dimension of test anxiety was introduced and examined for psychometric quality as compared to existing measures of test anxiety. The new scale was found to be a reliable and valid measure of cognitive test anxiety. The impact of cognitive test anxiety as well as emotionality and test procrastination were subsequently evaluated on three course exams and students' self-reported performance on the Scholastic Aptitude Test for 168 undergraduate students. Higher levels of cognitive test anxiety were associated with significantly lower test scores on each of the three course examinations. High levels of cognitive test anxiety also were associated with significantly lower Scholastic Aptitude Test scores. Procrastination, in contrast, was related to performance only on the course final examination. Gender differences in cognitive test anxiety were documented, but those differences were not related to performance on the course exams. Examination of the relation between the emotionality component of test anxiety and performance revealed that moderate levels of physiological arousal generally were associated with higher exam performance. The results were consistent with cognitive appraisal and information processing models of test anxiety and support the conclusion that cognitive test anxiety exerts a significant stable and negative impact on academic performance measures.
Article
Full-text available
Several lines of evidence suggest that acquiescent and extreme response styles are related to low education or low cognitive ability. Using measures constructed from the World Values Survey, this hypothesis is examined both in comparisons among individuals and comparisons among countries. At the individual level, both acquiescent and extreme responding are positively related to age and negatively to education and income in most world regions. Both response styles are most prevalent in the less developed countries. At the country level, extremity is best predicted by a low average IQ in the country, and acquiescence by a high level of corruption.
Article
Full-text available
Self-concept is linked to student achievement in many domains. In this study, we examined how reading self-concept (RSC) and the calibration accuracy of RSC relate to reading achievement across different contexts via multi-level analyses of reading tests and questionnaire responses from 158,848 fifteen-year-olds in 34 countries. Students with higher RSC, higher calibration accuracy (of RSC to their reading scores), or underconfidence (relative to their reading scores) had higher reading scores. RSC was more strongly linked to reading scores in countries that were richer, less equal, more collective, less uncertainty averse, less hierarchical, or less rigid regarding gender roles. Calibration accuracy was also more strongly linked to reading achievement in more hierarchical, individualistic, or uncertainty-tolerant countries. In more individualistic countries, underconfident students were more likely to have above-average reading achievement. Hence, excessive confidence does not necessarily benefit students, especially in more individualistic countries.
Article
The author examines whether the response styles of yeasaying and standard deviation in rating scale responses convey information on respondents’ attitudes or create bias that distorts attitude information and marketing research. A method is proposed to identify attitude information components and bias components in response styles, using prediction errors in attitude-behavior models. Analysis of data from a large-scale consumer survey supports the presence of both attitude information and bias components in standard deviation, and an attitude information but not a bias component in yeasaying. This finding suggests that correcting rating scale data by removing the bias but not the attitude information in standard deviation can increase the accuracy of survey research. Examples are given of how bias in standard deviation, and the scoring correction, affect segmentation research.
Article
This monograph is part of a more comprehensive treatment of the estimation of latent traits when the entire response pattern is used. The fundamental structure of the whole theory comes from the latent trait model, which was initiated by Lazarsfeld as latent structure analysis [Lazarsfeld, 1959] and by Lord and others as a theory of mental test scores [Lord, 1952]. Similarities and differences in their mathematical structures and tendencies were discussed by Lazarsfeld [Lazarsfeld, 1960], and the recent book by Lord and Novick, with contributions by Birnbaum [Lord & Novick, 1968], provides the dichotomous case of the latent trait model in the context of mental measurement.
Chapter
The partial credit model (PCM) by Masters (1982, this volume) is a unidimensional item response model for analyzing responses scored in two or more ordered categories. The model has some very desirable properties: it is an exponential family, so minimal sufficient statistics for both the item and person parameters exist, and it allows conditional-maximum likelihood (CML) estimation. However, it will be shown that the relation between the response categories and the item parameters is rather complicated. As a consequence, the PCM may not always be the most appropriate model for analyzing data.
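For reference, a standard formulation of the PCM (notation ours; the chapter may parameterize it differently): the probability that a person with trait level \(\theta\) scores \(x\) on item \(i\) with \(m_i + 1\) ordered categories is

\[
P(X_i = x \mid \theta) \;=\; \frac{\exp \sum_{j=1}^{x} (\theta - \delta_{ij})}{\sum_{k=0}^{m_i} \exp \sum_{j=1}^{k} (\theta - \delta_{ij})}, \qquad x = 0, 1, \ldots, m_i,
\]

with the convention that the empty sum for \(x = 0\) equals zero. The person sum score is then the minimal sufficient statistic for \(\theta\), which is what licenses CML estimation.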
Article
A category of item response models is presented with two defining features: they all (i) have a tree representation and (ii) are members of the family of generalized linear mixed models (GLMM). Because the models are based on trees, they are denoted IRTree models. The GLMM nature of the models implies that they can all be estimated with the glmer function of the lme4 package in R. The aim of the article is to present four subcategories of models, the first two of which are based on a tree representation for response categories: 1. linear response tree models (e.g., missing response models), and 2. nested response tree models (e.g., models for parallel observations regarding item responses such as agreement and certainty); the last two are based on a tree representation for latent variables: 3. linear latent-variable tree models (e.g., models for change processes), and 4. nested latent-variable tree models (e.g., bi-factor models). The use of the glmer function is illustrated for all four subcategories. Simulated example data sets and two service functions useful in preparing the data for IRTree modeling with glmer are provided in the form of an R package, irtrees. A real-data application is also discussed for each of the four subcategories.
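As an illustration of the tree representation (a hypothetical sketch in Python rather than the article's R code; the node labels are ours), the mapping from a 5-point response to node-level pseudoitems under a midpoint-direction-extremity tree can be tabulated as follows. This is the kind of recoding that the irtrees package automates before glmer fits the pseudoitems as a binomial GLMM.

```python
# Hypothetical mapping for a 5-point scale under a three-node IRTree
# (midpoint -> direction -> extremity). Rows are response categories
# 1..5; columns are node pseudoitems; NaN = node not reached.
import pandas as pd

mapping = pd.DataFrame(
    {"midpoint":  [0, 0, 1, 0, 0],
     "direction": [0, 0, None, 1, 1],   # 1 = agree side
     "extremity": [1, 0, None, 0, 1]},  # 1 = endpoint chosen
    index=[1, 2, 3, 4, 5])
print(mapping)

# In R, the long-format pseudoitems would then be fit with a glmer call
# of the general form (a sketch of the approach, not verbatim from the
# article):
#   lme4::glmer(value ~ 0 + item:node + (0 + node | person),
#               family = binomial, data = long_data)
```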
Article
Cross-cultural research is now an undeniable part of mainstream psychology and has had a major impact on conceptual models of human behavior. Although it is true that the basic principles of social psychological methodology and data analysis are applicable to cross-cultural research, there are a number of issues that are distinct to it, including managing incongruities of language and quantifying cultural response sets in the use of scales. Cross-Cultural Research Methods in Psychology provides state-of-the-art knowledge about the methodological problems that need to be addressed if a researcher is to conduct valid and reliable cross-cultural research. It also offers practical advice and examples of solutions to those problems and is a must-read for any student of culture.
Article
Response styles can influence item responses in addition to a respondent’s latent trait level. A common concern is that comparisons between individuals based on sum scores may be rendered invalid by response style effects. This paper investigates a multidimensional approach to modeling traits and response styles simultaneously. Models incorporating different response styles as well as personality traits (Big Five facets) were compared regarding model fit. Relationships between traits and response styles were investigated, and different approaches to modeling extreme response style (ERS) were compared regarding their effects on trait estimates. All multidimensional models showed a better fit than the unidimensional models, indicating that response styles influenced item responses, with ERS explaining the largest share of incremental variance. ERS and midpoint response style were mainly trait-independent, whereas acquiescence and disacquiescence were strongly related to several personality traits. Expected a posteriori estimates of participants’ trait levels did not differ substantially between two-dimensional and unidimensional models when a set of heterogeneous items was used to model ERS. A minor adjustment of trait estimates occurred when the same items were used to model ERS and the trait, though the ERS dimension in this approach only reflected scale-specific ERS rather than a general ERS tendency.
Article
This study shows how to address the problem of trait-unrelated response styles (RS) in rating scales using multidimensional item response theory. The aim is to test and correct data for RS in order to provide fair assessments of personality. Expanding on an approach presented by Böckenholt (2012), observed rating data are decomposed into multiple response processes based on a multinomial processing tree. The data come from a questionnaire consisting of 50 items of the International Personality Item Pool measuring the Big Five dimensions, administered to 2,026 U.S. students with a 5-point rating scale. It is shown that this approach can be used to test whether RS exist in the data and that RS can be differentiated from trait-related responses. Although the extreme RS appears to be unidimensional after the exclusion of only 1 item, a unidimensional measure of the midpoint RS is obtained only after the exclusion of 10 items. Both RS measurements show high cross-scale correlations and item response theory-based (marginal) reliabilities. Cultural differences were found in the giving of extreme responses. Moreover, it is shown how to score rating data to correct for RS once they have been shown to exist in the data.
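Under such a processing tree, the five category probabilities factor into process probabilities. A generic three-process version for a 5-point scale (notation ours) with midpoint (m), direction (d), and extremity (e) processes is

\[
\begin{aligned}
P(X = 3) &= \pi_m,\\
P(X = 4) &= (1 - \pi_m)\,\pi_d\,(1 - \pi_e), &\quad P(X = 5) &= (1 - \pi_m)\,\pi_d\,\pi_e,\\
P(X = 2) &= (1 - \pi_m)\,(1 - \pi_d)\,(1 - \pi_e), &\quad P(X = 1) &= (1 - \pi_m)\,(1 - \pi_d)\,\pi_e,
\end{aligned}
\]

where each process probability is typically given its own IRT structure, e.g., \(\pi = \mathrm{logit}^{-1}(\theta_{\text{process}} - \beta_{\text{item}})\).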
Article
Extreme response style (ERS) is a systematic tendency for a person to endorse extreme options (e.g., strongly disagree, strongly agree) on Likert-type or rating-scale items. In this study, we develop a new class of item response theory (IRT) models to account for ERS so that the target latent trait is free from the response style and the tendency of ERS is quantified. Parameters of these new models can be estimated with marginal maximum likelihood estimation methods or Bayesian methods. In this study, we use the freeware program WinBUGS, which implements Bayesian methods. In a series of simulations, we find that the parameters are recovered fairly well; ignoring ERS by fitting standard IRT models resulted in biased estimates, and fitting the new models to data without ERS did little harm. Two empirical examples are provided to illustrate the implications and applications of the new models.
Article
The consistency of extreme response style (ERS) and non-extreme response style (NERS) across the latent variables assessed in an instrument is investigated. Analyses were conducted on several PISA 2006 attitude scales and the German NEO-PI-R. First, a mixed partial credit model (PCM) and a constrained mixed PCM were compared regarding model fit. If the constrained mixed PCM fit better, latent classes differed only in their response styles but not in the latent variable. For scales where this was the case, participants’ membership in NERS or ERS on each scale was entered into a latent class analysis (LCA). For both instruments, this second-order LCA revealed that the response style was consistent across latent variables for the majority of the participants.
Article
This article extends a methodological approach considered by Bolt and Johnson for the measurement and control of extreme response style (ERS) to the analysis of rating data from multiple scales. Specifically, it is shown how the simultaneous analysis of item responses across scales allows for more accurate identification of ERS, and more effective control of ERS effects on the substantive trait estimates, than when analyzing just one scale. Moreover, unlike a competing approach presented by Greenleaf, the current strategy can accommodate conditions in which the substantive traits across scales correlate, as is almost always the case in social sciences research. Simulation and real data analyses are used for illustration.
Article
Multidimensional item response models are usually implemented to model the relationship between item responses and two or more traits of interest. We show how multidimensional multinomial logit item response models can also be used to account for individual differences in response style. This is done by specifying a factor-analytic model for latent responses at the category level. When properly identified by the factor structure, this permits traits and response styles to be represented as distinct but possibly correlated factors. Special cases of this model can be viewed as generalizations of some unidimensional multinomial logit item response models. In this article, we describe and demonstrate the specification and implementation of these models to account for individual differences in response style that would otherwise compromise the validity of the measurement model.
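A hedged sketch of what such a category-level factor structure can look like (our notation, not necessarily the authors' exact parameterization): the probability that person j chooses category k of item i is

\[
P(X_{ij} = k) \;=\; \frac{\exp\!\big(a_{ik}\,\theta_j + b_{ik}\,\gamma_j + c_{ik}\big)}{\sum_{h=1}^{K} \exp\!\big(a_{ih}\,\theta_j + b_{ih}\,\gamma_j + c_{ih}\big)},
\]

where \(\theta_j\) is the substantive trait, \(\gamma_j\) is a response-style factor that may correlate with \(\theta_j\), and the category slopes \(a_{ik}, b_{ik}\) carry the factor structure; constraining the \(b_{ik}\) to load positively on the extreme categories, for example, turns \(\gamma_j\) into an ERS factor.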
Article
Research on extreme response style (ERS) in rating scale responses has been characterized by conflicting findings and little agreement over how to assemble and validate ERS measures. This article proposes that, when ERS is defined as a proportion of extreme responses, an ERS measure will be more accurate if the items are uncorrelated and have equal extreme response proportions. Furthermore, appropriate stochastic models should be used to assess the internal reliability and convergent validity of these measures. An ERS measure is created and validated with this method, using items from a survey administered in 1975 and 1987 to large samples of U.S. adults serving on a consumer panel. We find that ERS is stable over a lengthy survey compared to a benchmark stability for a “perfect” measure. Furthermore, the distribution of ERS over this population is stable over time. Respondents' ERS is related to their age, education level, and household income but not to their gender.
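Under this definition, the ERS score of respondent j over I items is simply (our notation)

\[
\mathrm{ERS}_j \;=\; \frac{1}{I} \sum_{i=1}^{I} \mathbb{1}\{x_{ij} \in \mathcal{E}\},
\]

where \(\mathcal{E}\) is the set of endpoint categories; the article's point is that this proportion is only a clean ERS measure when the I items are mutually uncorrelated and have equal extreme response proportions.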
Article
In the two-predictor situation, it is shown that traditional and negative suppressors increase the predictive value of a standard predictor beyond that suggested by the predictor's zero-order validity. This effect is used to provide a revised definition of suppression that completely accounts for traditional and negative suppression. The revised definition, in conjunction with a two-factor model, is shown to lead to a previously undetected type of suppression (reciprocal suppression), which occurs when predictors with positive zero-order validities are negatively correlated with one another. In terms of the definition and parameters of the model, limits are determined within which the types of suppression can occur. Furthermore, it is shown how suppressors can be identified in multiple regression equations, and a procedure is given for interpreting whether the variables contribute directly (by predicting relevant variance in the criterion), indirectly (by removing irrelevant variance in another predictor), or both.
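To make the effect concrete, consider the standardized two-predictor case (a worked example with illustrative numbers, not taken from the article):

\[
\beta_1 = \frac{r_{y1} - r_{y2}\,r_{12}}{1 - r_{12}^2}, \qquad
\beta_2 = \frac{r_{y2} - r_{y1}\,r_{12}}{1 - r_{12}^2}.
\]

With \(r_{y1} = .50\), \(r_{y2} = .00\), and \(r_{12} = .40\), one obtains \(\beta_1 \approx .595 > r_{y1}\) and \(\beta_2 \approx -.238\), and \(R^2 = \beta_1 r_{y1} + \beta_2 r_{y2} \approx .298 > r_{y1}^2 = .25\): the second predictor has no validity of its own, yet including it raises both the first predictor's weight and the explained variance because it removes irrelevant variance from the first predictor, which is classical suppression.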
Article
Ipsatizing variables prior to component analysis, by subtracting each individual's mean score from all of that individual's scores, has met with serious criticisms, both strategic and technical. On the other hand, the procedure is still popular in the study of personality as a means of removing acquiescence variance. An attempt is made at reconciling these facts. Technical objections to component analysis of ipsatized variables, recently restated by Dunlap and Cornwall, are shown to be based on erroneous premises. A strategic objection leveled by Clemans does apply in a majority of cases, but has no bearing on the specific case of ipsatizing responses to personality questionnaires made up of opposite item pairs. As an alternative to ipsatizing by subtraction of the mean, partialling out the mean component is suggested as a more elegant procedure, allowing a breakdown of explained variance into variance due to the mean and variance due to subsequent components uncorrelated with the mean.
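In symbols, ipsatization as discussed here replaces each response by its deviation from the person mean (our notation):

\[
x^{*}_{ij} \;=\; x_{ij} - \bar{x}_{j}, \qquad \bar{x}_{j} = \frac{1}{I}\sum_{i=1}^{I} x_{ij},
\]

whereas the suggested alternative partials the mean component out of the analysis instead of subtracting it from the raw scores.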
Article
Outlines a stepwise approach to the construction of latent trait models. As a special case of the derived general sequential model, a sequential Rasch model is considered after discussing the family of latent trait models of G. Rasch (1960, 1961, 1977). The derivation of the models is based on mechanisms of latent random variables. The unconditional maximum likelihood estimation is embedded in the framework of generalized linear models. An alternative estimation procedure that allows for parameter separability is considered, and its applicability is shown.
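A common way to write the sequential (step) logic of such models (our notation; the article's parameterization may differ): the conditional probability of passing step k of item i, given that step k - 1 was passed, is logistic,

\[
P(X_{ij} \ge k \mid X_{ij} \ge k - 1) \;=\; \frac{\exp(\theta_j - \beta_{ik})}{1 + \exp(\theta_j - \beta_{ik})},
\]

so that \(P(X_{ij} = x)\) is the product of x step successes and one failure at step x + 1 (when x is below the maximum category).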
Article
Mixed Rasch modelling allows the existence of sub-groups who differ in their use of questionnaire response scales to be investigated. This is relevant to personality measurement, as individual differences in scale use are a source of error in calculating trait scores. The objectives of the analysis reported here were to investigate the use of the mixed Rasch model on personality (NEO-FFI) data and to examine the issue of response set contributions to scale-level NEO correlations. Modelling led to two-class solutions for Neuroticism, Agreeableness and Conscientiousness, which could be interpreted in terms of respondents showing a preference for the extremes or the middle of the scale. Correlations between all pairs of class assignments were positive and highly significant, demonstrating consistency in individual response preference across traits. Plots of item difficulty parameters suggested that both groups interpreted the items similarly. 'Extreme responders' had significantly higher scores on Extraversion and Conscientiousness, and females and younger people were more likely than males and older people to show a preference for extreme responding. For Extraversion and Openness, model fitting gave more complex results, but a two-class solution was considered to also be the most appropriate for Extraversion. The results of this study confirm previous findings on the existence of distinct response classes in self-report data and on personality correlates of individual differences in response scale use and also allow an estimate of response set effects on scale-level correlations.
Article
Because of the importance of mediation studies, researchers have been continuously searching for the best statistical test for mediation effects. The approaches that have been most commonly employed include those that use zero-order and partial correlation, hierarchical regression models, and structural equation modeling (SEM). This study extends the work of MacKinnon and colleagues (MacKinnon, Lockwood, Hoffmann, West, & Sheets, 2002; MacKinnon, Lockwood, & Williams, 2004; MacKinnon, Warsi, & Dwyer, 1995) by conducting a simulation that examines the distribution of mediation and suppression effects of latent variables with SEM, and the properties of confidence intervals developed from eight different methods. Results show that SEM provides unbiased estimates of mediation and suppression effects, and that the bias-corrected bootstrap confidence intervals perform best in testing for mediation and suppression effects. Steps to implement the recommended procedures with Amos are presented.
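As a minimal illustration of the recommended interval (a sketch using observed-variable OLS regressions rather than the article's latent-variable SEM; all names and data are ours), a bias-corrected bootstrap CI for an indirect effect a*b can be computed as follows.

```python
# Sketch: bias-corrected (BC) bootstrap CI for an indirect effect a*b,
# estimated from two OLS regressions on simulated observed variables.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)               # mediator
y = 0.4 * m + 0.1 * x + rng.normal(size=n)     # outcome

def indirect(x, m, y):
    a = np.polyfit(x, m, 1)[0]                 # slope of m on x
    b = np.linalg.lstsq(np.c_[m, x, np.ones(len(x))], y, rcond=None)[0][0]
    return a * b                               # indirect effect a*b

est = indirect(x, m, y)
boot = np.array([indirect(x[idx], m[idx], y[idx])
                 for idx in (rng.integers(0, n, n) for _ in range(2000))])

z0 = norm.ppf((boot < est).mean())             # bias-correction constant
lo_p, hi_p = norm.cdf(2 * z0 + norm.ppf([0.025, 0.975]))
print("estimate:", round(est, 3))
print("BC 95% CI:", np.round(np.quantile(boot, [lo_p, hi_p]), 3))
```

The bias correction shifts the percentile endpoints according to how far the bootstrap distribution's median sits from the point estimate, which is what distinguishes the BC interval from the plain percentile bootstrap.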
Article
This research note addresses the challenge of how to optimally measure acquiescence response style (ARS) and extreme response style (ERS). This is of crucial importance in assessing results from studies that have tried to identify antecedents of response styles (such as age, education level, national culture). Using survey data from the Netherlands, a comparison is made between the traditional method and a more recently proposed method of measuring ARS and ERS (i.e., the convergent validity across both methods is assessed). The traditional method is based on an ad hoc set of related items. The alternative method uses a set of randomly sampled items to optimize heterogeneity and representativeness of the items. It is found that the traditional method may lead to response style measures that are suboptimal for estimating levels of ARS and ERS as well as relations of ARS and ERS with other variables (like hypothesized antecedents). Recommendations on how to measure response styles are provided.
Keywords: between-method convergent validity; response styles; acquiescence/extreme response style; representative indicators for response styles (RIRS)