Table 1
Source publication
Multidimensional forced-choice (MFC) questionnaires typically show good validities and are resistant to impression management effects. However, they yield ipsative data, which distorts scale relationships and makes comparisons between people problematic. Depressed reliability estimates also led developers to create tests of potentially excessive length. ...
Citations
... Similarly, too many similar trials can result in boredom, which may cause participants to rush or disengage. A reduced number of triplets ensures that a task remains mentally stimulating while maintaining attention levels [37]. Finally, it has been argued that comparing items directly within a block may be cognitively simpler than rating them one by one, particularly when there are many rating categories with few or poor verbal anchors [20]. ...
... Due to the binary nature of the data, the DWLS algorithm was used, and the model was fitted to tetrachoric correlations. The hypothesized structural model was that of independent clusters [37] with no cross-loadings. The fit for the hypothesized five-factor model was as follows: χ² = 7022.76, ...
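As background on the estimation step above: a tetrachoric correlation treats a pair of binary outcomes as a coarsened bivariate normal and can be recovered by maximum likelihood. A minimal, self-contained sketch in Python (illustrative only; not the software used in the cited study, which fitted the full model by DWLS):

    import numpy as np
    from scipy.stats import norm, multivariate_normal
    from scipy.optimize import minimize_scalar

    def tetrachoric(table):
        # table[i, j] = count of respondents with X = i and Y = j (i, j in {0, 1})
        table = np.asarray(table, dtype=float)
        n = table.sum()
        tau_x = norm.ppf(table[0].sum() / n)     # threshold reproducing P(X = 0)
        tau_y = norm.ppf(table[:, 0].sum() / n)  # threshold reproducing P(Y = 0)

        def negloglik(rho):
            p00 = multivariate_normal.cdf([tau_x, tau_y], cov=[[1, rho], [rho, 1]])
            probs = np.clip([p00,
                             norm.cdf(tau_x) - p00,
                             norm.cdf(tau_y) - p00,
                             1 - norm.cdf(tau_x) - norm.cdf(tau_y) + p00], 1e-12, 1)
            counts = [table[0, 0], table[0, 1], table[1, 0], table[1, 1]]
            return -sum(c * np.log(p) for c, p in zip(counts, probs))

        return minimize_scalar(negloglik, bounds=(-0.999, 0.999), method="bounded").x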
This study had two purposes: (1) to develop a forced-choice personality inventory to assess student personality characteristics based on the five-factor model (FFM) of personality and (2) to examine its factor structure via the Thurstonian Item Response Theory (TIRT) approach based on Thurstone's law of comparative judgment. A total of 200 items were generated to represent the five dimensions, and through Principal Axis Factoring and the composite reliability index, a final pool of 75 items was selected. These items were then organized into 25 blocks, each containing three statements (triplets), designed to balance social desirability across the blocks. The study involved two samples: the first sample of 1484 students was used to refine the item pool, and the second sample of 823 university students was used to examine the factorial structure of the forced-choice inventory. After re-coding the responses into a binary format, the data were analyzed within a standard structural equation modeling (SEM) framework. The TIRT model was then applied to evaluate the factorial structure of the forced-choice inventory, with the results indicating an adequate fit. Suggestions for future research are provided, including additional studies to establish the scale's reliability (e.g., test–retest) and validity (e.g., concurrent, convergent, and divergent).
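The binary re-coding mentioned above expands each ranked triplet into three pairwise outcomes, one per item pair, which is the data format the Thurstonian model operates on. A minimal sketch (variable and function names are illustrative):

    from itertools import combinations

    def triplet_to_binary(ranking):
        # ranking: items in order of preference, e.g. ('A', 'C', 'B') means
        # A was ranked highest and B lowest within the block.
        position = {item: rank for rank, item in enumerate(ranking)}
        pairs = combinations(sorted(ranking), 2)  # fixed order: (A,B), (A,C), (B,C)
        return {(i, k): int(position[i] < position[k]) for i, k in pairs}

    # Each fully ranked triplet yields 3 binary variables:
    # triplet_to_binary(('A', 'C', 'B')) -> {('A','B'): 1, ('A','C'): 1, ('B','C'): 0}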
... This property is sometimes referred to as parameter stability or parameter invariance. Because of this property, estimates of respondents' ability are not affected by the specific items that they have answered (Brown, 2016; Brown & Bartram, 2009; Lang & Tay, 2021). ...
The Regulatory Focus Theory is an important theory in the fields of social and human sciences that explains how people pursue their goals. However, despite its importance, no consensus has been reached on the optimal scale for assessing individual regulatory focus. Comparisons of the two primary scales used in studies, the Regulatory Focus Questionnaire (RFQ) and the General Regulatory Focus Measure (GRFM), have not resolved this question. Considering the importance of the Regulatory Focus Theory, the widespread use of the GRFM scale, and the advantages of the forced-choice format over the rating format, this article aims to propose and validate an adaptation of the GRFM scale to the forced-choice format (GRFM-FC) using Item Response Theory (IRT) analysis. Because the GRFM scale is usually included in studies along with other scales and measures, a scale with fewer items, such as the GRFM-FC, can facilitate and broaden its use by saving time and effort, increasing the response rate of surveys, and reducing respondent biases, such as leniency and severity, central tendency, and social desirability. The new GRFM-FC is more efficient than the GRFM at differentiating between the two types of regulatory focus, contributing to its predictive validity. In addition to solving the problem of ipsative data, IRT analysis provides more precise information about each scale item than Classical Test Theory does.
... The development of FC IRT models not only made the extraction of information from comparative data more efficient (e.g., Brown & Bartram, 2009), but also opened up the possibility of computerized adaptive testing (CAT). CAT tailors an assessment to each individual in real time: the most informative questions for a candidate are presented, based on existing intelligence about them (e.g., their responses to previous questions in the assessment, or their results from previous assessment occasions). ...
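The selection step at the heart of such a CAT can be sketched as choosing, at each turn, the unadministered item with maximum Fisher information at the current trait estimate. The sketch below assumes a simple unidimensional two-parameter logistic model; all names and the stopping rule are illustrative, not the cited implementation:

    import numpy as np

    def prob(theta, a, b):
        # 2PL response probability for a binary outcome
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    def fisher_info(theta, a, b):
        p = prob(theta, a, b)
        return a ** 2 * p * (1 - p)

    def select_next(theta_hat, item_bank, administered):
        # item_bank: list of (a, b) tuples; pick the most informative remaining item
        remaining = [i for i in range(len(item_bank)) if i not in administered]
        return max(remaining, key=lambda i: fisher_info(theta_hat, *item_bank[i]))

    # Loop: administer the selected item, update theta_hat from all responses
    # (e.g., by maximum likelihood), and stop once the standard error,
    # roughly 1 / sqrt(total information), falls below a target.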
... The TIRT model is able to handle multidimensionality, is flexible when modeling FC blocks of any size, and is compatible with the most commonly used dominance items. Moreover, the TIRT model has demonstrated great usability and utility in empirical applications, such as its ability to estimate item parameters from actual FC data (e.g., Brown & Bartram, 2009). ...
Several forced-choice (FC) computerized adaptive tests (CATs) have emerged in the field of organizational psychology, all of them employing ideal-point items. However, although most items developed historically follow dominance response models, research on FC CAT using dominance items is limited; existing research is dominated by simulations and lacks empirical deployment. This empirical study trialed a FC CAT with dominance items described by the Thurstonian Item Response Theory model with research participants. This study investigated important practical issues, such as the implications of adaptive item selection and social desirability balancing criteria for score distributions, measurement accuracy, and participant perceptions. Moreover, nonadaptive but optimal tests of similar design were trialed alongside the CATs to provide a baseline for comparison, helping to quantify the return on investment when converting an otherwise-optimized static assessment into an adaptive one. Although the benefit of adaptive item selection in improving measurement precision was confirmed, results also indicated that at shorter test lengths CAT had no notable advantage compared with optimal static tests. Taking a holistic view incorporating both psychometric and operational considerations, implications for the design and deployment of FC assessments in research and practice are discussed.
... In practice, many commercially available FC personality questionnaires, widely used for assessment and selection across work contexts, appear to use blocks with items matched in a fixed manner, regardless of the job being applied for (Brown & Bartram, 2009; Jackson et al., 2000). This approach is pragmatic. ...
Forced choice (FC) personality questionnaires attempt to constrain job applicants from presenting idealized responses (or “faking”). FC questionnaires are designed by identifying items equally desirable in applicants, matching these into “blocks,” and instructing respondents to rank the items “most like” themselves. Nonetheless, how closely items should be matched remains unclear, and desirability seems dependent on the job. We investigated how strongly respondents (N = 436) agreed regarding the “ideal” applicant response, while varying (a) how closely items were matched into blocks and (b) the job context. While the most closely matched blocks elicited slight agreement on an ideal response, agreement increased noticeably with poorer matching. Nonetheless, differences in item desirability between different job conditions were evident even in closely matched blocks.
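A simple way to operationalize "matching" is to sort items by their social desirability ratings and greedily assemble blocks from desirability-adjacent items that measure different traits. The sketch below is purely illustrative of this idea and is not the procedure used in the cited study:

    def greedy_triplets(items):
        # items: list of dicts with 'id', 'trait', and 'desirability' keys
        pool = sorted(items, key=lambda x: x["desirability"])
        blocks = []
        while len(pool) >= 3:
            anchor = pool.pop(0)
            block, traits = [anchor], {anchor["trait"]}
            for item in list(pool):  # scan neighbours in desirability order
                if item["trait"] not in traits:
                    block.append(item)
                    traits.add(item["trait"])
                    pool.remove(item)
                if len(block) == 3:
                    break
            if len(block) == 3:
                blocks.append(block)  # a desirability-matched, mixed-trait triplet
            # anchors that cannot be matched with two other traits are dropped
        return blocks

Tightening the match (e.g., capping the allowed desirability spread within a block) trades block quality against the number of usable blocks, mirroring the matching/agreement trade-off examined above.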
... However, the OPQ32i blocks consist of four items (so-called "quads") and the OPQ32r blocks consist of three items ("triplets"). The OPQ32r triplets were developed by removing one item per quad from the OPQ32i (Brown & Bartram, 2009). Except for wording improvements to 5 items, all remaining items were exactly the same across versions. ...
... First, by extracting response information in a more efficient manner, assessment length can be shortened significantly. For example, Brown and Bartram (2009) refined a classically-scored FC personality assessment using IRT methodologies, successfully reducing assessment time by 40-50% while maintaining similar levels of score reliability. Second, when combined with computer-based testing technology, IRT opens up the possibility of tailoring the assessment to each and every individual. ...
A fundamental assumption in computerized adaptive testing is that item parameters are invariant with respect to context—items surrounding the administered item. This assumption, however, may not hold in forced-choice (FC) assessments, where explicit comparisons are made between items included in the same block. We empirically examined the influence of context on item parameters by comparing parameter estimates from two FC instruments. The first instrument was composed of blocks of three items, whereas in the second, the context was manipulated by adding one item to each block, resulting in blocks of four. The item parameter estimates were highly similar. However, a small number of significant deviations were observed, confirming the importance of context when designing adaptive FC assessments. Two patterns of such deviations were identified, and methods to reduce their occurrences in an FC computerized adaptive testing setting were proposed. It was shown that with a small proportion of violations of the parameter invariance assumption, score estimation remained stable.
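A straightforward screen for such context effects is to compare corresponding parameter estimates from the two instrument versions item by item, scaling each difference by its pooled standard error. A sketch under that assumption (not necessarily the cited study's exact procedure):

    import numpy as np

    def invariance_flags(est_a, se_a, est_b, se_b, crit=1.96):
        # est_a/est_b: parameter estimates for the same items in the two
        # instruments (e.g., factor loadings); se_a/se_b: standard errors.
        est_a, se_a, est_b, se_b = map(np.asarray, (est_a, se_a, est_b, se_b))
        z = (est_a - est_b) / np.sqrt(se_a ** 2 + se_b ** 2)
        return np.abs(z) > crit  # True marks a significant deviation

    # Flagged items are candidates for re-estimation or exclusion from the
    # adaptive pool, since their parameters appear to depend on block context.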
... Re-analysis of the Customer Contact Styles Questionnaire (CCSQ) data demonstrates the advantages of IRT modeling for personality assessment; specifically, the interpersonal comparability of person trait scores estimated by the IRT method, as opposed to the classical method, which results in ipsative scores (Brown and Maydeu-Olivares, 2013). The development of a new IRT-scored version of the Occupational Personality Questionnaire (OPQ32r) illustrates how Thurstonian IRT modeling may be applied to re-analyze and re-develop an existing assessment tool, enhancing its strong features and transforming its scoring protocol (Brown and Bartram, 2009). The Thurstonian IRT model was also used to inform the development of a new measure, the Forced-Choice Five Factor Markers (Brown and Maydeu-Olivares, 2011). ...
Instead of responding to questionnaire items one at a time, respondents may be forced to make a choice between two or more items measuring the same or different traits. The forced-choice format eliminates uniform response biases, although the research on its effectiveness in reducing the effects of impression management is inconclusive. Until recently, forced-choice questionnaires were scaled in relation to person means (ipsative data), providing information for intra-individual assessments only. Item response modeling enabled proper scaling of forced-choice data, so that inter-individual comparisons may be made. New forced-choice applications in personality assessment and directions for future research are discussed.
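The item response modeling referred to here can be summarized as follows (a sketch of the Thurstonian IRT formulation for dominance items; the notation is the model's usual presentation, not a quotation). Each item i measuring trait a is assigned a latent utility, and the outcome of comparing items i and k (measuring traits a and b) is binary:

    t_i = \mu_i + \lambda_i \eta_a + \varepsilon_i, \qquad y_{ik} = 1 \iff t_i \ge t_k

    P(y_{ik} = 1 \mid \eta) = \Phi\left( \frac{\mu_i - \mu_k + \lambda_i \eta_a - \lambda_k \eta_b}{\sqrt{\psi_i^2 + \psi_k^2}} \right)

Here the μ are item means, the λ factor loadings, the ψ² error variances, and Φ the standard normal distribution function. Because the modeled probability depends on the trait levels η themselves rather than only on within-person rank information, latent trait scores recovered under this model support inter-individual comparisons.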
... The OPQ32i consists of 104 sets of these quads. As a result of the work by Brown and Bartram (2009), the Thurstonian item response model was applied to OPQ32i forced-choice data and normative scale scores were produced. From this, an IRT-scored version, the OPQ32r, was developed. ...
The present research investigated if an item response theory (IRT)-scored forced-choice personality questionnaire has the same normative data structures as a similar version that uses a 5-point Likert scale instead. The study was conducted using a sample of 349 training delegates who completed both an IRT-scored forced-choice and a normative single-stimulus version of the questionnaire. Results largely supported the scaling properties, measurement precision, and equivalence of the data structures of the two scoring methods.
... correlations between the latent attributes) using Mplus (L.K. Muthén & B.O. Muthén, 1998-2012), which conveniently combines all necessary features. Applications to date demonstrate successful re-analysis of existing forced-choice data using this approach (Brown & Maydeu-Olivares, 2013; Brown, 2009; Brown & Bartram, 2009-2011). ...
... In this article, it is shown that when the focus of measurement is the attributes underlying the items, the scale origin may be identified without special remedies such as embedding a small number of unidimensional pairs into multidimensional forced-choice questionnaires, as advocated by Stark, Chernyshenko, and Drasgow (2005; also Drasgow, Chernyshenko, & Stark, 2009). ...
In forced-choice questionnaires, respondents have to make choices between two or more items presented at the same time. Several IRT models have been developed to link respondent choices to underlying psychological attributes, including the recent MUPP (Stark, Chernyshenko & Drasgow, 2005) and Thurstonian IRT (Brown & Maydeu-Olivares, 2011) models. In the present article, a common framework is proposed that describes forced-choice models along three axes: 1) the forced-choice format used; 2) the measurement model for the relationships between items and psychological attributes they measure; and 3) the decision model for choice behavior. Using the framework, fundamental properties of forced-choice measurement of individual differences are considered. It is shown that the scale origin for the attributes is generally identified in questionnaires using either unidimensional or multidimensional comparisons. Both dominance and ideal point models can be used to provide accurate forced-choice measurement; and the rules governing accurate person score estimation with these models are remarkably similar.
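The two measurement model families contrasted above can be sketched as two utility kernels feeding the same comparative choice rule (the quadratic ideal-point form below is one common choice, used here only for illustration):

    \text{Dominance:} \quad t_i = \mu_i + \lambda_i \eta_a + \varepsilon_i

    \text{Ideal point:} \quad t_i = c_i - (\eta_a - \delta_i)^2 + \varepsilon_i

Under a dominance kernel, an item's utility increases monotonically with the trait; under an ideal-point kernel, utility peaks at the item's location δ_i and falls off with distance. Either kernel can be combined with the same decision rule y_{ik} = 1 iff t_i ≥ t_k, which is consistent with the observation above that the rules governing accurate person score estimation are remarkably similar across the two families.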
... Hicks (1970, p. 167) stated that "each score for an individual is dependent on his own scores on other variables, but is independent of, and not comparable with, the scores of other individuals." Ipsative scores differ from normative scores in that they assess relative instead of absolute values (Brown and Bartram 2009). As a consequence, only intra-individual, not inter-individual, comparisons are possible (Cattell 1944; Hicks 1970; Closs 1976; Fedorak and Coles 1979; McClean and Chissom 1986; Baron 1996; Closs 1996). ...
... Because of these features, conventional correlation-based methods, such as factor analysis, regression analysis, and LISREL, cannot be applied (Guilford 1952; Clemans 1966; Massy et al. 1966; Jackson and Alwin 1980; Chan and Bentler 1993; Dunlap and Cornwell 1994; Cornwell and Dunlap 1994). Also, the interpretation of results may be problematic, since the correlations between constant-sum scales and between ipsative factors often turn out to be spuriously negative (Clemans 1966; Hicks 1970; Johnson et al. 1988; Baron 1996; Brown and Bartram 2009). In general, the reliability of ipsative data is lower than that of normative data (Saville and Willson 1991; Bartram 1996). ...
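The spurious negativity is easy to demonstrate numerically: ipsatizing (subtracting each respondent's own mean) forces every person's scores to sum to zero, so the scales' covariances must sum to zero as well, pushing the average inter-scale correlation below zero. A self-contained sketch with simulated data:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))             # normative scores: 1000 people, 5 scales
    X_ips = X - X.mean(axis=1, keepdims=True)  # ipsatized scores: rows sum to zero

    def mean_offdiag_corr(M):
        R = np.corrcoef(M, rowvar=False)
        return R[~np.eye(R.shape[0], dtype=bool)].mean()

    print(mean_offdiag_corr(X))      # near 0 for independent normative scales
    print(mean_offdiag_corr(X_ips))  # near -1/(k-1) = -0.25 for k = 5 scales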
In this paper, the analysis and test of ipsative data will be discussed, and some alternative methods will be suggested. Following a review of the literature about ipsative measurement, the Competing Values Framework will be presented as a major application in the field of organizational culture and values. An alternative approach for the intra-individual analysis and test of ipsative data will be suggested, which consists of: (i) a method that uses closed part-wise geometric means as a descriptive statistic; (ii) a nonparametric bootstrap test to create confidence intervals; and (iii) a permutation test to evaluate equivalence between ipsative scores. All suggested methods satisfy the three basic statistical requirements for the analysis of ipsative data, that is: scale invariance, permutation invariance, and subcompositional coherence. Our suggested approach can correctly compute and compare organizational culture profiles within the same organization, as will be demonstrated with an example. However, the problem of drawing inter-organizational contrasts in ipsative measurement still remains unsolved. Also, our alternative approach only allows for a relative interpretation of the results.
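A minimal sketch of the first two ingredients, the closed part-wise geometric mean and a percentile bootstrap confidence interval, assuming profiles are stored as rows of positive shares (function names are illustrative; this is not the authors' code):

    import numpy as np

    def closed_geometric_mean(X):
        # X: n x k matrix of positive ipsative shares (each row sums to a constant)
        g = np.exp(np.log(X).mean(axis=0))  # part-wise geometric mean (per column)
        return g / g.sum()                  # closure: rescale parts to sum to 1

    def bootstrap_ci(X, n_boot=2000, level=0.95, seed=0):
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        stats = np.array([closed_geometric_mean(X[rng.integers(0, n, size=n)])
                          for _ in range(n_boot)])
        tail = (1 - level) / 2
        return np.quantile(stats, [tail, 1 - tail], axis=0)  # per-part CI bounds

Working on geometric means of parts and re-closing the result is what gives the statistic the scale invariance, permutation invariance, and subcompositional coherence required above.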
... More recently, developments in IRT technology have produced methods for analysing multidimensional forced-choice items (Brown & Maydeu-Olivares, 2011; Chernyshenko et al., 2007; Heggestad et al., 2006; Maydeu-Olivares & Brown, 2010; McCloy, Heggestad, & Reeve, 2005; Stark, Chernyshenko, & Drasgow, 2005; Stark, Chernyshenko, Drasgow, & Williams, 2006). Even though the number of studies is too small at present to be conclusive about the usefulness of IRT approaches in personnel and academic selection, recent findings show that the predictive validity of FC inventories is similar to or larger than the predictive validity of their SS counterparts (Brown & Bartram, 2009). ...
... The first approach relies on an ideal-point response process (Chernyshenko, Stark, Drasgow, & Roberts, 2007; McCloy et al., 2005; Stark et al., 2005). The second approach deals with dominance items (Brown & Bartram, 2009; Brown & Maydeu-Olivares, 2011; Maydeu-Olivares & Brown, 2010; Maydeu-Olivares & Böckenholt, 2005). Both approaches are based on the assumptions posited by Thurstone (1928) for the measurement of attitudes. ...
This article reports a comprehensive meta-analysis of the criterion-oriented validity of the Big Five personality dimensions assessed with forced-choice (FC) inventories. Six criteria (i.e., performance ratings, training proficiency, productivity, grade-point average, global occupational performance, and global academic performance) and three types of FC scores (i.e., normative, quasi-ipsative, and ipsative) served for grouping the validity coefficients. Globally, the results showed that the Big Five assessed with FC measures have similar or slightly higher validity than the Big Five assessed with single-stimulus (SS) personality inventories. Quasi-ipsative measures of conscientiousness (K = 44, N = 8,794, ρ = .40) are found to be better predictors of job performance than normative and ipsative measures. FC inventories also showed similar reliability coefficients to SS inventories. Implications of the findings for theory and practice in academic and personnel decisions are discussed, and future research is suggested.