Hill CD, Edwards MC, Thissen D, et al. Practical issues in the application of item response theory: a demonstration using items from the Pediatric Quality of Life Inventory (PedsQL) 4.0 Generic Core Scales. Med Care. 2007;45(5 Suppl 1):S39-S47.

Department of Psychology, The Ohio State University, Columbus, Ohio, United States
Medical Care (Impact Factor: 3.23). 06/2007; 45(5 Suppl 1):S39-47. DOI: 10.1097/01.mlr.0000259879.05499.eb
Source: PubMed


Background: Item response theory (IRT) is increasingly being applied to health-related quality of life instrument development and refinement. This article discusses results obtained using categorical confirmatory factor analysis (CCFA) to check IRT model assumptions and the application of IRT in item analysis and scale evaluation.
Objectives: To demonstrate the value of CCFA and IRT in examining a health-related quality of life measure in children and adolescents.
Methods: This illustration uses data from 10,241 children and their parents on items from the 4 subscales of the PedsQL 4.0 Generic Core Scales. CCFA was applied to confirm domain dimensionality and identify possible locally dependent items. IRT was used to assess the strength of the relationship between the items and the constructs of interest and the information available across the latent construct.
Results: CCFA showed generally strong support for 1-factor models for each domain; however, several items exhibited evidence of local dependence. IRT revealed that the items generally exhibit favorable characteristics and are related to the same construct within a given domain. We discuss the lessons that can be learned by comparing alternate forms of the same scale, and we assess the potential impact of local dependence on the item parameter estimates.
Conclusions: This article describes CCFA methods for checking IRT model assumptions and provides suggestions for using these methods in practice. It offers insight into ways information gained through IRT can be applied to evaluate items and aid in scale construction.

  • Source
    • "However, most of the researches use the statistical approaches where data are administered from the perspective of the patients using self-completed questionnaires. Examples of such researches can be retrieved from Varni et al. (2007), Hill et al. (2007), and ..."
    ABSTRACT: Health-related quality of life (HRQoL) is an increasingly common measure for assessing the health condition of patients who suffer from specific diseases or ailments. It is known that the dimensions of HRQoL can mirror one's overall health condition, mainly using standard statistical techniques. However, determining the extent to which multiple dimensions contribute to overall health condition is not straightforward because of the arbitrary nature of HRQoL dimensions. Therefore this paper aims to propose a model to explain the relationship between HRQoL dimensions and overall health condition using a matrix-driven fuzzy linear regression. An experiment was conducted to measure the strength of the relationship among elderly people via judgments provided by ten decision makers. The health condition linguistic data and scaled data on the regularity of experiencing health-related problems among elderly people were given by the decision makers. Five stepwise computations based on matrix-driven fuzzy linear regression were proposed to describe the relationship. It is found that nearly forty-six percent of the variation in the overall health condition of elderly people was explained by the eight HRQoL dimensions. The matrix-driven multivariate fuzzy linear regression model successfully identified the strength of the relationship between the multiple dimensions of HRQoL and overall health condition in the case of elderly people.
    Preview · Article · Dec 2015 · Modern Applied Science
  • Source
    • "In addition, IRT methodology will be useful to industry in the design of psychometric tests. IRT has been used to analyse clinical measures in several different fields: schizophrenia [11], depression [12], attachment [13], social inhibition [14] and quality of life [15]. IRT has also been used to examine ADL and Instrumental Activities of Daily Living (IADL) scales [16,17]. "
    ABSTRACT: Performance on psychometric tests is key to diagnosis and monitoring treatment of dementia. Results are often reported as a total score, but there is additional information in individual items of tests which vary in their difficulty and discriminatory value. Item difficulty refers to an ability level at which the probability of responding correctly is 50%. Discrimination is an index of how well an item can differentiate between patients of varying levels of severity. Item response theory (IRT) analysis can use this information to examine and refine measures of cognitive functioning. This systematic review aimed to identify all published literature which had applied IRT to instruments assessing global cognitive function in people with dementia. A systematic review was carried out across Medline, Embase, PsycINFO and CINAHL articles. Search terms relating to IRT and dementia were combined to find all IRT analyses of global functioning scales of dementia. Of 384 articles identified, four studies met inclusion criteria, including a total of 2,920 people with dementia from six centers in two countries. These studies used three cognitive tests (MMSE, ADAS-Cog, BIMCT) and three IRT methods (Item Characteristic Curve analysis, Samejima's graded response model, the 2-Parameter Model). Memory items were most difficult. Naming the date in the MMSE and memory items, specifically word recall, of the ADAS-Cog were most discriminatory. Four published studies were identified which used IRT on global cognitive tests in people with dementia. This technique increased the interpretative power of the cognitive scales, and could be used to provide clinicians with key items from a larger test battery which would have high predictive value. There is need for further studies using IRT in a wider range of tests involving people with dementia of different etiology and severity.
    Full-text · Article · Feb 2014 · BMC Psychiatry
  • Source
    • "Kook and Varni [23] have provided a comprehensive review of the use of IRT and CTT in the validation of the Korean version of the PedsQL™ 4.0. Hill et al. [24] have demonstrated the value of the categorical confirmatory factor analysis to test the IRT model assumptions, including local dependence and unidimensionality. Moreover, Langer et al. [25] have used differential item functioning analyses (DIF) to assess whether scores have equivalent meaning across healthy children and children with chronic conditions. "
    ABSTRACT: Item response theory (IRT) is extensively used to develop adaptive instruments of health-related quality of life (HRQoL). However, each IRT model has its own function to estimate item and category parameters, and hence different results may be found using the same response categories with different IRT models. The present study used the Rasch rating scale model (RSM) to examine and reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales. The PedsQL™ 4.0 Generic Core Scales was completed by 938 Iranian school children and their parents. Convergent, discriminant and construct validity of the instrument were assessed by classical test theory (CTT). The RSM was applied to investigate person and item reliability, item statistics and ordering of response categories. The CTT method showed that the scaling success rate for convergent and discriminant validity were 100% in all domains with the exception of physical health in the child self-report. Moreover, confirmatory factor analysis supported a four-factor model similar to its original version. The RSM showed that 22 out of 23 items had acceptable infit and outfit statistics (<1.4, >0.6), person reliabilities were low, item reliabilities were high, and item difficulty ranged from -1.01 to 0.71 and -0.68 to 0.43 for child self-report and parent proxy-report, respectively. Also the RSM showed that successive response categories for all items were not located in the expected order. This study revealed that, in all domains, the five response categories did not perform adequately. It is not known whether this problem is a function of the meaning of the response choices in the Persian language or an artifact of a mostly healthy population that did not use the full range of the response categories. The response categories should be evaluated in further validation studies, especially in large samples of chronically ill patients.
    Full-text · Article · Mar 2012 · Health and Quality of Life Outcomes