Hill CD, Edwards MC, Thissen D, et al. Practical issues in the application of item response theory: a demonstration using items from the Pediatric Quality of Life Inventory (PedsQL) 4.0 Generic Core Scales. Med Care. 2007;45(5 Suppl 1):S39-S47.

RTI Health Solutions, Research Triangle Park, North Carolina 27709-2194, USA.
Medical Care. 06/2007; 45(5 Suppl 1):S39-S47. DOI: 10.1097/01.mlr.0000259879.05499.eb
Source: PubMed


Item response theory (IRT) is increasingly being applied to health-related quality of life instrument development and refinement. This article discusses results obtained using categorical confirmatory factor analysis (CCFA) to check IRT model assumptions and the application of IRT in item analysis and scale evaluation.
To demonstrate the value of CCFA and IRT in examining a health-related quality of life measure in children and adolescents.
This illustration uses data from 10,241 children and their parents on items from the 4 subscales of the PedsQL 4.0 Generic Core Scales. CCFA was applied to confirm domain dimensionality and identify possible locally dependent items. IRT was used to assess the strength of the relationship between the items and the constructs of interest and the information available across the latent construct.
CCFA showed generally strong support for 1-factor models for each domain; however, several items exhibited evidence of local dependence. IRT revealed that the items generally exhibit favorable characteristics and are related to the same construct within a given domain. We discuss the lessons that can be learned by comparing alternate forms of the same scale, and we assess the potential impact of local dependence on the item parameter estimates.
This article describes CCFA methods for checking IRT model assumptions and provides suggestions for using these methods in practice. It offers insight into ways information gained through IRT can be applied to evaluate items and aid in scale construction.

    • "In addition, IRT methodology will be useful to industry in the design of psychometric tests. IRT has been used to analyse clinical measures in several different fields: schizophrenia [11], depression [12], attachment [13], social inhibition [14] and quality of life [15]. IRT has also been used to examine ADL and Instrumental Activities of Daily Living (IADL) scales [16,17]. "
    ABSTRACT: Performance on psychometric tests is key to diagnosing and monitoring treatment of dementia. Results are often reported as a total score, but there is additional information in the individual items of tests, which vary in their difficulty and discriminatory value. Item difficulty refers to the ability level at which the probability of responding correctly is 50%. Discrimination is an index of how well an item can differentiate between patients of varying levels of severity. Item response theory (IRT) analysis can use this information to examine and refine measures of cognitive functioning. This systematic review aimed to identify all published literature that had applied IRT to instruments assessing global cognitive function in people with dementia. A systematic review was carried out across Medline, Embase, PsycINFO and CINAHL. Search terms relating to IRT and dementia were combined to find all IRT analyses of global functioning scales in dementia. Of 384 articles identified, four studies met the inclusion criteria, including a total of 2,920 people with dementia from six centers in two countries. These studies used three cognitive tests (MMSE, ADAS-Cog, BIMCT) and three IRT methods (Item Characteristic Curve analysis, Samejima's graded response model, the 2-Parameter Model). Memory items were most difficult. Naming the date in the MMSE and memory items, specifically word recall, of the ADAS-Cog were most discriminatory. Four published studies were identified which used IRT on global cognitive tests in people with dementia. This technique increased the interpretative power of the cognitive scales and could be used to provide clinicians with key items from a larger test battery that would have high predictive value. There is a need for further studies using IRT in a wider range of tests involving people with dementia of differing etiology and severity.
    BMC Psychiatry 02/2014; 14(1):47. DOI: 10.1186/1471-244X-14-47
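The difficulty and discrimination parameters described in the abstract above come from the two-parameter logistic (2PL) model. As a minimal sketch (the parameter values below are illustrative, not taken from any of the cited studies), the item characteristic curve can be written as:

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: probability of a correct response
    at ability theta, for an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Difficulty b is the ability at which the probability is exactly 0.5.
print(p_correct(0.0, a=1.5, b=0.0))  # 0.5

# A more discriminating item separates abilities more sharply around b.
print(p_correct(1.0, a=2.0, b=0.0) > p_correct(1.0, a=0.5, b=0.0))  # True
```

This illustrates why highly discriminating items (large `a`) are the most useful for distinguishing patients whose severity lies near the item's difficulty.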
    • "Kook and Varni [23] have provided a comprehensive review of the use of IRT and CTT in the validation of the Korean version of the PedsQL™ 4.0. Hill et al. [24] have demonstrated the value of categorical confirmatory factor analysis to test the IRT model assumptions, including local dependence and unidimensionality. Moreover, Langer et al. [25] have used differential item functioning (DIF) analyses to assess whether scores have equivalent meaning across healthy children and children with chronic conditions. "
    ABSTRACT: Item response theory (IRT) is extensively used to develop adaptive instruments of health-related quality of life (HRQoL). However, each IRT model has its own function for estimating item and category parameters, and hence different results may be found using the same response categories with different IRT models. The present study used the Rasch rating scale model (RSM) to examine and reassess the psychometric properties of the Persian version of the PedsQL™ 4.0 Generic Core Scales. The PedsQL™ 4.0 Generic Core Scales was completed by 938 Iranian school children and their parents. Convergent, discriminant and construct validity of the instrument were assessed by classical test theory (CTT). The RSM was applied to investigate person and item reliability, item statistics and the ordering of response categories. The CTT method showed that the scaling success rates for convergent and discriminant validity were 100% in all domains, with the exception of physical health in the child self-report. Moreover, confirmatory factor analysis supported a four-factor model similar to the original version. The RSM showed that 22 out of 23 items had acceptable infit and outfit statistics (<1.4, >0.6), person reliabilities were low, item reliabilities were high, and item difficulty ranged from -1.01 to 0.71 and -0.68 to 0.43 for child self-report and parent proxy-report, respectively. The RSM also showed that the successive response categories for all items were not located in the expected order. This study revealed that, in all domains, the five response categories did not perform adequately. It is not known whether this problem is a function of the meaning of the response choices in the Persian language or an artifact of a mostly healthy population that did not use the full range of the response categories. The response categories should be evaluated in further validation studies, especially in large samples of chronically ill patients.
    Health and Quality of Life Outcomes 03/2012; 10(1):27. DOI: 10.1186/1477-7525-10-27
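The rating scale model described above assigns each polytomous response category a probability that depends on person ability, item difficulty, and a set of category thresholds shared across items. As a minimal sketch with illustrative parameter values (not estimates from the study), the category probabilities can be computed as:

```python
import math

def rsm_probs(theta, delta, taus):
    """Rasch rating scale model: probability of each response category
    k = 0..m for a person with ability theta on an item with difficulty
    delta, given shared category thresholds taus = [tau_1, ..., tau_m]."""
    # Log-numerator for category k: k*(theta - delta) - sum of the first k thresholds.
    logits = [k * (theta - delta) - sum(taus[:k]) for k in range(len(taus) + 1)]
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Five categories (as on the PedsQL response scale), hypothetical thresholds.
probs = rsm_probs(theta=0.5, delta=0.0, taus=[-1.0, 0.0, 1.0, 2.0])
print(round(sum(probs), 6))  # 1.0
```

Disordered thresholds (e.g. `taus` not increasing) are one way the "categories not located in the expected order" finding can show up in this parameterization.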
    • "Our finding that an IRT-based CAT can result in accurate assessments with far fewer items than tests based on traditional psychometrics is fully comparable to findings in other studies applying IRT CAT techniques in the fields of intelligence and school achievement assessment [16,17] and in the field of quality of life [18,22,47]. The first studies on the application of IRT models in the field of the identification of behavioural and emotional problems in paediatric care have now been published [19-22], and these studies came to similar conclusions. Hill et al. [22] present a detailed analysis to assess the suitability of items from the Pediatric Quality of Life Inventory for a CAT on distress but do not provide data on criterion validity. "
    ABSTRACT: Questionnaires used by health services to identify children with psychosocial problems are often rather short. The psychometric properties of such short questionnaires are mostly insufficient for an accurate distinction between children with and without problems. We aimed to assess whether a short Computerized Adaptive Test (CAT) can overcome the weaknesses of short written questionnaires when identifying children with psychosocial problems. We used a Dutch national data set obtained from parents of children invited for a routine health examination by Preventive Child Healthcare, with 205 items on behavioral and emotional problems (n = 2,041, response 84%). In a random subsample we determined which items met the requirements of an Item Response Theory (IRT) model to a sufficient degree. Using those items, the item parameters necessary for a CAT were calculated and a cut-off point was defined. In the remaining subsample we determined the validity and efficiency of a Computerized Adaptive Test using simulation techniques, with current treatment status and a clinical score on the Total Problem Scale (TPS) of the Child Behavior Checklist as criteria. Of the 205 items available, 190 sufficiently met the criteria of the underlying IRT model. For 90% of the children a score above or below the cut-off point could be determined with 95% accuracy. The mean number of items needed to achieve this was 12. Sensitivity and specificity with the TPS as a criterion were 0.89 and 0.91, respectively. An IRT-based CAT is a very promising option for the identification of psychosocial problems in children, as it can lead to efficient yet high-quality identification. The results of our simulation study need to be replicated in a real-life administration of this CAT.
    BMC Medical Research Methodology 08/2011; 11(1):111. DOI: 10.1186/1471-2288-11-111
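The CAT procedure described above achieves its efficiency by selecting, at each step, the unadministered item that is most informative at the current ability estimate. A minimal sketch of this greedy selection step, using 2PL Fisher information and a small hypothetical item bank (the `(a, b)` parameters below are invented for illustration):

```python
import math

def item_info(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def next_item(theta_hat, item_bank, administered):
    """Greedy CAT step: pick the unused item that is most informative
    at the current ability estimate theta_hat."""
    candidates = [i for i in range(len(item_bank)) if i not in administered]
    return max(candidates, key=lambda i: item_info(theta_hat, *item_bank[i]))

# Hypothetical bank of (discrimination, difficulty) pairs.
bank = [(1.0, -2.0), (1.5, 0.0), (1.2, 2.0)]
print(next_item(0.1, bank, administered=set()))  # 1
```

After each response, the ability estimate is updated and the selection repeats until a stopping rule (e.g. the 95%-accuracy classification against the cut-off described above) is met; this is why only around 12 of the 190 items were needed on average.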